Re: readlink(1) realpath(1) and POSIX

Christos Zoulas Mon, 18 Jul 2022 12:44:22 -0700

In article <[email protected]>,
Robert Elz  <[email protected]> wrote:
>POSIX is planning to add readlink(1) in the next version.   Nothing
>special to say about that (makes no real difference to us, we have it
>already, they will specify only the common options.)
>
>But while doing that, they looked at the -f option, and saw in coreutils
>that their man page says to use realpath(1) instead of readlink -f.
>
>(They never even got as far as detecting that our readlink -f and the
>coreutils readlink -f don't act the same).
>
>So, it was asked whether other systems have realpath(1) - we do, kamil@
>added it back in Feb 2020, with the comment:
>
>   Port realpath(1) from FreeBSD
>
>   realpath(1) wraps realpath(3) and returns resolved physical path.
>
>   This utility shipped with GNU and FreeBSD is sometimes
>   used in scripts in the wild.
>
>It is currently in HEAD only - it will be in 10 when that gets released.
>
>So, POSIX has more or less decided to skip the -f option of readlink,
>and require realpath(1) instead (realpath(3) has been around in POSIX for ages,
>but is an XSI option ... realpath(1) won't be, just mandatory (probably)).
>
>However, FreeBSD's realpath(1) (now also ours) and the coreutils realpath(1)
>are substantially different beasts - the FreeBSD version is (as kamil said)
>just a wrapper around realpath(3) and is quite simple.
>
>coreutils realpath is a monstrous mess.    Fortunately, POSIX isn't
>proposing to standardise much of that, just the basic functionality
>which replaces readlink -f.
>
>Unfortunately, for POSIX (and us) basic realpath (as in "realpath file")
>has the same basic operational difference as readlink -f has between the
>BSD & GNU implementations.   Ours is literally: "call realpath(3), if it
>returns something, print that, otherwise it is an error".   Theirs allows
>the final component in the expanded and canonicalized path to not exist.
>(Their doc does not say what "not exist" really means in the hard cases,
>but from testing their implementation, it is clear that if namei() returns
>ENOENT for the final component, that is an allowed case, any other error
>return is not).
>
>The people who use this demand that the functionality remain.   (I'm still
>unclear on why: if the file is not going to be created, who cares what its
>canonical path would be, and if it is, creating it first using the known name
>and canonicalizing later should work, I would have thought.   They don't
>agree; they say that if we want to know whether it exists, we can
>canonicalise first, then test -e.   For a long time I wasn't sure how that
>was a rational counter argument, and I'm still not.)
>
>For a while I thought we could just do (in C, not exactly this) if
>realpath($FILE) fails:
>       echo "$(realpath "$(dirname "$FILE")")/$(basename "$FILE")"
>(with appropriate tests for when $FILE has no '/' etc), but that doesn't
>work - it is not just the last component of the $FILE arg which is allowed not
>to exist (though that case is part of it); the same applies where that
>component exists and is a symlink, and the last component of its target
>doesn't exist, or exists and is another symlink for which ... this can go on
>(almost) forever.
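
To make the symlink chain described above concrete, here is a minimal C
sketch of the two behaviours.  It is not the code that went into realpath.c;
the names resolve_strict, resolve_lenient and MAX_HOPS are made up for
illustration, and the lenient rule is only my reading of the description
above: forgive ENOENT on the final component, follow dangling symlinks,
and treat any other failure as an error.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700	/* for realpath(3), lstat(2), readlink(2) */
#endif
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_HOPS 64	/* hypothetical safety limit, not from the post */

/* Strict ("-e" style): plain realpath(3), every component must exist. */
static int
resolve_strict(const char *path, char out[PATH_MAX])
{
	return realpath(path, out) != NULL ? 0 : -1;
}

/* Lenient ("-E" style, as assumed here): the final component may be
 * missing, even when reached through a chain of dangling symlinks. */
static int
resolve_lenient(const char *path, char out[PATH_MAX])
{
	char cur[PATH_MAX], dir[PATH_MAX], rdir[PATH_MAX];
	char full[PATH_MAX], target[PATH_MAX];
	struct stat sb;

	if (strlen(path) >= sizeof(cur)) {
		errno = ENAMETOOLONG;
		return -1;
	}
	strcpy(cur, path);

	for (int hops = 0; hops < MAX_HOPS; hops++) {
		const char *base;
		char *slash;
		ssize_t n;

		if (realpath(cur, out) != NULL)
			return 0;	/* everything exists */
		if (errno != ENOENT)
			return -1;	/* only ENOENT is forgiven */

		/* Split cur into a directory part and a final component. */
		slash = strrchr(cur, '/');
		if (slash == NULL) {
			strcpy(dir, ".");
			base = cur;
		} else {
			size_t len = (slash == cur) ? 1 : (size_t)(slash - cur);
			memcpy(dir, cur, len);
			dir[len] = '\0';
			base = slash + 1;
		}
		if (*base == '\0') {	/* empty path or trailing '/': not handled */
			errno = ENOENT;
			return -1;
		}

		/* Everything before the final component must resolve. */
		if (realpath(dir, rdir) == NULL)
			return -1;
		if (snprintf(full, sizeof(full), "%s/%s",
		    strcmp(rdir, "/") == 0 ? "" : rdir, base) >= (int)sizeof(full)) {
			errno = ENAMETOOLONG;
			return -1;
		}

		if (lstat(full, &sb) == -1) {
			if (errno != ENOENT)
				return -1;
			strcpy(out, full);	/* final component missing: allowed */
			return 0;
		}
		if (!S_ISLNK(sb.st_mode)) {	/* exists, yet realpath failed */
			errno = ENOENT;
			return -1;
		}

		/* Dangling symlink: go around again with its target. */
		n = readlink(full, target, sizeof(target) - 1);
		if (n == -1)
			return -1;
		target[n] = '\0';
		if (target[0] == '/')
			strcpy(cur, target);
		else if (snprintf(cur, sizeof(cur), "%s/%s", rdir, target) >=
		    (int)sizeof(cur)) {
			errno = ENAMETOOLONG;
			return -1;
		}
	}
	errno = ELOOP;
	return -1;
}
```

With link1 -> link2 -> missing in an existing directory, resolve_strict
fails with ENOENT, while resolve_lenient returns the directory's canonical
path with "/missing" appended.  A path such as nodir/file, where nodir does
not exist, fails in both.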
>
>The current POSIX proposal is to specify "realpath -e" (which is a coreutils
>arg which makes theirs act just like ours) and also invent a new -E
>arg, which would make ours work like theirs.   It would be unspecified
>which was the default - ie: all scripts would need to use one of those
>options to be portable.   The allowed result when neither option is given is
>made even more bizarre to cater for a built in realpath in mksh, which
>is even wackier in its default (and only) behaviour (inexplicable in some
>cases) than the coreutils version - but the mksh one takes exactly 1 arg,
>the path name, and simply execs realpath from the filesystem if anything
>different is passed to it, so "realpath -[Ee] file" will bypass that
>implementation and run a real one instead.
>
>I have added -E support to our realpath(1) (that is, to the .c, haven't
>gotten around to the man page yet) and of course -e (which is more or
>less a no-op).   For now, I have made the default be -E if neither option
>is given, which returns the same result as we currently get in the cases
>where we do not currently produce an error, and makes our implementation more
>compatible with the (small) sane part of the coreutils implementation.
>
>I am not proposing adding any of their myriad other useless options, with
>the sole possible exception of -z, which causes their realpath to use \0
>rather than \n between output paths, and makes it a little safer in the
>possible presence of paths containing newline chars when more than one
>path arg is given.   The POSIX version (currently) will only specify
>realpath working with a required single file arg.   Our version (the FreeBSD
>version) defaults to "." if no file is given, coreutils doesn't do that,
>and both versions process as many file args as are given.
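
As a rough illustration of why -z helps, the output step could look like
the C sketch below.  The helper name emit_path and its zflag parameter are
hypothetical, not taken from either implementation.

```c
#include <stdio.h>

/* Write one resolved path, terminated by '\0' when zflag is set and by
 * '\n' otherwise.  (emit_path is an illustrative name only.) */
static int
emit_path(FILE *fp, const char *path, int zflag)
{
	if (fputs(path, fp) == EOF)
		return -1;
	return fputc(zflag ? '\0' : '\n', fp) == EOF ? -1 : 0;
}
```

A path can contain '\n' but never '\0', so with zflag set a reader can
split the output unambiguously even when several file args are given.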
>
>The source file size about doubles with these changes, which means about
>3 times as much actual code (since about half of the current source is the
>boilerplate noise).
>
>Any objections to adding this (man page would come with the commit of
>course, so will some ATF tests - I will convert my current test script) ?
>
>Any opinions on whether the default (no -e or -E used) should be as
>ours is now, or as coreutils is?   (My slight preference is to follow
>coreutils here, it is more compatible).
>
>POSIX's hope is that if we do this, FreeBSD will take the code back, and
>the other BSD variants might follow, and the end result might be (mksh aside)
>a reasonably consistent world.


Thanks for doing all this work. I agree we should follow coreutils and
default to -E, to make the world more homogeneous.

Best,

christos
