In article <[email protected]>, Robert Elz <[email protected]> wrote:
>POSIX is planning to add readlink(1) in the next version. Nothing
>special to say about that (makes no real difference to us, we have it
>already, they will specify only the common options.)
>
>But while doing that, they looked at the -f option, and saw in coreutils
>that their man page says to use realpath(1) instead of readlink -f
>
>(They never even got as far as detecting that our readlink -f and the
>coreutils readlink -f don't act the same).
>
>So, it was asked whether other systems have realpath(1) - we do, kamil@
>added it back in Feb 2020, with the comment:
>
>    Port realpath(1) from FreeBSD
>
>    realpath(1) wraps realpath(3) and returns resolved physical path.
>
>    This utility shipped with GNU and FreeBSD is sometimes
>    used in scripts in the wild.
>
>It is currently in HEAD only - it will be in 10 when that gets released.
>
>So, POSIX has more or less decided to skip the -f option of readlink,
>and require realpath(1) instead (realpath(3) has been around in POSIX for ages,
>but is an XSI option ... realpath(1) won't be, just mandatory (probably)).
>
>However, FreeBSD's realpath(1) (now also ours) and the coreutils realpath(1)
>are substantially different beasts - the FreeBSD version is (as kamil said)
>just a wrapper around realpath(3) and is quite simple.
>
>coreutils realpath is a monstrous mess. Fortunately, POSIX aren't
>proposing standardising almost any of that, just the basic functionality
>which replaces readlink -f.
>
>Unfortunately, for POSIX (and us) basic realpath (as in "realpath file")
>has the same basic operational difference as readlink -f has between the
>BSD & GNU implementations. Ours is literally: "call realpath(3), if it
>returns something, print that, otherwise it is an error". Theirs allows
>the final component in the expanded and canonicalized path to not exist.
>(Their doc does not say what "not exist" really means in the hard cases,
>but from testing their implementation, it is clear that if namei() returns
>ENOENT for the final component, that is an allowed case, any other error
>return is not).
>
>The people who use this demand that functionality remain (I'm still unclear
>on why - if the file is not to be created, who cares what its canonical path
>would be, if it is, create it first using the known name, and canonicalize
>later should work I would have thought ... but they don't agree - they say,
>that if we want to know if it exists, we can canonicalise first, then test -e
>though for a long time I wasn't sure how that was a rational counter argument,
>I'm still not).
>
>For a while I thought we could just do (in C, not exactly this) if
>realpath($FILE) fails:
>    echo $(realpath $(dirname $FILE))/$(basename $FILE)
>(with appropriate tests for when $FILE has no '/' etc), but that doesn't
>work - it is not just the last component of the $FILE arg which is allowed not
>to exist (though that case is part of it) but where that component exists,
>and is a symlink, and the last component of that doesn't exist, or exists
>and is another symlink for which ... this can go on (almost) forever.
>
>The current POSIX proposal is to specify "realpath -e" (which is a coreutils
>arg which makes theirs act just like ours) and also invent a new -E
>arg, which would make ours work like theirs. It would be unspecified
>which was the default - ie: all scripts would need to use one of those
>options to be portable. The allowed result when neither option is given is
>made even more bizarre to cater for a built in realpath in mksh, which
>is even wackier in its default (and only) behaviour (inexplicable in some
>cases) than the coreutils version - but the mksh one takes exactly 1 arg,
>the path name, and simply execs realpath from the filesystem if anything
>different is passed to it, so "realpath -[Ee] file" will bypass that
>implementation and run a real one instead.
>
>I have added -E support to our realpath(1) (that is, to the .c, haven't
>gotten around to the man page yet) and of course -e (which is more or
>less a no-op). For now, I have made the default be -E if neither option
>is given, which returns the same result as we currently get in cases
>we do not currently produce an error, and makes our implementation more
>compatible with (the small part that is sane) of the coreutils implementation.
>
>I am not proposing adding any of their myriad other useless options, with
>the sole possible exception of -z (which causes their realpath to use \0
>rather than \n between output paths, and makes it a little safer in the
>possible presence of paths containing newline chars when more than one
>path arg is given ... the POSIX version (currently) will only specify
>realpath working with a required single file arg .. our version (the FreeBSD
>version), defaults to "." if no file is given, coreutils don't do that,
>and both versions process as many file args as are given).
>
>The source file size about doubles with these changes, which means about
>3 times as much actual code (since about half of the current source is the
>boilerplate noise).
>
>Any objections to adding this (man page would come with the commit of
>course, so will some ATF tests - I will convert my current test script) ?
>
>Any opinions on whether the default (no -e or -E used) should be as
>ours is now, or as coreutils is? (My slight preference is to follow
>coreutils here, it is more compatible).
>
>POSIX's hope is that if we do this, FreeBSD will take the code back, and
>the other BSD variants might follow, and the end result might be (mksh aside)
>a reasonably consistent world.

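For illustration only, the -E behaviour described above (keep following the
chain of symlinks, and accept a final component that does not exist) could be
sketched in shell roughly as below. This is a sketch under assumptions, not
the code that was added to realpath(1): the function name realpath_E and the
limit of 40 symlink hops are made up for the example, it relies on readlink(1)
with no options (present on the BSDs and in coreutils), and it cannot tell
ENOENT apart from other lookup errors the way namei() can.

```sh
# Sketch only: realpath_E is a hypothetical helper, not the NetBSD code.
# Prints the canonical form of $1, allowing the final component (after any
# chain of symlinks has been followed) to not exist.
realpath_E() (
    [ -n "$1" ] || exit 1
    p=$1
    hops=0
    while [ "$hops" -le 40 ]; do        # 40 is an arbitrary loop guard (assumption)
        # Split into directory part and last component.
        case $p in
            */*) d=${p%/*} b=${p##*/}; [ -n "$d" ] || d=/ ;;
            *)   d=. b=$p ;;
        esac
        # Everything except the last component must exist.
        d=$(cd -P -- "$d" 2>/dev/null && pwd -P) || exit 1
        [ "$d" = / ] && d=
        if [ -L "$d/$b" ]; then
            # Last component is a symlink: restart with its target.
            t=$(readlink -- "$d/$b") || exit 1
            case $t in
                /*) p=$t ;;
                *)  p=$d/$t ;;
            esac
            hops=$((hops + 1))
        elif [ -d "$d/$b" ]; then
            # A directory (also covers "." and ".."): let cd canonicalise it.
            (cd -P -- "$d/$b" 2>/dev/null && pwd -P)
            exit
        else
            # A non-directory, or nothing at all: both are accepted.
            printf '%s\n' "$d/$b"
            exit 0
        fi
    done
    exit 1                              # too many symlink hops
)
```

With l1 -> l2 -> dirlink/missing and dirlink -> real, realpath_E l1 prints
.../real/missing, whereas the dirname/basename one-liner quoted above would
stop at .../l1. The -e behaviour can be layered on top by testing the result,
e.g. p=$(realpath_E "$f") && [ -e "$p" ].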
Thanks for doing all this work. I agree we should follow coreutils and
default to -E, to make the world more homogeneous.

Best,

christos
