    Date:        Tue, 8 Jan 2019 16:51:04 +0000
    From:        Geoff Clare <g...@opengroup.org>
    Message-ID:  <20190108165104.GA31969@lt2.masqnet>

| Given Chet's reply, it looks like there may be more shells that do expand
| than don't.  In which case I wonder why that "unquoted" text got added
| in 2016.

I don't know the history of aliases in sh (Joerg?). It may be that they
originally appeared back when any quoted character "looked different"
internally to the same character, unquoted, and that the test was just
ch == ' ' and not (ch & ~SQUOTE) == ' ', so only unquoted spaces worked.
That behaviour may then have been retained in derived shells, even after
the quoting encoding method was altered, and those shells may have been
the ones mostly considered when the text was written.

Note, this is 100% guessing.

I have gone back and looked at the NetBSD sh, and I still see nothing
which actually implements the "no quoted space" rule, yet testing
demonstrates that it does work -- but I also see that we treat "blank"
in the definition as meaning [[:blank:]] and not the literal ' ' I thought
it perhaps should be (that part of the code I took from FreeBSD), though
the (old) text is specific that it is <blank>, which XBD 3.74 is quite
clear is [[:blank:]].

I wonder whether this might be yet another area where shells differ, or
does everyone allow a tab (as well as space) as meaning "look for a
following alias"?

| This surprised me.  I was previously unaware that the first word in
| the alias value is subject to recursive alias expansion.  There is
| nothing in the standard to suggest this happens!

There certainly isn't in the current (published) text, which is part of
what is wrong with it (but really that's just a part).

The problem there is just that it is very sloppily written: if you
actually know how aliases are processed (and have been since csh) then
that wording can be more or less perverted enough to cover it ...
the:

    the word shall be replaced by the value of the alias;

just needs to be understood to mean "in the input stream", such that the
process of getting a token (to replace the one which was the alias name)
involves scanning the alias value, and generating tokens from it. When
that starts, we're still in the exact same (grammatical) state as we were
when we entered and found the alias word, so if that one was in the
"command word position of a simple command" that is where the first word
token from the expansion of the alias must occur; hence it gets looked up
as an alias again.

Treated literally, the quoted words above would mean that if we have

    alias foo=bar

and the input is

    foo 1 2 3

then the "foo", being in a command word position and also a defined
alias, would simply be replaced by the word "bar" and we'd be done.

But that would mean that in the example in the alias page in XCU 4, where

    alias lf='ls -CF'

if the input is

    lf .

the replacement text would be, effectively

    "ls -CF" .

(the quotes would not be there, but the ls -CF part would be a single
6 character word) and that would be the command word of the generated
command.

Since it is clear from the text accompanying the examples in XCU 4 (alias)
that that is not what happens, something has to use that embedded space
as a token delimiter, and the only way that happens is if the process of
"replaced by the value of the alias" is read as if it included "as if it
appeared that way in the original input stream, and tokenisation restarts
with the first character of the alias value".
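
To make the difference between the two readings concrete, here is a
minimal Python sketch. It is not taken from any shell: the function names
expand_literal and expand_reinput, the whitespace-only word splitting
(no quoting at all), and the extra alias bar='echo hi' are assumptions
made purely for illustration. The "seen" set models the usual guard
against an alias expanding itself forever; that guard is also an
assumption of the sketch, not something discussed above.

```python
def expand_literal(line, aliases):
    """Literal reading: swap the command word for the alias value as ONE word."""
    words = line.split()
    if words and words[0] in aliases:
        words[0] = aliases[words[0]]      # value stays a single word, blanks and all
    return words


def expand_reinput(line, aliases):
    """Input-stream reading: push the value back onto the input and re-tokenise."""
    seen = set()                          # aliases already expanded (loop guard)
    while True:
        parts = line.split(None, 1)       # command word, then the rest of the input
        if not parts:
            return []
        name = parts[0]
        if name not in aliases or name in seen:
            return line.split()           # nothing more to expand: final word list
        seen.add(name)
        rest = parts[1] if len(parts) > 1 else ""
        # The value replaces the name in the input; we are still in command
        # word position, so the loop looks at the new first word again.
        line = aliases[name] + " " + rest


if __name__ == "__main__":
    aliases = {"lf": "ls -CF", "foo": "bar", "bar": "echo hi"}
    print(expand_literal("lf .", aliases))       # ['ls -CF', '.']
    print(expand_reinput("lf .", aliases))       # ['ls', '-CF', '.']
    print(expand_reinput("foo 1 2 3", aliases))  # ['echo', 'hi', '1', '2', '3']
```

Only the second function produces "ls" as the command word for "lf .",
and it also shows why the first word of the value is looked up again:
nothing has been returned to the grammar between the two lookups.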

The proposed new wording from 953 does not have this problem, as it is
clear that the alias value is subject to tokenisation, and when that
happens, the first token (which, because of the restrictions we're
placing on the value of the alias, must in a conformant script be a word)
will still be in the "command name position" (we are yet to return
anything to the grammar which could change that) and so is "obviously"
subject to alias lookup.

| I think that instead of talking about "direct" and "indirect" it might
| be better to say that it applies through multiple levels of recursion.

Like I said before, any better wording that achieves the same effect is
fine. I am (notoriously) not good at writing text, which is why I
generally do not attempt to supply candidate text for bug fixes, but just
ideas for what the eventual text should include. Never treat anything I
ever do suggest as being any more than that, and always look for a better
way to phrase anything I have written!

| If this is because IO_NUMBER is not expanded, this change would make no
| difference to the behaviour required by the standard (because we're saying
| the behaviour is unspecified if the alias value contains a redirection).

No, the IO_NUMBER of concern is not in the alias value (string), it is in
the original text.

It would be (I think) a surprise to everyone if we were to define things
so that aliases are only expanded in simple commands that contain no
redirects, which in the "lf" example would mean that

    lf >/tmp/listing

could not work.

It would also mean that the lexer would be required to look a LONG way
ahead in the input stream before replacing the alias, as that lf command
might be

    lf a b c 'some other long name with an embedded
    newline' $( any random command substitution) \
        >/tmp/something

In the example in question here, the original text is

    3>&1 command

and we have "alias 3=4".

"3" is a valid alias name (alias names are not required to start with an
alpha) and when the tokeniser is run there, we are starting in the state
where we are at the command word position (you have to assume this here
from the context, but take it as an axiom for this example).

There the first token produced by the lexer is the IO_NUMBER "3", which
is a word according to XBD 3.446, and thus, according to the (current and
proposed) spec for alias processing, should be subject to alias
replacement.

But as Joerg made clear, this is not how it works (in any shell I am
aware of). It is only the tokens that are TOKEN which are subject to
alias expansion or keyword lookup. (The keyword lookup part is never
really relevant: there are no keywords that are also candidates for being
IO_NUMBERs, so whether the lookup as a keyword fails because the word
simply isn't a keyword, or because we never attempt the lookup on
IO_NUMBERs, is irrelevant.) It isn't irrelevant for aliases, however.

kre