    Date:        Tue, 8 Jan 2019 16:51:04 +0000
    From:        Geoff Clare <g...@opengroup.org>
    Message-ID:  <20190108165104.GA31969@lt2.masqnet>


  | Given Chet's reply, it looks like there may be more shells that do expand
  | than don't.  In which case I wonder why that "unquoted" text got added
  | in 2016.

I don't know the history of aliases in sh (Joerg?), but it may be that they
originally appeared back when any quoted character "looked different"
internally from the same character unquoted, and that the test was just
        ch == ' '
and not
        (ch & ~SQUOTE)  == ' ' 

so only unquoted spaces worked, and that this behaviour was then retained
in derived shells, even after the quoting encoding method was altered,
and those shells were the ones mostly considered when the text was
written.

Note, this is 100% guessing.

I have gone back and looked at the NetBSD sh, and I still see nothing
which actually implements the "no quoted space" rule, yet testing
demonstrates that it does work.  I also see that we treat "blank"
in the definition as meaning [[:blank:]] and not the literal ' ' I thought
it perhaps should be (that part of the code I took from FreeBSD),
though the (old) text is specific that it is <blank>, which XBD 3.74 is
quite clear means [[:blank:]].  I wonder whether this might be yet another
area where shells differ, or does everyone allow a tab (as well as a
space) to mean "look for a following alias"?


  | This surprised me.  I was previously unaware that the first word in
  | the alias value is subject to recursive alias expansion.  There is
  | nothing in the standard to suggest this happens!

There certainly isn't in the current (published) text, which is part of
what is wrong with it (but really that's just a part).  The problem there
is just that it is very sloppily written: if you actually know how aliases are
processed (and have been since csh), then that wording can be more
or less perverted enough to cover it ... the:

        the word shall be replaced by the value of the  alias;

just needs to be understood to mean "in the input stream", such that
the process of getting a token (to replace the one which was the alias
name) involves scanning the alias value and generating tokens from
it.  When that starts, we're still in the exact same (grammatical) state as
we were when we entered and found the alias word, so if that one was
in the "command word position of a simple command", that is where the
first word token from the expansion of the alias must occur; hence it
gets looked up as an alias again.

Treated literally, the quoted words above would mean that if we have

        alias foo=bar

and the input is

        foo 1 2 3

then "foo", being in a command word position and also a defined alias,
would simply be replaced by the word "bar" and we'd be done.

But that would mean that in the example in the alias page in XCU 4,
where
        alias lf='ls -CF'
if the input is

        lf .

the replacement text would be, effectively

        "ls -CF" .

(the quotes would not be there, but the ls -CF
part would be a single 6-character word), and that
would be the command word of the generated command.

Since it is clear from the text accompanying the examples
in XCU 4 (alias) that that is not what happens, something
has to use that embedded space as a token delimiter, and the
only way that happens is if the process of "replaced by the
value of the alias" is read as if it included "as if it appeared that
way in the original input stream, and tokenisation restarts
with the first character of the alias value".

The proposed new wording from 953 does not have this problem,
as it is clear that the alias value is subject to tokenisation.  When
that happens, the first token (which, because of the restrictions
we're placing on the value of the alias, must be a word in a conformant
script) will still be in the "command name position" (we are
yet to return anything to the grammar which could change that), and
so is "obviously" subject to alias lookup.


  | I think that instead of talking about "direct" and "indirect" it might
  | be better to say that it applies through multiple levels of recursion.

Like I said before, any better wording that achieves the same effect
is fine.  I am (notoriously) not good at writing text, which is why I
do not often attempt to supply candidate text for bug fixes,
but just ideas for what the eventual text should include.  Never treat
anything I do suggest as being any more than that, and always
look for a better way to phrase anything I have written!


  | If this is because IO_NUMBER is not expanded, this change would make no
  | difference to the behaviour required by the standard (because we're saying
  | the behaviour is unspecified if the alias value contains a redirection).

No, the IO_NUMBER of concern is not in the alias value (string); it is in the
original text.  It would be (I think) a surprise to everyone if we were to
define things so that aliases are only expanded in simple commands that
contain no redirects, which in the "lf" example would mean that

        lf >/tmp/listing

could not work.   It would also mean that the lexer would be required
to look a LONG way ahead in the input stream before replacing the
alias, as that lf command might be

        lf a b c 'some other long
name with an embedded newline' $(
any random command substitution) \
                >/tmp/something


In the example in question here, the original text is

        3>&1 command

and we have "alias 3=4"

"3" is a valid alias name (alias names are not required to start
with an alpha), and when the tokeniser is run there, we are starting
in the state where we are at the command word position (you have
to assume this here from the context, but take it as an axiom
for this example).  There the first token produced by the lexer
is the IO_NUMBER "3", which is a word according to XBD 3.446,
and thus, according to the (current and proposed) spec for alias
processing, should be subject to alias replacement.

But as Joerg made clear, this is not how it works (in any shell I
am aware of): it is only the tokens that are TOKEN which are
subject to alias expansion or keyword lookup.  (The keyword lookup
part is never really relevant, as there are no keywords that are
also candidates for being IO_NUMBERs, so whether the lookup
as a keyword fails because the word simply isn't a keyword, or
because we never attempt the lookup on IO_NUMBERs, is irrelevant.)
For aliases, however, it does matter.

kre
