    Date:        Tue, 1 Feb 2022 15:39:06 -0500
    From:        Chet Ramey <chet.ra...@case.edu>
    Message-ID:  <2816cf78-d7be-b9e1-733d-12427b04c...@case.edu>

  | When you say "just parsed," when are aliases expanded?

During lexical analysis, right between when the input is read (if it is read, and isn't from some internal string) and when it is handed to the grammar.

  | Are they expanded while scanning the command substitution to find the
  | closing `)' but not part of the text that results?

For the former half, yes. And I think you're aware that we have no "text that results" - which is why, among other things, <<$(anything) doesn't work for us, as the $(anything) is (kind of) a pointer to a piece of parse tree, which never ends up matching anything even remotely ascii.

Keeping the original text is something on my todo list, but it is complicated by our memory management methods - I could just malloc(biggish) and then realloc(2 * biggish) if biggish isn't enough, but I really would prefer not to do that if I can avoid it (this is a temporary string, it doesn't need semi-permanent storage, which is what malloc() is used for in our shell).

  | What form are the `results' kept in (I
  | assume a parse tree similar to a shell function)?

Very similar - shell functions start out identical, but are then converted (slightly) into a more condensed form, as they hang around for a long time, unlike command substitutions which last just as long as it takes to prepare the current command.

  | Do they include expanded aliases or is that deferred?

Aliases (as evil as they are) need to be expanded to be able to generate the correct parse tree - they cannot be deferred. Consider the different tree you'd get for

        cmd1 arg1; cmd2 arg2; cmd3

if that was parsed exactly as written, compared to what it would look like if we had

        alias cmd1=if
        alias cmd2=then
        alias cmd3=fi

  | That seems like the crux of the issue. If the command substitution is part
  | of a shell function definition, you only want to expand aliases the `first
  | time' -- at the time you parse the shell function.

Yes, aliases are always lexical.
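
A minimal sketch of the cmd1/cmd2/cmd3 case above, for anyone who wants to see the two parses diverge. The functions arg1 and arg2 and the variable ran_arg2 are made up for illustration, and the shopt line is only an assumption about running this non-interactively in bash (other shells have no shopt, hence the fallback).

```shell
# bash only expands aliases in non-interactive shells when expand_aliases
# is set; shells without shopt just skip this line.
shopt -s expand_aliases 2>/dev/null || true

alias cmd1=if cmd2=then cmd3=fi

arg1() { return 1; }        # plays the "condition", and fails
arg2() { ran_arg2=yes; }    # plays the "then" body

ran_arg2=no

# eval makes sure the line is parsed now, after the aliases exist.
# With the aliases it is parsed as:  if arg1; then arg2; fi
# so arg2 is skipped because arg1 fails. Parsed exactly as written it
# would be three unrelated simple commands, and the second would always run.
eval 'cmd1 arg1; cmd2 arg2; cmd3'

echo "arg2 ran: $ran_arg2"
```

The only point of the sketch is that the aliases have to be known before the grammar sees the words, since they decide whether the line is one compound command or three simple ones.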
The function is a tree from the parse, there's nothing in the tree, ever, that has anything whatever to do with aliases (aside from the "alias" and "unalias" commands themselves of course, but those are just ordinary commands, and treated that way).

  | You can execute arbitrary commands, including alias definitions,
  | between the time the shell function is defined and the time it's executed.

Yes. Too late for anything in the function (anything at all).

  | POSIX requires that aliases in command substitutions be expanded when
  | the function definition is parsed, not when the command substitution
  | is finally executed,

Yes, that's what we do - it is a consequence of the "parse everything when it is first seen, and never again" philosophy.

  | but bash has not traditionally done it that way. That's where the backwards
  | compatibility issues come in.

Oh. I see. Does anyone really care though? It is kind of hard to imagine anyone being perverse enough to use an alias in a function anywhere, let alone depend upon it changing between executions of a function when the alias definition has been altered. Anyone doing anything like that deserves to have their code break IMO.

  | It's not true that `no commands can be executed'. The alias can be altered
  | by commands between a shell function definition and its execution.

Yes, I had forgotten that case.

  | I considered keeping both the original text and the parse tree from the
  | parsed command substitution (well, a chain of them since you can have an
  | arbitrary number of command substitutions in a word).

Yes, that's what we do.

  | It's difficult, given bash's internal structure, to preserve the
  | original text

Same for us, probably for different reasons though.

  | -- as opposed to the reconstituted text --

How accurately can you reconstitute? That is, can you maintain the difference between $(a b) and $( a b ) for example ? How about $(a b) ?

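
The earlier point, that redefining an alias after a function has been defined is too late to affect anything in the function, can be sketched as follows. The names greet, say and out are hypothetical, and the alias is used directly in the function body (not inside a command substitution, which is exactly the case where bash has traditionally differed).

```shell
# Needed for non-interactive bash; harmless elsewhere.
shopt -s expand_aliases 2>/dev/null || true

alias greet='echo hello'

# The body is parsed here, so greet is replaced by "echo hello" now and
# the stored function contains no trace of the alias.
eval 'say() { greet; }'

# Changing the alias afterwards does not touch the already parsed function.
alias greet='echo goodbye'

out=$(say)
echo "$out"    # prints: hello
```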
  | Other shells (bosh, mksh) also recreate the text of a command
  | substitution from the parsed commands.

Interesting, though I am not surprised in a way. Actually needing the command sub in textual form (aside from perhaps showing in output from jobs or something, though I doubt it is ever needed even there - and for that kind of thing, a good approximation is just fine in any case) is very very rare in sh - the end word of a here-doc redirection operator might be the only case.

  | > It seems to me that the way you're doing it now would also break:
  | >
  | > alias x=y
  | > cat <<$(x)
  | > whatever
  | > $(x)

  | This works in bash default mode because aliases aren't expanded while the
  | command is parsed.

I don't know whether I was in "default mode" or not (all I did was run "bash-5.2" which is the name where I stored the development binary - which is still the last one I built while running tests, I haven't built the 5.2alpha version yet, so what I am running is about a month and a half old I think). But I simply ran it, and then typed (pasted into an xterm actually, but that makes no difference) the commands above, and it didn't work (it did in bash 5.1.16 or whatever the current released one is).

  | The delimiter ends up being `$(x)'.

In the version I tested (5.2 development version) it ended up being $(y)

  | Since you're required to check the line read for the terminating
  | delimiter before doing anything else, the delimiter has to be $(x)
  | to make it work.

Yes, I'm aware of that.

  | It works with ksh93 as well, but every other shell produces an error of
  | some sort (including the NetBSD sh, unless you've changed something in the
  | couple of months since I last built it).

No, it definitely does not work in our shell. That's what I said a little later in my message.
When we process the redirect, the word has (more or less in this case) just a pointer to a parse tree (or as you mentioned above, to the head of a chain of parse trees, in this case, a chain of length 1).

  | I can't see it working in any shell's posix mode if posix requires aliases
  | to be expanded while reading the WORD containing the command substitution
  | that is the here-doc delimiter.

I have no idea, I doubt that until this issue came up, anyone has ever considered this. Aliases are a total botch, and should be eliminated from the standard completely, then implementors who need to keep them can make them work however they like (like arrays).

  | If that's the case, you have an alias
  | expansion mismatch, since I don't believe you're permitted to perform
  | alias expansion on the lines of the here-document as you read them,

You are correct. aliases are only ever expanded in the command word position, or immediately after the expansion of a preceding alias whose definition ended in a space (one of the most bizarre syntax rules I have ever seen, anywhere). here doc text is never a command word, it is only ever input for some file descriptor, so is never eligible for alias expansion. Nor is anything that is quoted, and here doc text is always quoted (double when the redirect operator end-word is not quoted, or single if it is).

  | and the resolution to bug 1036 makes it clear -- to me, at least -- that you
  | check for the delimiter before doing anything to the line.

Yes. Though (while not at all relevant right now, in this discussion) the processing of \newline in double-quoted heredocs (unquoted word on redirect operator) is a bit murky.

Of course in reality, none of this truly matters. Only an idiot would ever actually do something like

        alias x=y
        cat <<$(x)
        whatever
        $(x)

which I guess is why it was me who suggested that...

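
For reference, the rule mentioned above (aliases are expanded in the command word position, or immediately after an alias whose value ended in a space) can be sketched like this. Everything named here (record, with_space, no_space, word, seen) is invented for the illustration, and the shopt line is again just an assumption about non-interactive bash.

```shell
# Needed for non-interactive bash; harmless elsewhere.
shopt -s expand_aliases 2>/dev/null || true

# record just remembers its arguments so they can be inspected afterwards.
record() { seen="$*"; }

alias with_space='record '   # value ends in a space
alias no_space='record'      # value does not
alias word='EXPANDED'

# After an alias ending in a space, the next word is also checked for aliases.
eval 'with_space word'
after_space=$seen

# Otherwise the next word is an ordinary argument and is left alone.
eval 'no_space word'
after_plain=$seen

echo "trailing space:    $after_space"
echo "no trailing space: $after_plain"
```

In both cases `word` is in argument position; only the trailing space in the preceding alias makes it eligible.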
For that matter, only an idiot would actually ever use (what looks like, but really isn't) a command substitution in or as a here doc end delimiter word - or, for that matter, even include any $ in one of those (except perhaps as \$ or inside single quotes, causing the here doc to be of the single quoted form).

None of this really matters to anyone except those of us who dream up torture tests, or try and nail down every word in the standard.

kre

ps: I haven't seen my (or your) message via bug-bash yet, though mine did get to (was accepted by) whatever is the MX for gnu.org several hours ago now, is the list perhaps not working (I did recently see one message has arrived from it (twice) though - but that one seems to have been held up at the list for about 4 hours, neither mine nor yours are that old yet).

Received: from localhost ([::1]:34998 helo=lists1p.gnu.org)
        by lists.gnu.org with esmtp (Exim 4.90_1)
        (envelope-from <bug-bash-bounces+kre=munnari.oz...@gnu.org>)
        id 1nF1Ld-0002jI-DI
        for k...@munnari.oz.au; Tue, 01 Feb 2022 17:10:01 -0500
Received: from eggs.gnu.org ([209.51.188.92]:45358)
        by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
        (Exim 4.90_1)
        (envelope-from <fxmb...@gmail.com>)
        id 1nExqR-0000tL-OZ
        for bug-bash@gnu.org; Tue, 01 Feb 2022 13:25:38 -0500
Received: from [2607:f8b0:4864:20::42a] (port=40901 helo=mail-pf1-x42a.google.com)
        by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
        (Exim 4.90_1)
        (envelope-from <fxmb...@gmail.com>)
        id 1nExqK-0001HX-NF
        for bug-bash@gnu.org; Tue, 01 Feb 2022 13:25:30 -0500