    Date:        Sat, 27 Nov 2021 13:57:57 -0500
    From:        Chet Ramey <chet.ra...@case.edu>
    Message-ID:  <5217c48e-c989-a163-5673-38995e35a...@case.edu>

Warning: long message follows, give yourself time to digest it.

  | OK, if you do end up building the devel branch, I'd be interested
  | in these results.

Assuming that happens, I shall certainly let you know.

  | > Once, of course ... why would I ever build it again?
  |
  | Patches exist. There are vendors who take the original release, apply their
  | own special-sauce patches, then apply the patches I release as they come
  | out, as part of their own distribution release process.

Of course, NetBSD pkgsrc (used on other systems as well) does that too.
But your patches appear about every 5-6 months, so I end up doing one
build every 5-6 months. Keeping the object files (even the unpacked
sources) sitting around waiting for the next patches, in order to save
perhaps 2-3 minutes of build time, isn't worth the bother. Once built
and installed it all gets trashed. [I have also contemplated doing
builds in an MFS (or tmpfs) which would vanish on a reboot (or just
umount) and I do tend to reboot more often than bash patches are
released ... but I've yet to actually do that; for bash, the build time
saved wouldn't be worth the bother - for some other apps, it might be].

pkgsrc doesn't encourage attempting to retain anything in any case - it
probably isn't a problem for bash (at least I've never seen it, not that
I ever looked either) but other applications have a habit of deleting
files from their distributions - and unless one starts from an empty
directory, unpacking a tarball doesn't cause those files to be removed
... further, some build systems don't pay attention to what is supposed
to be there, and manage to link all the .o files they can find.

It is easier, and more reliable, to simply start clean every time.

But of course that doesn't apply when you're developing and building
several times a day (or sometimes, dozens of times an hour). That
just doesn't apply to me with bash.

  | Usually, that's ok.
  | In this instance, where we're discussing a feature
  | whose implementation is substantially different between the released and
  | development versions, it's more relevant.

Sure, though I didn't know this part was changed so much in the devel
version until you told me just recently (I do not watch what happens
there).

  | So the ultimate question is whether or not the act of reading a command
  | substitution should reset this requirement.

That's where we disagree.

  | The grammar is, at that point, reading a different command.

"command" is a loaded word in sh terminology, it is used for all kinds
of things, but in general it is not at all unusual for here document
text to appear while a command other than the one with the redirection
operator is being processed (no command substitutions necessarily
involved).

What the grammar is doing after a here doc redirection operator has been
processed, until the next newline (token) is encountered is irrelevant -
the spec imposes no requirements upon that at all.

  | > Then we get to whether heredoc data is part of a valid shell script
  | > in that sense - when there is yet to be a newline token to introduce it.
  |
  | What does this mean? In all cases, the here-documents are not read until
  | after a newline token. That's not the issue.

Sure, but that's not what I meant. I treat heredoc data as much the
same as a \newline - something that the lexer deals with, and the
grammar never knows happened. Heredoc data doesn't appear at all in
the sh grammar, as nothing in the grammar cares in the slightest about
them (once they're queued).

What I meant was that from that perspective, whether a sh script (or sh
script fragment) is valid or not, is determined by the grammar, and
given that here doc data does not appear there, it cannot have any
impact upon the decision whether some particular part of the sh input
is valid or not.

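A minimal sketch of that last point, assuming any POSIX-like sh. The
function name heredoc_text is purely illustrative, and cat stands in
for an arbitrary command. The here-doc body below would be a syntax
error if it were ever parsed as shell code, yet the script is accepted
and the text simply arrives on cat's stdin. This only shows that the
body plays no part in syntax checking; it says nothing about how any
particular shell implements that internally.

```shell
# Hypothetical helper: emits a here-document whose body would be a
# syntax error if the parser ever treated it as shell code.
heredoc_text() {
cat <<'END'
if then ) ( done ;; esac
END
}

# The script still parses and runs; the body is just input data for cat.
text=$(heredoc_text)
printf '%s\n' "$text"
```
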
Of course, if the script ends (completely) without a newline token after
the last redirect operator then that's an error - but of a subtly
different kind (more like an unterminated string (mismatched quotes) or
here doc data without its required terminating word -- all lexical
constructs).

So, if one does

	$( cmd <<END )

there's nothing invalid about that, unless EOF follows that ')' before
a newline token appears. And if that happens, it isn't the grammar
that complains, but something beyond that.

The syntax "word redirect" is perfectly valid, and "<< word" is a
perfectly valid redirect. The data doesn't need to appear there, if
no newline has yet appeared, any more than it does in

	cmd << EOF ; ...

where the data doesn't need to appear there, when a newline has not
yet appeared.

You seem to be hung up on the way you have chosen to implement $( )
(which of itself is OK, but it is not required to be done that way)
where (it seems) you parse the command inside the $() as if there was
no world at all outside it. As far as getting the grammar correct
that's fine, but it doesn't work with here doc data.

  | > | The netbsd shell appears to be the outlier here. The parser reads the
  | > | command substitution so it can parse the entire and-or list before trying
  | > | to gather any here-documents.
  | >
  | > You cannot possibly really mean that I hope. That is, in
  | >
  | > 	cmd1 <<EOF &&
  | > 	data
  | > 	EOF
  | > 	cmd2
  | >
  | > you do agree that "data" is stdin to cmd1, that is, the herdoc data
  | > appears splat in the middle of the and-or list. That's certainly the
  | > way it appears to work (in bash) to me.
  |
  | There is no command substitution in this example.

I know.
But go back and read the quote from you (still here, above, in this
message) again:

	"The parser reads the command substitution so it can parse the
	entire and-or list before trying to gather any here-documents"

	** parse the entire and-or list before trying to gather any
	   here documents **

I don't believe that you really meant that, it isn't the way bash
behaves (unless this is something different in the devel version, but
I doubt that) and I was just pointing out that poor phraseology.

  | So, again, the question is whether or not input data that is logically
  | part of the command substitution (it appears between the opening and
  | closing parentheses) should affect the `outer' command. That's the
  | question. We have different answers.

We do, because I don't view here doc data as affecting anything except
the command for which it is input. As far as the script goes, it is
just a rather weird method (kind of like the original implementation)
of creating an anonymous file and then passing that file as input
(usually stdin, but not required to be) to a command.

Consider this alternative, which is (one possibility for) what would be
needed if here-docs did not exist:

	printf '%s\n' 'data' >/tmp/hidden.data.$$
	cmd </tmp/hidden.data.$$
	rm /tmp/hidden.data.$$

whereas with here-docs, we do instead

	cmd <<'END'
	data
	END

That's all fine, and either of those would (more or less) work with
any shell.

Now consider instead that cmd is to be run in a command substitution.
One can certainly do

	... $(	printf "%s\n" 'data' >/tmp/hidden.data.$$
		cmd </tmp/hidden.data.$$
		rm /tmp/hidden.data.$$
	) ...

which is the rough equivalent of

	... $( cmd <<END
	data
	END
	) ...

and that should work. No question. But one can also do

	printf "%s\n" 'data' >/tmp/hidden.data.$$
	.... $( cmd </tmp/hidden.data.$$ ) ...
	rm /tmp/hidden.data.$$

and that would also work everywhere, right?

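For anyone who wants to check the uncontroversial forms above, here is
a small sketch, assuming a POSIX-like sh with a writable temp
directory. cat stands in for "cmd", and the variable names (tmp,
inside, outside, heredoc) are mine, purely for illustration. It
deliberately does not include the disputed form where the here-doc
data follows the closing ')', since that is exactly where shells
differ.

```shell
# Scratch file named as in the examples above; TMPDIR fallback is an assumption.
tmp=${TMPDIR:-/tmp}/hidden.data.$$

# Data created and consumed inside the command substitution.
inside=$(
	printf '%s\n' 'data' >"$tmp"
	cat <"$tmp"
	rm -f "$tmp"
)

# Data created (and removed) outside, consumed inside.
printf '%s\n' 'data' >"$tmp"
outside=$(cat <"$tmp")
rm -f "$tmp"

# Here-document placed entirely inside the command substitution.
heredoc=$(cat <<'END'
data
END
)

printf '%s / %s / %s\n' "$inside" "$outside" "$heredoc"
```

All three variables should end up holding the same text, which is the
sense in which the temp file and the here-doc are interchangeable.
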
That is, the data for the command in the command substitution is created
(and removed, but that bit of it is generally irrelevant here) outside
the command substitution.

This is the rough equivalent of

	... $( cmd << \END ) ...
	data
	END

And then once you allow that to work (which you're apparently now doing
in the devel version), there cannot really be any objection to

	cmd <<END $( cmd1 &&
	data
	END
	cmd2 )

as that's really just the same principle being applied in the other
direction.

Furthermore that means that in

	cmd <<END1 $( cmd1 <<END2 &&

(with a newline after the "&&") the data that follows is

	data1
	END1
	data2
	END2

keeping the left to right order across the input line, which is the
order in which the standard requires here document data to appear.

Here "input line" is really a logical line, rather than a physical one,
as we have already agreed that here docs don't appear in the middle of
quoted strings, and nor do they appear after elided newlines (\newline
pairs) which are removed, neither of which generates a newline token.

But it is "line" not "command", or anything else related to the grammar
which is specified:

	The redirection operators "<<" and "<<-" both allow redirection of
	subsequent lines

"subsequent lines" ie: "lines after the current line"

	If more than one "<<" or "<<-" operator is specified on a line, the
	here-document associated with the first operator shall be supplied
	first by the application and shall be read first by the shell.

Note: "line", not grammatical command, or script, or and-or list, or
anything related to the grammar at all. (The grammar generally ignores
lines, a newline token is almost just a ';' - except we're allowed as
many newlines as we like, where just one ';' (sometimes none) is
permitted).

Another example (no cmdsubs again) that is kind of weird, and unlikely,
but should be permitted, and should work:

	cat << END; case $PATH
	data
	END
	in *:/bin:*) echo /bin is in PATH!
	;; esac

Bash (5.1.xx) allows that, so does everything else (aside from some old,
and not even all that old, ash derived shells which had a bug not
relevant here). The heredoc data for cat appears splat in the middle of
the unrelated case statement. No problems, it all works, as it should -
but probably would not if here-doc data was something known to the
grammar. But it isn't, the lexer removes it, as far as the grammar &
its parser are concerned the "data" and "END" lines are not there at all.

kre