Re: Unclosed quotes on heredoc mode

Robert Elz Sun, 28 Nov 2021 11:30:57 -0800

    Date:        Sat, 27 Nov 2021 13:57:57 -0500
    From:        Chet Ramey <chet.ra...@case.edu>
    Message-ID:  <5217c48e-c989-a163-5673-38995e35a...@case.edu>


Warning: long message follows, give yourself time to digest it.

  | OK, if you do end up building the devel branch, I'd be interested
  | in these results.

Assuming that happens, I shall certainly let you know.

  | > Once, of course ... why would I ever build it again?
  |
  | Patches exist. There are vendors who take the original release, apply their
  | own special-sauce patches, then apply the patches I release as they come
  | out, as part of their own distribution release process.

Of course, NetBSD pkgsrc (used on other systems as well) does that too.
But your patches appear about every 5-6 months, so I end up doing one
build every 5-6 months.   Keeping the object files (even the unpacked
sources) sitting around waiting for the next patches, in order to save
perhaps 2-3 minutes of build time isn't worth the bother.   Once built
and installed it all gets trashed.
        [I have also contemplated doing builds in an MFS (or tmpfs)
        which would vanish on a reboot (or just umount), and I do tend
        to reboot more often than bash patches are released ... but I've
        yet to actually do that; for bash, the build time saved wouldn't
        be worth the bother (for some other apps, it might be).]

pkgsrc doesn't encourage attempting to retain anything in any case.  It
probably isn't a problem for bash (at least I've never seen it, not that
I ever looked either), but other applications have a habit of deleting files
from their distributions, and unless one starts from an empty directory,
unpacking a tarball doesn't cause those files to be removed ... further,
some build systems don't pay attention to what is supposed to be there,
and manage to link all the .o files they can find.

It is easier, and more reliable, to simply start clean every time.

But of course that doesn't apply when you're developing and building
several times a day (or sometimes, dozens of times an hour).   That just
doesn't apply to me with bash.

  | Usually, that's ok. In this instance, where we're discussing a feature
  | whose implementation is substantially different between the released and
  | development versions, it's more relevant.

Sure, though I didn't know this part was changed so much in the
devel version until you told me just recently (I do not watch what happens
there).

  | So the ultimate question is whether or not the act of reading a command
  | substitution should reset this requirement. That's where we disagree.
  | The grammar is, at that point, reading a different command.

"command" is a loaded word in sh terminology, it is used for all kinds of
things, but in general it is not at all unusual for here document text to
appear while a command other than the one with the redirection operator is
being processed (no command substitutions necessarily involved).   What the
grammar is doing after a here doc redirection operator has been processed,
until the next newline (token) is encountered, is irrelevant - the spec
imposes no requirements upon that at all.


  | > Then we get to whether heredoc data is part of a valid shell script
  | > in that sense - when there is yet to be a newline token to introduce it.
  |
  | What does this mean? In all cases, the here-documents are not read until
  | after a newline token. That's not the issue.

Sure, but that's not what I meant.   I treat heredoc data as much the same
as a \newline - something that the lexer deals with, and the grammar never
knows happened.   Heredoc data doesn't appear at all in the sh grammar,
as nothing in the grammar cares in the slightest about them (once they're
queued).  What I meant was that from that perspective, whether a sh script
(or sh script fragment) is valid or not is determined by the grammar, and
given that here doc data does not appear there, it cannot have any impact
upon the decision whether some particular part of the sh input is valid or
not.   Of course, if the script ends (completely) without a newline token
after the last redirect operator, then that's an error - but of a subtly
different kind (more like an unterminated string (mismatched quotes) or
here doc data without its required terminating word -- all lexical constructs).

So, if one does

        $( cmd <<END )

there's nothing invalid about that, unless EOF follows that ')' before
a newline token appears.   And if that happens, it isn't the grammar that
complains, but something beyond that.   The syntax "word redirect" is
perfectly valid, and "<< word" is a perfectly valid redirect.   The data
doesn't need to appear there, if no newline has yet appeared, any more
than it does in

        cmd << EOF ; ...

where the data doesn't need to appear there, when a newline has not yet
appeared.
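A minimal sketch of the "cmd << EOF ; ..." case, using cat and echo as stand-ins for cmd and the rest of the line (the function name is hypothetical, for illustration only). The body for cat is only collected after the newline token, even though another command follows the redirection operator on the same line:

```shell
#!/bin/sh
# Hypothetical demo function: cat's here-document operator is followed by
# another command on the same line; the here-document body is read from
# the lines that follow the newline token, not from where the operator is.
heredoc_then_more() {
    cat <<EOF; echo after
data
EOF
}
```

Calling heredoc_then_more prints "data" (from cat) and then "after" (from echo).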

You seem to be hung up on the way you have chosen to implement $( )
(which of itself is OK, but it is not required to be done that way)
where (it seems) you parse the command inside the $() as if there was no
world at all outside it.   As far as getting the grammar correct that's
fine, but it doesn't work with here doc data.


  | >    | The netbsd shell appears to be the outlier here. The parser reads the
  | >    | command substitution so it can parse the entire and-or list before trying
  | >    | to gather any here-documents.
  | > 
  | > You cannot possibly really mean that I hope.   That is, in
  | > 
  | >   cmd1 <<EOF &&
  | >   data
  | >   EOF
  | >           cmd2
  | > 
  | > you do agree that "data" is stdin to cmd1, that is, the heredoc data
  | > appears splat in the middle of the and-or list.   That's certainly the
  | > way it appears to work (in bash) to me.
  |
  | There is no command substitution in this example.

I know.   But go back and read the quote from you (still here, above, in
this message) again: "The parser reads the command substitution so it can
parse the entire and-or list before trying to gather any here-documents"

** parse the entire and-or list before trying to gather any here documents **

I don't believe that you really meant that, it isn't the way bash behaves
(unless this is something different in the devel version, but I doubt that)
and I was just pointing out that poor phraseology.
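A small runnable version of the and-or list example, with cat and echo standing in for cmd1 and cmd2 (the function name is hypothetical). It shows the here-document body sitting in the middle of the and-or list, read right after the newline that follows "&&", and before the second command of the list is parsed:

```shell
#!/bin/sh
# Hypothetical demo: the here-document for cat appears in the middle of the
# and-or list "cat && echo cmd2", immediately after the newline that
# follows "&&"; the list then continues with "echo cmd2".
andor_heredoc() {
    cat <<EOF &&
data
EOF
    echo cmd2
}
```

Calling andor_heredoc prints "data" and then "cmd2".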

  | So, again, the question is whether or not input data that is logically
  | part of the command substitution (it appears between the opening and
  | closing parentheses) should affect the `outer' command. That's the
  | question. We have different answers.

We do, because I don't view here doc data as affecting anything except the
command for which it is input.   As far as the script goes, it is just a
rather weird method (kind of like the original implementation) of creating
an anonymous file and then passing that file as input (usually stdin, but
not required to be) to a command.

Consider this alternative, which is (one possibility for) what would be
needed if here-docs did not exist:

        printf '%s\n' 'data' >/tmp/hidden.data.$$
        cmd </tmp/hidden.data.$$
        rm /tmp/hidden.data.$$

whereas with here-docs, we do instead

        cmd <<'END'
        data
        END

That's all fine, and either of those would (more or less) work
with any shell.
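A sketch of the two equivalent forms side by side, with cat standing in for cmd. The function names are hypothetical, and using ${TMPDIR:-/tmp} instead of a hard-coded /tmp is an added assumption. Both supply the same anonymous input to the command:

```shell
#!/bin/sh
# Variant 1: an explicit temporary file, as in the first example above.
# ${TMPDIR:-/tmp} is an assumption; the original uses /tmp directly.
via_tempfile() {
    tmp="${TMPDIR:-/tmp}/hidden.data.$$"
    printf '%s\n' 'data' >"$tmp"
    cat <"$tmp"
    rm -f "$tmp"
}

# Variant 2: the same input supplied as a quoted here-document.
via_heredoc() {
    cat <<'END'
data
END
}
```

Both functions print the single line "data"; the here-document form just lets the shell manage the anonymous file.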

Now consider instead that cmd is to be run in a command substitution.

One can certainly do

        ... $(
                printf "%s\n" 'data' >/tmp/hidden.data.$$
                cmd </tmp/hidden.data.$$
                rm /tmp/hidden.data.$$
        ) ...

which is the rough equivalent of

        ... $( cmd <<END
        data
        END
        ) ...

and that should work.  No question.
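For reference, a runnable form of the uncontroversial case, where the operator, the body and the delimiter all sit inside the command substitution (cat stands in for cmd; the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical demo: the here-document operator, its body and its
# delimiter are all contained within the $( ... ) command substitution.
inner_heredoc() {
    out=$(cat <<END
data
END
)
    printf '%s\n' "$out"
}
```

Calling inner_heredoc prints "data".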

But one can also do

        printf "%s\n" 'data' >/tmp/hidden.data.$$
        ... $( cmd </tmp/hidden.data.$$ ) ...
        rm /tmp/hidden.data.$$

and that would also work everywhere, right?   That is, the data for the
command in the command substitution is created (and removed, but that bit
of it is generally irrelevant here) outside the command substitution.

This is the rough equivalent of

        ... $( cmd << \END ) ...
        data
        END

And then once you allow that to work (which you're apparently now doing
in the devel version), there cannot really be any objection to

        cmd <<END $( cmd1 &&
        data
        END
                        cmd2 )

as that's really just the same principle being applied in the other
direction.   Furthermore that means that in

        cmd <<END1 $( cmd1 <<END2 &&

(with a newline after the "&&") the data that follows is

        data1
        END1
        data2
        END2

keeping the left-to-right order across the input line, which is the order
that the standard requires here document data to appear in.

Here "input line" is really a logical line, rather than a physical
one, as we have already agreed that here docs don't appear in the
middle of quoted strings, and nor do they appear after elided newlines
(\newline pairs) which are removed, neither of which generates a newline
token.   But it is "line" not "command", or anything else related to the
grammar which is specified:

        The redirection operators "<<" and "<<-" both allow redirection
        of subsequent lines

"subsequent lines" ie: "lines after the current line"

        If more than one "<<" or "<<-" operator is specified on a line,
        the here-document associated with the first operator shall be
        supplied first by the application and shall be read first by the
        shell.

Note: "line", not grammatical command, or script, or and-or list, or
anything related to the grammar at all.   (The grammar generally ignores
lines, a newline token is almost just a ';' - except we're allowed as
many newlines as we like, where just one ';' (sometimes none) is permitted).
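The ordering rule quoted above can be checked with two plain commands on one line (cat stands in for the commands; the function name is hypothetical). This deliberately avoids the nested command substitution case, since that is exactly where shells currently disagree:

```shell
#!/bin/sh
# Hypothetical demo: two here-document operators on the same line; the
# bodies follow in the same left-to-right order as the operators.
two_heredocs() {
    cat <<END1; cat <<END2
data1
END1
data2
END2
}
```

Calling two_heredocs prints "data1" and then "data2".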

Another example (no cmdsubs again) that is kind of weird, and unlikely,
but should be permitted, and should work:

cat << END; case $PATH 
data
END
in
        *:/bin:*) echo /bin is in PATH! ;;
esac

Bash (5.1.xx) allows that, so does everything else (aside from some old,
and not even all that old, ash-derived shells which had a bug not relevant
here).   The heredoc data for cat appears splat in the middle of the
unrelated case statement.   No problems, it all works, as it should - but
probably would not if here-doc data was something known to the grammar.
But it isn't, the lexer removes it, as far as the grammar & its parser are
concerned the "data" and "END" lines are not there at all.

kre
