Date: Tue, 7 Sep 2021 10:29:18 +0100
From: "Geoff Clare via austin-group-l at The Open Group"
<[email protected]>
Message-ID: <20210907092918.GA29665@localhost>
| But, as I mentioned above, if the shell looks for the delimiter before
| it removes <backslash><newline> instead of after, it will get this
| case wrong:
|
| $ cat <<EOF
| > foo\
| > EOF
| > EOF
| fooEOF
| $
Sure. From the NetBSD sh (actually this is a moderately old one now,
but this hasn't changed):
sh $ cat <<EOF
foo\
EOF
E\
OF
EOF
fooEOF
EOF
sh $
yash does the same:
yash $ cat <<EOF
> foo\
> EOF
> E\
> OF
> EOF
fooEOF
EOF
yash $
| The foo\ and the first EOF line are required to be parsed as part of the
| here-doc. The second EOF line is then the delimiter.
Yes, but nothing in that says that parsing the end delimiter needs to
join lines together with backslash newlines before looking for it.
What the standard actually says is (and for current purposes we will
ignore the issue about which <newline> is next):
The here-document shall be treated as a single word that begins
after the next <newline> and continues until there is a line
containing only the delimiter and a <newline>, with no <blank> [...]
"continues until there is a line containing only the delimiter and a newline"
Nothing about a "reconstructed line" or "joined line" or anything else related.
E\
OF
is not "a line containing" anything at all, it is two lines, until after
they have been joined. According to what is written there, there must
actually be a line containing only...
The point is that the standard is (perhaps) not clear about this; it can be,
and obviously has been, read both ways (or perhaps the implementations that
did it the way you believe the standard requires, which I disagree with,
simply copied ksh and weren't concerned with what the standard said at all).
That is, rather than "Getting that right means that <backslash><newline> within
the delimiter would naturally also be handled correctly.", at least in the
NetBSD sh, there is a bunch of code to explicitly not do that, and to
do what it looks to me as if the standard explicitly requires.
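To make the two readings concrete, here is a rough sketch in sh. It is not
the NetBSD sh code (nor any other shell's); the function names are made up
for illustration, and only <backslash><newline> removal is modelled, no
other here-doc expansion. heredoc_physical looks for the delimiter on
physical lines and refuses a line that continues the previous one;
heredoc_joined joins lines first and then looks.

```shell
#!/bin/sh
# Illustrative only: hypothetical helpers, not taken from any shell's source.
# Each reads candidate here-document lines on stdin and prints the body that
# would result for an unquoted delimiter given as $1.

# True if $1 ends in an odd number of backslashes, i.e. the last backslash
# is not itself escaped and so continues the line.
is_continued() {
    bs=${1##*[!\\]}              # trailing run of backslashes, if any
    [ $(( ${#bs} % 2 )) -eq 1 ]
}

# Reading 1: test physical lines against the delimiter; a line that
# continues the previous one is never the delimiter.
heredoc_physical() {
    delim=$1 cont=0
    while IFS= read -r line; do
        if [ "$cont" -eq 0 ] && [ "$line" = "$delim" ]; then
            return 0
        fi
        if is_continued "$line"; then
            printf '%s' "${line%?}"; cont=1    # drop the backslash, join
        else
            printf '%s\n' "$line"; cont=0
        fi
    done
}

# Reading 2: join continued lines first, then test the joined line.
heredoc_joined() {
    delim=$1 logical=
    while IFS= read -r line; do
        if is_continued "$line"; then
            logical=$logical${line%?}
            continue
        fi
        logical=$logical$line
        [ "$logical" = "$delim" ] && return 0
        printf '%s\n' "$logical"
        logical=
    done
}

# The input lines from the transcripts above.
sample() { printf '%s\n' 'foo\' 'EOF' 'E\' 'OF' 'EOF'; }

sample | heredoc_physical EOF    # prints "fooEOF" then "EOF"
sample | heredoc_joined EOF      # prints "fooEOF" only
```

Both functions give fooEOF for the three-line case quoted at the top; they
differ only once the E\ / OF pair appears.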
What's more, I believe it is a better result, as it allows the here doc
to end up containing a line which is otherwise identical to the here doc
delimiter line. Aside from the E\\\nOF variant, in our shell, any of these
also prevent the EOF from being the delimiter:
\
EOF
EOF\
that is, if the line before the (apparent) delimiter contains a line
continuation, even if that line is otherwise empty, the line that looks
like the delimiter, isn't. Similarly if the line that looks like the
delimiter is continued, even if the continuation line is empty, then it
is not the delimiter.
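Restated as a predicate, purely as a sketch of the rule just described (the
names are hypothetical and this is not how the NetBSD sh source is
organised): a physical line is the end delimiter only if the previous
physical line did not end in a line continuation and the line itself is
exactly the delimiter.

```shell
#!/bin/sh
# Hypothetical predicate restating the rule in the text; not real shell source.

# True if $1 ends in an odd number of backslashes (an unescaped trailing one).
ends_in_continuation() {
    bs=${1##*[!\\]}              # trailing run of backslashes, if any
    [ $(( ${#bs} % 2 )) -eq 1 ]
}

# $1 = previous physical line, $2 = current physical line, $3 = delimiter.
is_end_delimiter() {
    ends_in_continuation "$1" && return 1
    [ "$2" = "$3" ]
}

# Report how a line would be classified with EOF as the delimiter.
check() {
    if is_end_delimiter "$1" "$2" EOF; then
        echo delimiter
    else
        echo data
    fi
}

check 'foo' 'EOF'     # delimiter
check '\'   'EOF'     # data: previous line is a bare continuation
check 'foo' 'EOF\'    # data: the candidate line is itself continued
check 'E\'  'OF'      # data: only the joined text would match
```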
I would suggest that (especially as this doesn't really matter to anything)
the standard should make it explicitly unspecified what happens if there are
any line continuations immediately before, after, or in the middle of a
string which (were the lines input in the joined state implied) would be the
end delimiter (i.e. users desiring portable scripts must not rely upon any
specific behaviour here).
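For a script that needs no expansions in the here-document, one way to stay
out of this area altogether is to quote the delimiter, since the body is
then taken literally and a trailing backslash is just data. A minimal
sketch (the function name show is only for illustration):

```shell
#!/bin/sh
# With a quoted delimiter the here-document lines are not expanded, so the
# backslash below is ordinary data and the following line is not a
# continuation of it; that line is the delimiter.
show() {
cat <<'EOF'
foo\
EOF
}

show    # prints: foo\
```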
| > | The one that's in effect when the <<-EOF redirection is evaluated,
| > | i.e. /some/file. Again, all the shells I tried do it that way.
| >
| > In that situation the FreeBSD shell writes to whatever fd 3 was before
| > the command...
[code and results omitted]
| And this must be another case where dash changed to match other shells.
Not necessarily; the NetBSD sh is the same as dash (and most others) here.
Ash-derived shells have had various issues with getting heredocs right, and
solved those in different ways. We (NetBSD) still don't get the exit code
of the following (complete) command correct:
<<EOF
$(exit 1)
EOF
but no-one is really suffering because of that, so I haven't spent much
time looking for a fix that doesn't break something else.
And last:
| When the POSIX shell requirements were originally developed for POSIX.2-1992
| the intention was to describe the behaviour of ksh88 with a few deliberate
| changes (as documented in the rationale, where nothing is said about a change
| in this area).
| Therefore the intention was for POSIX to require the ksh88 behaviour here, so
| I believe that's the "right" behaviour even if the standard is not entirely
| clear it is required.
That might all be correct, but none of it matters now. All that is relevant
is what the words in the standard, as it exists now, actually say. What
someone intended them to say, 30 years ago, is irrelevant. We now have
implementations that are either based entirely upon what the standard says,
or have been modified to conform. They cannot be invalidated because
someone now believes (rightly or wrongly) that the wording in the standard
isn't what was actually intended to be written, decades ago.
kre