On 12/5/18 7:24 PM, Ingo Schwarze wrote:
> Hi,
>
> putting the minimal useful example in the place of longer quotations:
>
> $ printf "A\nB\n" | gsed '1b;='
> A
> 2
> B
> $ printf "A\nB\n" | sed '1b;='
> sed: 1: "1b;=": undefined label ''
>
> Martijn van Duren wrote on Wed, Dec 05, 2018 at 09:24:05AM +0100:
>
>> Note that the label should consist of "portable filename
>> character set" characters, so adding the semicolon support doesn't break
>> compatibility too bad. Although it is a violation, not an extension on
>> unspecified behaviour (the only unspecified behaviour is for
>> s/../../w).
>
> Why do you think it is a violation?
Because POSIX goes out of its way to make it not obvious:

    Editing commands other than {...}, a, b, c, i, r, t, w, :, and #
    can be followed by a <semicolon>, optional <blank> characters, and
    another editing command. However, when an s editing command is
    used with the w flag, following it with another command in this
    manner produces undefined results.

They begin with a negation, listing the commands that are excluded
from "can be followed by a <semicolon>", and then explicitly state
where undefined behaviour lies. Reading the b case as merely
unspecified means assuming that not being covered by "can" still means
"may", and that the undefined-results sentence is either a
non-exhaustive list or applies only to the inverse of the group named
at the start. But given the obscure language, and given that 6 of the
10 excluded commands have a solid reason not to take a semicolon, I
wonder: if it's not a violation, why did they go out of their way to
put b, t, : and {...} in the same list as a, c, i, r, w, #?
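For scripts that have to run everywhere, one way to sidestep the
question is to end b with a newline instead of a semicolon. A minimal
sketch, relying only on separate -e scripts being joined by newlines;
the variable name is mine, purely for illustration:

```shell
# Each -e script is joined with a newline, so "b" is never followed by
# a semicolon. Line 1 branches to the end of the script and is
# auto-printed; line 2 reaches "=" and gets its line number printed.
portable_out=$(printf 'A\nB\n' | sed -e '1b' -e '=')
printf '%s\n' "$portable_out"
```

That should print A, 2, B regardless of how the semicolon case ends up
being handled.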
>
> POSIX requires that "editing commands other than {...}, a, b, c, i, r, t,
> w, :, and # can be followed by a <semicolon>...", but i don't see
> anything that specifies what should happen if b is followed by a
> semicolon. So that case seems unspecified. Our manual page is
> somewhat fuzzy with respect to semicolons, but let's fix that after
> deciding the behaviour.
>
> Let's look at the likely reasons *why* POSIX does not require
> semicolons to work after these commands:
>
> * a, c, i:
> These take text arguments, which can usefully contain semicolons.
>
> * #:
> Comments can usefully contain semicolons, too.
>
> * r, w:
> These take filename arguments, which might contain semicolons;
> it doesn't matter in this context that filenames containing
> semicolons are asking for trouble in other respects.
>
> * {...}:
> I see no reason for not allowing semicolons here, and indeed:
>
> $ printf "A\nB\n" | sed -n '{=;};p'
> 1
> A
> 2
> B
>
> Same for GNU sed.
>
> That leaves us with b, t, : - which take label arguments.
> Support for labels containing semicolons is explicitly
> not required and would hardly be useful.
>
> We already support semicolon-continuation after b, t, :,
> and so does GNU sed:
>
> $ printf "A\nB\n" | sed -n '1bL;=;:L;p'
> A
> 2
> B
>
> $ printf "A\nB\n" | sed -n 's/A/C/;tL;=;:L;p'
> C
> 2
> B
>
> Removing the existing support for semicolons after b, t, :
> does not seem to me to make anything better, but might break
> existing scripts or stuff in ports.
>
> So it would seem best to me to only fix the case of b and t without
> argument followed by a semicolon, to match GNU, which does the
> logical and consistent thing here.
>
> $ printf "A\nB\n" | gsed '1b;='
> A
> 2
> B
> $ printf "A\nB\n" | gsed 's/A/C/;t;='
> C
> 2
> B
I agree with your reasoning and I don't mind making it consistent, see
my previous mail about using option 3; but if you ask me whether this
is in line with what POSIX states, I'd argue it's not, and if it is,
it's only by the skin of its teeth.
>
> I didn't study your patch yet - is that what it does?
That's what patch option 3 does:
$ printf "A\nB\n" | ./sed '1b;='
A
2
B
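If it helps with testing, here is a rough sketch of a check covering
both b and t without a label. It is not part of the patch; SED and the
other variable names are made up for this mail. It compares the
semicolon spelling against the newline-terminated spelling, which
should behave the same on any sed:

```shell
# SED is a hypothetical override, e.g. SED=./sed for a patched build.
SED=${SED:-sed}

# Reference results from the portable spelling (one command per -e).
ref_b=$(printf 'A\nB\n' | "$SED" -e '1b' -e '=')
ref_t=$(printf 'A\nB\n' | "$SED" -e 's/A/C/' -e 't' -e '=')

# Same scripts with a semicolon directly after the label-less b and t.
# A sed that rejects this spelling exits non-zero; capture its message
# instead of aborting, so the comparison below reports "differs".
semi_b=$(printf 'A\nB\n' | "$SED" '1b;=' 2>&1) || true
semi_t=$(printf 'A\nB\n' | "$SED" 's/A/C/;t;=' 2>&1) || true

if [ "$semi_b" = "$ref_b" ]; then echo "b: matches"; else echo "b: differs"; fi
if [ "$semi_t" = "$ref_t" ]; then echo "t: matches"; else echo "t: differs"; fi
```

With option 3 applied I would expect "matches" for both; a sed without
the change should report "differs", since it stops with the undefined
label error.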
>
> Yours,
> Ingo
>