Re: sed(1) not branching to the end of the script

Martijn van Duren Wed, 05 Dec 2018 22:08:33 -0800

On 12/5/18 7:24 PM, Ingo Schwarze wrote:
> Hi,
> 
> putting the minimal useful example in the place of longer quotations:
> 
>    $ printf "A\nB\n" | gsed '1b;='
>   A
>   2
>   B
>    $ printf "A\nB\n" | sed '1b;='  
>   sed: 1: "1b;=": undefined label ''
> 
> Martijn van Duren wrote on Wed, Dec 05, 2018 at 09:24:05AM +0100:
> 
>> Note that the label should consist of "portable filename
>> character set" characters, so adding the semicolon support doesn't break
>> compatibility too badly. It is a violation, though, not an extension on
>> unspecified behaviour (the only unspecified behaviour is for
>> s/../../w).
> 
> Why do you think it is a violation?


Because POSIX goes out of its way to make it not obvious:
Editing commands other than {...}, a, b, c, i, r, t, w, :, and # can be 
followed by a <semicolon>, optional <blank> characters, and another 
editing command. However, when an s editing command is used with the w  
flag, following it with another command in this manner produces 
undefined results.

They begin with a negation, listing the commands that are excluded from
the semicolon rule, and then explicitly state where undefined behaviour
lies. So a semicolon after b is merely undefined only if you assume
that not being covered by "can" still means "may", and that the
undefined-results sentence is either a non-exhaustive list or applies
only to the commands outside the excluded group named at the start.
But combine the obscure wording with the fact that there's a good
reason not to allow a semicolon after 6 of the 10 excluded commands, and
I wonder: if it's not a violation, why did they go out of their way to
put the remaining four in the same list as a, c, i, r, w, #?
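
For comparison, a minimal sketch of the spellings that sidestep the
question: terminate b with a <newline> instead of a semicolon, either
through separate -e expressions or through a literal newline in the
script. The variable names out_e and out_nl are only illustrative.

```shell
# Each -e expression is joined to the next with a newline, so "b" is
# terminated by a newline rather than by a semicolon.
out_e=$(printf 'A\nB\n' | sed -e '1b' -e '=')

# The same script written with a literal newline after "1b".
out_nl=$(printf 'A\nB\n' | sed '1b
=')

printf '%s\n' "$out_e"
printf '%s\n' "$out_nl"
```

Both forms should print A, 2, B: line 1 branches to the end of the
script and is only auto-printed, line 2 gets its line number first.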

> 
> POSIX requires that "editing commands other than {...}, a, b, c, i, r, t,
> w, :, and # can be followed by a <semicolon>...", but i don't see
> anything that specifies what should happen if b is followed by a
> semicolon.  So that case seems unspecified.  Our manual page is
> somewhat fuzzy with respect to semicolons, but let's fix that after
> deciding the behaviour.
> 
> Let's look at the likely reasons *why* POSIX does not require
> semicolons to work after these commands:
> 
>  * a, c, i:
>    These take text arguments, which can usefully contain semicolons.
> 
>  * #:
>    Comments can usefully contain semicolons, too.
> 
>  * r, w:
>    These take filename arguments, which might contain semicolons;
>    it doesn't matter in this context that filenames containing
>    semicolons are asking for trouble in other respects.
> 
>  * {...}:
>    I see no reason for not allowing semicolons here, and indeed:
> 
>       $ printf "A\nB\n" | sed -n '{=;};p'
>      1
>      A
>      2
>      B
> 
>    Same for GNU sed.
> 
> That leaves us with b, t, : - which take label arguments.
> Support for labels containing semicolons is explicitly
> not required and would hardly be useful.
> 
> We already support semicolon-continuation after b, t, :,
> and so does GNU sed:
> 
>    $ printf "A\nB\n" | sed -n '1bL;=;:L;p'
>   A
>   2
>   B
> 
>    $ printf "A\nB\n" | sed -n 's/A/C/;tL;=;:L;p'
>   C
>   2
>   B
> 
> Removing the existing support for semicolons after b, t, :
> does not seem to me to make anything better, but might break
> existing scripts or stuff in ports.
> 
> So it would seem best to me to only fix the case of b and t without
> argument followed by a semicolon, to match GNU, which does the
> logical and consistent thing here.
> 
>    $ printf "A\nB\n" | gsed '1b;='                
>   A
>   2
>   B
>    $ printf "A\nB\n" | gsed 's/A/C/;t;='   
>   C
>   2
>   B

I agree with your reasoning and I don't mind making it consistent, see
my previous mail about using option 3; but if you ask me whether this is
in line with what POSIX states, I'd argue it's not, and if it is, it's
only by the skin of its teeth.
> 
> I didn't study your patch yet - is that what it does?

That's what patch option 3 does:
$ printf "A\nB\n" | ./sed '1b;='
A
2
B
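
A rough way to check a given binary for both b and t is to compare the
semicolon spelling against the newline-terminated one. This is only a
sketch: SED is a hypothetical variable naming the binary under test
(for example ./sed), and the other variable names are illustrative too.

```shell
# SED is an assumed knob, not part of any sed; it defaults to the sed in PATH.
SED=${SED:-sed}

# Semicolon directly after b / t; an unpatched sed may reject these,
# so capture stderr and ignore the exit status.
b_semi=$(printf 'A\nB\n' | "$SED" '1b;=' 2>&1) || true
t_semi=$(printf 'A\nB\n' | "$SED" 's/A/C/;t;=' 2>&1) || true

# Newline-terminated equivalents via separate -e expressions.
b_newl=$(printf 'A\nB\n' | "$SED" -e '1b' -e '=')
t_newl=$(printf 'A\nB\n' | "$SED" -e 's/A/C/' -e 't' -e '=')

if [ "$b_semi" = "$b_newl" ]; then echo "b: same"; else echo "b: differ"; fi
if [ "$t_semi" = "$t_newl" ]; then echo "t: same"; else echo "t: differ"; fi
```

A sed that treats the semicolon as ending an empty label should report
"same" twice; one that reads the semicolon as part of the label should
report "differ".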
> 
> Yours,
>   Ingo
>
