Re: sed(1) not branching to the end of the script

Martijn van Duren Wed, 05 Dec 2018 04:24:02 -0800

On 12/5/18 11:56 AM, Andreas Kusalananda Kähäri wrote:
> On Wed, Dec 05, 2018 at 09:24:05AM +0100, Martijn van Duren wrote:
>> On 12/5/18 8:23 AM, Andreas Kusalananda Kähäri wrote:
>>> On Wed, Dec 05, 2018 at 08:09:30AM +0100, Andreas Kusalananda Kähäri wrote:
>>>> On Wed, Dec 05, 2018 at 06:14:34AM +0200, Lars Noodén wrote:
>>>>> I'm noticing some trouble with branching in sed(1) now.  Leaving the
>>>>> label empty should branch to the end of the script:
>>>>>
>>>>>       [2addr]b [label]
>>>>>              Branch to the : function with the specified label.  If the
>>>>>              label is not specified, branch to the end of the script.
>>>>>
>>>>> However, in practice, when I try branching without a label, I get an
>>>>> error about an undefined label instead of it branching to the end of
>>>>> the script:
>>>>>
>>>>> $ echo -e "START\nfoo\nbar\nEND\nbaz\n" | sed -n '/^START/,/^END/b;p;'
>>>>> sed: 1: "/^START/,/^END/b;p;": undefined label ''
>>>>>
>>>>> Adding a label works as expected:
>>>>>
>>>>> $ echo -e "START\nfoo\nbar\nEND\nbaz\n" | sed -n '/^START/,/^END/ba;p;:a;'
>>>>
>>>> No, adding the newlines makes it work.  The label has nothing to do with
>>>> it.
>>>
>>> Sorry, too early in the morning to be reading code and telling code
>>> from data.  Inserting a label does seem to make it work (but I'm
>>> unsure why; sure, it's convenient, but why do we have it?)  Still,
>>> a portable sed script should have a newline after the "b" (and ":")
>>> commands, not a semicolon.
>>
>> It seems you're right, although apparently gnu does support the
>> semicolon. Note that the label should consist of "portable filename
>> character set" characters, so adding the semicolon support doesn't break
>> compatibility too badly. It is a violation, though, not an extension on
>> unspecified behaviour (the only unspecified behaviour is for
>> s/../../w).
>>
>> So our options are:
>> 1) Be extremely pedantic and check everything is within POSIX spec.
> 
> This would get my vote as well.


Just to be clear: this wouldn't get my vote per se; it is merely what I
would prefer. The risk of breaking too many things is too real, so it's
not a safe option.

I added it here mostly for illustration purposes.
> 
>> 2) Remove support for semicolon and be more in line with POSIX. This
>>    way we get the semicolon as a label-character for free and it
>>    removes the most LoC.
> 
> Although the standard, as far as I can see right now, says nothing
> about what characters are supposed to be valid in a label (only that an
> implementation should support labels of length 8, at least), it feels
> really weird to have semicolon as a label character...

POSIX states[0]:
If a label argument (to a b, t, or : command) contains characters 
outside of the portable filename character set, or if a label is longer 
than 8 bytes, the behavior is unspecified.

So supported characters are[1]:
alnum, dot, underscore, hyphen

So the semicolon is not disallowed, because its behaviour is unspecified.
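
As a rough illustration of that rule (my own sketch, not something sed or
POSIX provides), a label can be checked against the portable filename
character set and the 8-byte limit like this. The function name
is_portable_label is made up for the example.

```shell
# is_portable_label is a hypothetical helper, not part of sed or POSIX.
# It succeeds only if the label stays within what POSIX specifies for
# b, t and : labels: portable filename characters, at most 8 bytes.
is_portable_label() (
    LC_ALL=C    # keep the bracket ranges below ASCII-only
    label=$1
    case $label in
        *[!A-Za-z0-9._-]*) exit 1 ;;   # e.g. ';' falls outside the set
    esac
    # Only single-byte characters remain, so ${#label} is a byte count.
    [ "${#label}" -le 8 ]
)
```

With this, is_portable_label 'a' succeeds, while is_portable_label 'a;'
fails, meaning only that the behaviour for such a label is unspecified.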
> 
>> 3) Keep violating POSIX, but make the behaviour consistent, similar to
>>    what gnu sed does.
> 
> I'm generally opposed to extra conveniences like these, especially if
> they aren't actually needed for any other reason than to cater for
> people who grew up on GNU systems.  However, a consistent behaviour
> would definitely be good; either "b;" *and* "blabel;" (and ":label;")
> works (the way GNU does it), or they generate some diagnostic (as with
> Option 1).

The problem is that there are also ports to consider, which sometimes use
base sed in their build infrastructure. Changing sed to option 1 could
potentially break quite a few packages, which would then have to be
fixed by the ports maintainers, who can only hope that upstream accepts
the patches. So sometimes you have to follow the idiosyncrasies of the
bigger party.
> 
> The merit of the bug report is that it points out that the current
> behaviour appears to be inconsistent.

I agree.
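
For reference, a small sketch of the newline-delimited form Andreas
mentioned, which sidesteps the inconsistency because each -e ends the
"b" command with a newline. The wrapper name skip_start_end is only made
up for this example.

```shell
# skip_start_end is a hypothetical wrapper name, used only here.
# Print every input line that is not inside a START..END block.
skip_start_end() {
    # Each -e contributes its own script line, so "b" is followed by a
    # newline rather than a semicolon and branches to the end of the script.
    sed -n -e '/^START/,/^END/b' -e p
}

printf 'START\nfoo\nbar\nEND\nbaz\n' | skip_start_end
# prints "baz"
```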
> 
> Regards,
> Andreas
> 
>>
>> Personally I would prefer option 1, since that would help write portable
>> scripts, but probably breaks a lot. Option 2 will probably also break a
>> few things, so I bet people will vote for option 3.
>>
>> martijn@
>>>
>>>>
>>>> The label (empty or not) has to be delimited by a newline.  In your
>>>> first script, you could also have used
>>>>
>>>>     sed -n -e '/^START/,/^END/b' -e p
>>>>
>>>> (each -e inserts a newline in the script), or more simply
>>>>
>>>>     sed '/^START/,/^END/d'
>>>>
>>>> This is AFAIK standard behaviour.
>>>>
>>>> From POSIX:
>>>>
>>>>     Command verbs other than {, a, b, c, i, r, t, w, :, and # can be
>>>>     followed by a <semicolon>, optional <blank> characters, and
>>>>     another command verb.
>>>>
>>>>
>>>> Andreas
>>>>
>>>>>
>>>>> If I have not made a mistake with the short script above then there
>>>>> seems to be a discrepancy between the behavior described in the manual
>>>>> and the actual behavior.
>>>>>
>>>>> dmesg below
>>>>> /Lars
>>>>>
>>> [cut]
>>>
[0] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/sed.html
[1] https://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html#tag_03_276
