On 12/5/18 8:23 AM, Andreas Kusalananda Kähäri wrote:
> On Wed, Dec 05, 2018 at 08:09:30AM +0100, Andreas Kusalananda Kähäri wrote:
>> On Wed, Dec 05, 2018 at 06:14:34AM +0200, Lars Noodén wrote:
>>> I'm noticing some trouble with branching in sed(1) now. Leaving the
>>> label empty should branch to the end of the script:
>>>
>>> [2addr]b [label]
>>> Branch to the : function with the specified label. If the
>>> label
>>> is not specified, branch to the end of the script.
>>>
>>> However, in practice, when I try branching without a label, I get an
>>> error about an undefined label instead of it branching to the end of
>>> the script:
>>>
>>> $ echo -e "START\nfoo\nbar\nEND\nbaz\n" | sed -n '/^START/,/^END/b;p;'
>>> sed: 1: "/^START/,/^END/b;p;": undefined label ''
>>>
>>> Adding a label works as expected:
>>>
>>> $ echo -e "START\nfoo\nbar\nEND\nbaz\n" | sed -n '/^START/,/^END/ba;p;:a;
>>
>> No, adding the newlines makes it work. The label has nothing to do with
>> it.
>
> Sorry, too early in the morning to be reading code and make a difference
> between code and data, inserting a label seems to make it work (but
> I'm unsure why; sure, it's convenient, but why do we have it?) Still,
> a portable sed script should have a newline after the "b" (and ":")
> commands, not a semicolon.
It seems you're right, although apparently gnu does support the
semicolon. Note that the label should consist of "portable filename
character set" characters, so adding the semicolon support doesn't break
compatibility too bad. Although it is a violation, not an extension on
unspecified behaviour (only unspecified behaviour is for is for
s/../../w).
So our options are:
1) Be extremely pedantic and check everything is within POSIX spec.
2) Remove support for semicolon and be more in line with POSIX. This
way we get the semicolon as a label-character for free and removes
the most LoC.
3) Keep violating POSIX, but make the behaviour consistent, similar to
what gnu sed does.
Personally I would prefer option 1, since that would help write portable
scripts, but probably breaks a lot. Option 2 will probably also break a
few things, so I bet people will vote for option 3.
martijn@
>
>>
>> The label (empty or not) has to be delimited by a newline. In your
>> first script, you could also have used
>>
>> sed -n -e '/^START/,/^END/b' -e p
>>
>> (each -e inserts a newline in the script), or more simply
>>
>> sed '/^START/,/^END/d'
>>
>> This is AFAIK standard behaviour.
>>
>> >From POSIX:
>>
>> Command verbs other than {, a, b, c, i, r, t, w, :, and # can be
>> followed by a <semicolon>, optional <blank> characters, and
>> another command verb.
>>
>>
>> Andreas
>>
>>>
>>> If I have not made a mistake with the short script above then there
>>> seems to be a discrepancy between the behavior described in the manual
>>> and the actual behavior.
>>>
>>> dmesg below
>>> /Lars
>>>
> [cut]
>
Option 1:
Index: compile.c
===================================================================
RCS file: /cvs/src/usr.bin/sed/compile.c,v
retrieving revision 1.49
diff -u -p -r1.49 compile.c
--- compile.c 14 Aug 2018 18:10:09 -0000 1.49
+++ compile.c 5 Dec 2018 08:10:35 -0000
@@ -73,6 +73,7 @@ static struct s_command
*findlabel(char *);
static void fixuplabel(struct s_command *, struct s_command *);
static void uselabel(void);
+static void validlabel(const char *);
/*
* Command specification. This is used to drive the command parser.
@@ -287,23 +288,17 @@ nonsel: /* Now parse the command */
if (*p == '\0')
cmd->t = NULL;
else
- cmd->t = duptoeol(p, "branch", &p);
- if (*p == ';') {
- p++;
- goto semicolon;
- }
+ cmd->t = duptoeol(p, "branch", NULL);
+ validlabel(cmd->t);
break;
case LABEL: /* : */
p++;
EATSPACE();
- cmd->t = duptoeol(p, "label", &p);
+ cmd->t = duptoeol(p, "label", NULL);
+ validlabel(cmd->t);
if (strlen(cmd->t) == 0)
error(COMPILE, "empty label");
enterlabel(cmd);
- if (*p == ';') {
- p++;
- goto semicolon;
- }
break;
case SUBST: /* s */
p++;
@@ -873,4 +868,18 @@ uselabel(void)
free(lh);
}
}
+}
+
+static void
+validlabel(const char *name)
+{
+ size_t i, len;
+ if ((len = strlen(name)) > 8)
+ error(COMPILE, "label too long");
+ for (i = 0; i < len; i++) {
+ if (!isalnum(name[i]) && name[i] != '.' && name[i] != '_'
+ && name[i] != '-')
+ error(COMPILE, "Invalid label character");
+ }
+
}
Option 2:
Index: compile.c
===================================================================
RCS file: /cvs/src/usr.bin/sed/compile.c,v
retrieving revision 1.49
diff -u -p -r1.49 compile.c
--- compile.c 14 Aug 2018 18:10:09 -0000 1.49
+++ compile.c 5 Dec 2018 07:56:59 -0000
@@ -287,23 +287,15 @@ nonsel: /* Now parse the command */
if (*p == '\0')
cmd->t = NULL;
else
- cmd->t = duptoeol(p, "branch", &p);
- if (*p == ';') {
- p++;
- goto semicolon;
- }
+ cmd->t = duptoeol(p, "branch", NULL);
break;
case LABEL: /* : */
p++;
EATSPACE();
- cmd->t = duptoeol(p, "label", &p);
+ cmd->t = duptoeol(p, "label", NULL);
if (strlen(cmd->t) == 0)
error(COMPILE, "empty label");
enterlabel(cmd);
- if (*p == ';') {
- p++;
- goto semicolon;
- }
break;
case SUBST: /* s */
p++;
Option 3:
Index: compile.c
===================================================================
RCS file: /cvs/src/usr.bin/sed/compile.c,v
retrieving revision 1.49
diff -u -p -r1.49 compile.c
--- compile.c 14 Aug 2018 18:10:09 -0000 1.49
+++ compile.c 5 Dec 2018 08:00:11 -0000
@@ -286,8 +286,12 @@ nonsel: /* Now parse the command */
EATSPACE();
if (*p == '\0')
cmd->t = NULL;
- else
+ else {
cmd->t = duptoeol(p, "branch", &p);
+ if (strlen(cmd->t) == 0)
+ free(cmd->t);
+ cmd->t = NULL;
+ }
if (*p == ';') {
p++;
goto semicolon;