On Wed, Dec 05, 2018 at 09:24:05AM +0100, Martijn van Duren wrote:
> On 12/5/18 8:23 AM, Andreas Kusalananda Kähäri wrote:
> > On Wed, Dec 05, 2018 at 08:09:30AM +0100, Andreas Kusalananda Kähäri wrote:
> >> On Wed, Dec 05, 2018 at 06:14:34AM +0200, Lars Noodén wrote:
> >>> I'm noticing some trouble with branching in sed(1) now.  Leaving the
> >>> label empty should branch to the end of the script:
> >>>
> >>>       [2addr]b [label]
> >>>              Branch to the : function with the specified label.  If the 
> >>> label
> >>>              is not specified, branch to the end of the script.
> >>>
> >>> However, in practice, when I try branching without a label, I get an
> >>> error about an undefined label instead of it branching to the end of
> >>> the script:
> >>>
> >>> $ echo -e "START\nfoo\nbar\nEND\nbaz\n" | sed -n '/^START/,/^END/b;p;'
> >>> sed: 1: "/^START/,/^END/b;p;": undefined label ''
> >>>
> >>> Adding a label works as expected:
> >>>
> >>> $ echo -e "START\nfoo\nbar\nEND\nbaz\n" | sed -n '/^START/,/^END/ba;p;:a;
> >>
> >> No, adding the newlines makes it work.  The label has nothing to do with
> >> it.
> > 
> > Sorry, too early in the morning to be reading code and make a difference
> > between code and data, inserting a label seems to make it work (but
> > I'm unsure why; sure, it's convenient, but why do we have it?)  Still,
> > a portable sed script should have a newline after the "b" (and ":")
> > commands, not a semicolon.
> 
> It seems you're right, although apparently gnu does support the
> semicolon. Note that the label should consist of "portable filename
> character set" characters, so adding the semicolon support doesn't break
> compatibility too bad. Although it is a violation, not an extension on
> unspecified behaviour (only unspecified behaviour is for is for
> s/../../w).
> 
> So our options are:
> 1) Be extremely pedantic and check everything is within POSIX spec.

This would get my vote as well.

> 2) Remove support for semicolon and be more in line with POSIX. This
>    way we get the semicolon as a label-character for free and removes
>    the most LoC.

Although the standard, as far as I can see right now, says nothing
about what characters are supposed to be valid in a label (only that an
implementation should support labels of length 8, at least), it feels
really weird to have semicolon as a label character...

> 3) Keep violating POSIX, but make the behaviour consistent, similar to
>    what gnu sed does.

I'm generally opposed to extra conveniences like these, especially if
they aren't actually needed for any other reason than to cater for
people who grew up on GNU systems.  However, a consistent behaviour
would definitely be good; either "b;" *and* "blabel;" (and ":label;")
works (the way GNU does it), or they generate some diagnostic (as with
Option 1).

The merit of the bug report is that it points out that the current
behaviour appears to be inconsistent.

Regards,
Andreas

> 
> Personally I would prefer option 1, since that would help write portable
> scripts, but probably breaks a lot. Option 2 will probably also break a
> few things, so I bet people will vote for option 3.
> 
> martijn@
> > 
> >>
> >> The label (empty or not) has to be delimited by a newline.  In your
> >> first script, you could also have used
> >>
> >>     sed -n -e '/^START/,/^END/b' -e p
> >>
> >> (each -e inserts a newline in the script), or more simply
> >>
> >>     sed '/^START/,/^END/d'
> >>
> >> This is AFAIK standard behaviour.
> >>
> >> >From POSIX:
> >>
> >>     Command verbs other than {, a, b, c, i, r, t, w, :, and # can be
> >>     followed by a <semicolon>, optional <blank> characters, and
> >>     another command verb.
> >>
> >>
> >> Andreas
> >>
> >>>
> >>> If I have not made a mistake with the short script above then there
> >>> seems to be a discrepancy between the behavior described in the manual
> >>> and the actual behavior.
> >>>
> >>> dmesg below
> >>> /Lars
> >>>
> > [cut]
> > 
> Option 1:
> Index: compile.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/sed/compile.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 compile.c
> --- compile.c 14 Aug 2018 18:10:09 -0000      1.49
> +++ compile.c 5 Dec 2018 08:10:35 -0000
> @@ -73,6 +73,7 @@ static struct s_command
>                *findlabel(char *);
>  static void    fixuplabel(struct s_command *, struct s_command *);
>  static void    uselabel(void);
> +static void    validlabel(const char *);
>  
>  /*
>   * Command specification.  This is used to drive the command parser.
> @@ -287,23 +288,17 @@ nonsel:         /* Now parse the command */
>                       if (*p == '\0')
>                               cmd->t = NULL;
>                       else
> -                             cmd->t = duptoeol(p, "branch", &p);
> -                     if (*p == ';') {
> -                             p++;
> -                             goto semicolon;
> -                     }
> +                             cmd->t = duptoeol(p, "branch", NULL);
> +                     validlabel(cmd->t);
>                       break;
>               case LABEL:                     /* : */
>                       p++;
>                       EATSPACE();
> -                     cmd->t = duptoeol(p, "label", &p);
> +                     cmd->t = duptoeol(p, "label", NULL);
> +                     validlabel(cmd->t);
>                       if (strlen(cmd->t) == 0)
>                               error(COMPILE, "empty label");
>                       enterlabel(cmd);
> -                     if (*p == ';') {
> -                             p++;
> -                             goto semicolon;
> -                     }
>                       break;
>               case SUBST:                     /* s */
>                       p++;
> @@ -873,4 +868,18 @@ uselabel(void)
>                       free(lh);
>               }
>       }
> +}
> +
> +static void
> +validlabel(const char *name)
> +{
> +     size_t i, len;
> +     if ((len = strlen(name)) > 8)
> +             error(COMPILE, "label too long");
> +     for (i = 0; i < len; i++) {
> +             if (!isalnum(name[i]) && name[i] != '.' && name[i] != '_'
> +                 && name[i] != '-')
> +                     error(COMPILE, "Invalid label character");
> +     }
> +     
>  }
> 
> Option 2:
> Index: compile.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/sed/compile.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 compile.c
> --- compile.c 14 Aug 2018 18:10:09 -0000      1.49
> +++ compile.c 5 Dec 2018 07:56:59 -0000
> @@ -287,23 +287,15 @@ nonsel:         /* Now parse the command */
>                       if (*p == '\0')
>                               cmd->t = NULL;
>                       else
> -                             cmd->t = duptoeol(p, "branch", &p);
> -                     if (*p == ';') {
> -                             p++;
> -                             goto semicolon;
> -                     }
> +                             cmd->t = duptoeol(p, "branch", NULL);
>                       break;
>               case LABEL:                     /* : */
>                       p++;
>                       EATSPACE();
> -                     cmd->t = duptoeol(p, "label", &p);
> +                     cmd->t = duptoeol(p, "label", NULL);
>                       if (strlen(cmd->t) == 0)
>                               error(COMPILE, "empty label");
>                       enterlabel(cmd);
> -                     if (*p == ';') {
> -                             p++;
> -                             goto semicolon;
> -                     }
>                       break;
>               case SUBST:                     /* s */
>                       p++;
> Option 3:
> Index: compile.c
> ===================================================================
> RCS file: /cvs/src/usr.bin/sed/compile.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 compile.c
> --- compile.c 14 Aug 2018 18:10:09 -0000      1.49
> +++ compile.c 5 Dec 2018 08:00:11 -0000
> @@ -286,8 +286,12 @@ nonsel:          /* Now parse the command */
>                       EATSPACE();
>                       if (*p == '\0')
>                               cmd->t = NULL;
> -                     else
> +                     else {
>                               cmd->t = duptoeol(p, "branch", &p);
> +                             if (strlen(cmd->t) == 0)
> +                                     free(cmd->t);
> +                             cmd->t = NULL;
> +                     }
>                       if (*p == ';') {
>                               p++;
>                               goto semicolon;

-- 
Andreas Kusalananda Kähäri,
National Bioinformatics Infrastructure Sweden (NBIS),
Uppsala University, Sweden.

Reply via email to