Re: sed(1) not branching to the end of the script

Jason McIntyre Fri, 07 Dec 2018 09:39:47 -0800

On Fri, Dec 07, 2018 at 06:23:35PM +0100, Ingo Schwarze wrote:
> Hi Martijn,
> 
> Martijn van Duren wrote on Thu, Dec 06, 2018 at 07:07:14AM +0100:
> > On 12/5/18 7:24 PM, Ingo Schwarze wrote:
> 
> >> putting the minimal useful example in the place of longer quotations:
> >> 
> >>    $ printf "A\nB\n" | gsed '1b;='
> >>   A
> >>   2
> >>   B
> >>    $ printf "A\nB\n" | sed '1b;='  
> >>   sed: 1: "1b;=": undefined label ''
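A minimal sketch of a form that avoids the disputed syntax. It ends the label-less `b` with a newline instead of a semicolon, which gives it an empty label, so it branches to the end of the script. Both GNU sed and OpenBSD sed should accept it. The function name `number_all_but_first` is hypothetical and only for illustration.

```shell
# Portable variant of '1b;=': the newline ends the (empty) label, so on
# line 1 "b" jumps to the end of the script and "=" is skipped.
number_all_but_first() {
    sed '1b
='
}

printf 'A\nB\n' | number_all_but_first
```

This should print `A`, `2` and `B` on separate lines, which matches the gsed output quoted above.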
> 
> >> Martijn van Duren wrote on Wed, Dec 05, 2018 at 09:24:05AM +0100:
> 
> >>> Note that the label should consist of "portable filename
> >>> character set" characters, so adding the semicolon support doesn't break
> >>> compatibility too badly. Although it is a violation, not an extension of
> >>> unspecified behaviour (the only unspecified behaviour is for
> >>> s/../../w).
> 
> >> Why do you think it is a violation?
> 
> > Because POSIX goes out of its way to make it not obvious:
> > Editing commands other than {...}, a, b, c, i, r, t, w, :, and # can be 
> > followed by a <semicolon>, optional <blank> characters, and another 
> > editing command. However, when an s editing command is used with the w  
> > flag, following it with another command in this manner produces 
> > undefined results.
> > 
> > They begin with a negation listing the commands that can be followed by
> > a semicolon, and then they explicitly state where undefined behaviour
> > lies. So, assuming that not being included in "can" means "may still",
> > and assuming that the undefined-results sentence is a non-exhaustive
> > list, or an exclusion for the inverse group mentioned at the start,
> > following the excluded commands with a semicolon may result in undefined
> > behaviour. But the obscure language, combined with the fact that there
> > is a profound reason not to use a semicolon after 6 of the 10 excluded
> > commands, makes me wonder: if it's not a violation, why did they go out
> > of their way to place the others in the same list as a, c, i, r, w, #?
> 
> Ah.  I think when reading a standard, one must carefully look at what it
> actually says, not jump to conclusions from how that is said.  Even when
> logically unambiguous, the wording may sometimes sound confusing.
> And sometimes, what is prescribed is unambiguous, but something else
> would seem to make more sense.
> 
> No doubt it says what a semicolon is supposed to do after the
> commands not listed.  No doubt it says that "s///w filename;something"
> results in undefined behaviour - by the way, "undefined" is stronger
> than "unspecified".  But i don't see that it says anywhere
> what "b label;something" is supposed to do - so that is left
> unspecified, and operating systems are free to implement and
> document an extension.
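A sketch of how a script can avoid depending on the unspecified `b label;something` case: every label definition and every labeled branch ends at a newline. The example uses the familiar "join all lines" idiom. The function name `join_with_commas` and the label `a` are illustrative choices, not taken from the thread.

```shell
# Join all input lines with commas. ":a" defines label "a"; "$!ba"
# branches back to it on every line but the last. Both end at a
# newline, so no implementation-specific ';' handling is involved.
join_with_commas() {
    sed ':a
N
$!ba
s/\n/,/g'
}

printf 'A\nB\nC\n' | join_with_commas
```

For the three-line input this should print `A,B,C`.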
> 
> By the way, we do have a case here of the specification looking
> slightly ill-designed: "s///w filename;something" is explicitly
> marked as undefined, whereas the even simpler "w filename;something"
> is merely left unspecified.  But fortunately, we are not planning
> to change the behaviour of "[s///]w filename;something", so we don't
> need to worry about that right now.
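A sketch that sidesteps both the undefined `s///w filename;something` and the unspecified `w filename;something`. The command that follows the `w` flag goes on a new line, so the file name ends at the newline. The function `subst_and_log` and its log-file argument are hypothetical. A temporary file from `mktemp` stands in for a real log.

```shell
# Substitute a -> b, append each changed line to the file named by $1,
# then print the line number. "w" takes the rest of its line as the
# file name, so "=" is placed on the next line instead of after ';'.
subst_and_log() {
    sed "s/a/b/w $1
="
}

log=$(mktemp)
printf 'a\nc\n' | subst_and_log "$log"   # stdout: 1, b, 2, c (one per line)
cat "$log"                               # the log file holds only: b
rm -f "$log"
```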
> 
> 
> All that said, i see a few problems with the manual page, so here is
> a patch to fix it.
> 
> The information in the CAVEATS section is misplaced.  The purpose
> of that section is to warn about typical programming mistakes, not
> to explain what our implementation does nor to explain what the
> standard requires.  Besides, it is wrong: semicolons *can* be used
> after "b" and "t" with our implementation.  Finally, the current
> wording can mislead people to think this might be forbidden:
> 
>   $ echo "A\nB" | sed '=;r suffix.txt'
> 
> 
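A sketch of why that example is fine. The semicolon follows `=`, not `r`, and `r` comes last, so its file-name argument simply runs to the end of the script. This uses `printf` because `echo "A\nB"` depends on how the shell's `echo` treats `\n`. The helper name `number_and_append` is hypothetical, and a temporary file stands in for `suffix.txt`.

```shell
# "=" may be followed by ';'. "r" takes the rest of the line as its
# file name, so it is the last command. The file's contents are queued
# and written at the end of each cycle, after the pattern space.
number_and_append() {
    sed "=;r $1"
}

suffix=$(mktemp)
printf 'S\n' > "$suffix"
printf 'A\nB\n' | number_and_append "$suffix"   # prints 1, A, S, 2, B, S
rm -f "$suffix"
```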
> So move the information about "a", "c", "i", "r", and "w" to the
> DESCRIPTION.  I don't think it belongs in the second paragraph
> from the top; even though that is where ";" is introduced, that
> place would be way too prominent.  Below "SED FUNCTIONS", where
> other special properties of groups of functions are also explained,
> seems about right.
> 
> Move the information about "b", "t", and ":" to STANDARDS where it
> belongs.  That commands in general can be separated with ";" was
> already said at the very top of the page.
> 
> I don't think anything more needs to be said about "#".
> We already have:
> 
>     The '#' and the remainder of the line are ignored (treated as a
>     comment), with the single exception that if the first two
>     characters in the file are '#n', the default output is
>     [...]
> 
> It's kind of obvious the remainder of the line may contain ';'
> and it will be ignored.
> 
> While here, avoid "permitted" - we aren't planning to send anybody
> to jail for sed(1) abuse.
> 
> OK?
>   Ingo
>


hi.

it reads ok to me. but just to note - there is nothing wrong with the
way "permitted" is used in that text.

jmc

> 
> Index: sed.1
> ===================================================================
> RCS file: /cvs/src/usr.bin/sed/sed.1,v
> retrieving revision 1.57
> diff -u -r1.57 sed.1
> --- sed.1     14 Nov 2018 10:59:33 -0000      1.57
> +++ sed.1     7 Dec 2018 16:48:14 -0000
> @@ -277,6 +277,20 @@
>  The synopses below indicate which arguments have to be separated from
>  the function letters by whitespace characters.
>  .Pp
> +The
> +.Ic a ,
> +.Ic c ,
> +.Ic i ,
> +.Ic r ,
> +and
> +.Ic w
> +functions cannot be followed by another command separated with a semicolon.
> +The
> +.Ar text
> +and
> +.Ar file
> +arguments may contain semicolon characters.
> +.Pp
>  Functions can be combined to form a
>  .Em function list ,
>  a list of
> @@ -561,6 +575,14 @@
>  .Op Fl aEiru
>  are extensions to that specification.
>  .Pp
> +Following the
> +.Ic b ,
> +.Ic t ,
> +or
> +.Ic \&:
> +commands with a semicolon and another command is an extension to the
> +specification.
> +.Pp
>  The use of newlines to separate multiple commands on the command line
>  is non-portable;
>  the use of newlines to separate multiple commands within a command file
> @@ -571,11 +593,3 @@
>  .Nm
>  command appeared in
>  .At v7 .
> -.Sh CAVEATS
> -The use of semicolons to separate multiple commands
> -is not permitted for the following commands:
> -.Ic a , b , c ,
> -.Ic i , r , t ,
> -.Ic w , \&: ,
> -and
> -.Ic # .
>
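A minimal sketch of the point made by the new DESCRIPTION paragraph in the patch. The text argument of `a` runs to the end of the line, so a semicolon inside it is ordinary text and does not separate commands. It uses the POSIX `a\` form with the text on the following line. The function name is hypothetical.

```shell
# The text of "a" extends to the end of its line, so the ';' below is
# part of the appended text and is not a command separator.
append_after_first() {
    sed '1a\
note; not a separator'
}

printf 'A\nB\n' | append_after_first
```

This should print `A`, then `note; not a separator`, then `B`.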
