On Mon, Apr 26, 2021 at 12:26 PM Geoff Clare via austin-group-l at The
Open Group <austin-group-l@opengroup.org> wrote:
>
> Oğuz wrote, on 25 Apr 2021:
> >
> > On Sat, Apr 24, 2021 at 6:22 PM Austin Group Bug Tracker via
> > austin-group-l at The Open Group <austin-group-l@opengroup.org> wrote:
> > > $ echo 'x,,z' | awk -F'[^,]*' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
> > > 1 <>
> > > 2 <,,>
> > > 3 <>
> >
> > This seems rather like an implementation bug. Although mawk and nawk
> > agree with gawk on how that case should be handled, there really is no
> > reason for `,,' to be a single field there. And if you replace the
> > asterisk with its interval equivalent, there is no consensus on how
> > that should work among existing implementations (the ones I have
> > access to, at least).
> >
> > $ echo 'x,,z' | gawk -F'[^,]{0,}' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
> > 1 <>
> > 2 <,,>
> > 3 <>
> > $
> > $ echo 'x,,z' | mawk -F'[^,]{0,}' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
> > 1 <x,,z>
> > $
> > $ echo 'x,,z' | nawk -F'[^,]{0,}' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
> > 1 <x,,z>
> > $
>
> The mawk and nawk behaviour is clearly just wrong here.  Fields should
> never contain (non-empty) parts that match FS.
>
> > $ echo 'x,,z' | busybox awk -F'[^,]{0,}' '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
> > 1 <>
> > 2 <,>
> > 3 <,>
> > 4 <>
> > 5 <>
>
> Busybox awk also does this with -F'[^,]*'
>
> > And the expected output is as follows.
> >
> > 1 <>
> > 2 <,>
> > 3 <,>
> > 4 <>
> >
> > So either the standard should make the behavior unspecified when `FS'
> > is an ERE that would match a zero-length string, or implementations
> > should fix these bugs.
>
> I think there are two reasonable choices: either require that FS
> does not match zero-length strings (as in the desired action in the
> bug), or make it unspecified whether or not it matches zero-length
> strings. If we do the latter, busybox would have a minor bug because
> of the extra field, and the mawk and nawk behaviour with {0,} would
> still not conform.

Apparently most implementations behave the way gawk does, or at least
try to, and that behavior makes sense. There aren't many use cases
where busybox awk's behavior would be helpful. On reflection, I think
the former choice, requiring that FS not match zero-length strings,
would be best.
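
For comparison, here is a minimal sketch of the same test with an FS
that cannot match a zero-length string. It swaps the `*' for `+'; the
split_fields helper is a hypothetical name used only for illustration,
and the outputs shown are what I would expect from the implementations
discussed above, not something the standard text is being quoted on.

```shell
# Hypothetical helper: split the line in $2 on the ERE in $1 and print
# each field with its index, in the same format used earlier in the thread.
split_fields() {
    printf '%s\n' "$2" | awk -F"$1" '{for (i=1;i<=NF;i++) print i, "<"$i">"}'
}

# '[^,]+' needs at least one non-comma character, so unlike '[^,]*' or
# '[^,]{0,}' it never matches the empty string between the two commas.
split_fields '[^,]+' 'x,,z'
# Expected output:
# 1 <>
# 2 <,,>
# 3 <>
```

Here `x' and `z' are the separators, so the first and last fields are
empty and `,,' is the only non-empty field. That is the same result
gawk gives for `[^,]*', without relying on how zero-length matches are
treated.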

>
> --
> Geoff Clare <g.cl...@opengroup.org>
> The Open Group, Apex Plaza, Forbury Road, Reading, RG1 1AX, England
>
