Kastus Shchuka writes:
> On Sat, Oct 15, 2022 at 11:42:17PM -0300, Lucas de Sena wrote:
> > Hi,
> > 
> > After trying to split a string into fields delimited with colons and
> > spaces, I found this bug in how ksh(1) does substitution.  The actual
> > behavior contradicts what other shells like bash and mksh do and also
> > contradicts its own manual.
> > 
> > Running the following on other shells (say, bash) prints "/foo/bar/".
> > This command splits the string " foo : bar " into two fields: "foo"
> > and "bar", considering colon and space as delimiters.
> > 
> >     echo " foo : bar " | {
> >             IFS=": "
> >             read -r a b
> >             printf -- "/%s/%s/\n" "$a" "$b"
> >     }
> > 
> > However, running the same command in OpenBSD ksh(1) (or sh(1)) splits
> > the string into "foo" and ": bar".
>
> This is because the last parameter (b) is a concatenation of two fields.
> Parsing is done properly if you add c to the read command:
>
> + echo  foo : bar 
> + IFS=: 
> + read -r a b c
> + printf -- /%s/%s/%s/\n foo  bar
> /foo//bar/
>
>
> > 
> > The ksh(1) manual provides the following similar example:
> > 
> > > Example: If IFS is set to “<space>:”, and VAR is set to
> > > “<space>A<space>:<space><space>B::D”, the substitution for $VAR
> > > results in four fields: ‘A’, ‘B’, ‘’ (an empty field), and ‘D’.
> > > Note that if the IFS parameter is set to the NULL string, no field
> > > splitting is done; if the parameter is unset, the default value of
> > > space, tab, and newline is used.
> > 
> > Let's try it:
> > 
> >     echo " A :  B::D" | {
> >             IFS=" :"
> >             read -r arg1 arg2 arg3 arg4
> >             printf -- '1st: "%s"\n' "$arg1"
> >             printf -- '2nd: "%s"\n' "$arg2"
> >             printf -- '3rd: "%s"\n' "$arg3"
> >             printf -- '4th: "%s"\n' "$arg4"
> >     }
> > 
> > bash(1) splits the line into the following fields:
> > 
> >     1st: "A"
> >     2nd: "B"
> >     3rd: ""
> >     4th: "D"
> > 
> > This is actually the expected output, as described in the manual.
> > 
> > However, running the same command in OpenBSD ksh prints this:
> > 
> >     1st: "A"
> >     2nd: ""
> >     3rd: "B"
> >     4th: ":D"
> > 
> > A completely different thing.
> > The same occurs with OpenBSD sh(1).
>
> What you observe is the result of the next paragraph in the man page
> after the example you quoted:
>
>      Also, note that the field splitting applies only to the immediate result
>      of the substitution.  Using the previous example, the substitution for
>      $VAR:E results in the fields: `A', `B', `', and `D:E', not `A', `B', `',
>      `D', and `E'.  This behavior is POSIX compliant, but incompatible with
>      some other shell implementations which do field splitting on the word
>      which contained the substitution or use IFS as a general whitespace
>      delimiter.

Actually, you need to look further into the manual, since the word
splitting is performed not by parameter substitution but by read:

             Reads a line of input from the standard input, separates the line
             into fields using the IFS parameter (see Substitution above), and
             assigns each field to the specified parameters.

This is why adding an extra variable to read, both above and in the
examples below, makes that variable capture the remainder of the line.
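A minimal sketch of that remainder rule, using only whitespace in IFS so
the result should not depend on which shell runs it. show_rest is a name
made up for this illustration: read gets two variable names, so the
second one ends up holding everything after the first field.

```shell
# Hypothetical helper: read one line with the default IFS into two
# variables and print them between slashes.
show_rest() {
    printf '%s\n' "$1" | {
        read -r first rest
        printf '/%s/%s/\n' "$first" "$rest"
    }
}

show_rest 'a b c d'    # /a/b c d/
```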

For example:

        $ alias dump="perl -MData::Dumper -e 'print Dumper @ARGV'"
        $ dump a b c
        $VAR1 = 'a';
        $VAR2 = 'b';
        $VAR3 = 'c';

        $ ( IFS=:; dump $PATH )
        $VAR1 = '/bin';
        $VAR2 = '/sbin';
        $VAR3 = '/usr/bin';
        $VAR4 = '/usr/sbin';
        $VAR5 = '/usr/X11R6/bin';
        $VAR6 = '/usr/local/bin';
        $VAR7 = '/usr/local/sbin';
        $VAR8 = '/usr/games';
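Where perl is not installed, a plain shell function along these lines
can stand in for the dump alias. It is a hypothetical stand-in, not part
of either shell, and only approximates Data::Dumper's output: it does not
escape quotes or backslashes.

```shell
# Hypothetical stand-in for the perl-based dump alias: print each
# argument as "$VARn = 'value';", roughly like Data::Dumper does.
# Unlike Data::Dumper, it does not escape quotes or backslashes.
dump() {
    n=1
    for arg in "$@"; do
        printf '$VAR%d = '\''%s'\'';\n' "$n" "$arg"
        n=$((n + 1))
    done
}

dump a b c
```

With it defined, `( IFS=:; dump $PATH )` should print one line per PATH
component in the same style as the perl version.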

So given $X:

        $ X=' A :  B::D'

Parameter substitution:

        $ ( IFS=' :'; dump $X )
        $VAR1 = 'A';
        $VAR2 = 'B';
        $VAR3 = '';
        $VAR4 = 'D';
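The same split can be reproduced without perl by handing the unquoted
expansion to `set --`, which makes the number of fields explicit, the
empty one included. show_fields is a name made up for this sketch, and
the expected output in the comment assumes a POSIX-conforming shell.

```shell
# Hypothetical helper: split $1 on IFS=' :' exactly as an unquoted
# parameter expansion would, then print the field count and each
# field in brackets. The subshell body keeps the IFS change local.
show_fields() (
    IFS=' :'
    set -f            # disable pathname expansion, keep only splitting
    set -- $1         # deliberately unquoted so field splitting applies
    printf '%d:' "$#"
    [ "$#" -gt 0 ] && printf '[%s]' "$@"
    printf '\n'
)

X=' A :  B::D'
show_fields "$X"      # 4:[A][B][][D]
```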

Similarly:

        $ fn() { dump "$@"; }; fn $X
        $VAR1 = 'A';
        $VAR2 = ':';
        $VAR3 = 'B::D';

        $ fn() { dump "$@"; }; ( IFS=' :'; fn $X )
        $VAR1 = 'A';
        $VAR2 = 'B';
        $VAR3 = '';
        $VAR4 = 'D';
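What the two fn calls show is that splitting happens where fn is called,
using whatever IFS is in effect there; inside fn, "$@" passes the
resulting fields on untouched, including the empty one. A small sketch of
the same thing, with show_args as a made-up replacement for dump:

```shell
# Hypothetical stand-in for dump: print each argument in angle brackets.
show_args() {
    for arg in "$@"; do
        printf '<%s>' "$arg"
    done
    printf '\n'
}

# fn forwards its arguments unchanged; any splitting has already
# happened at the call site.
fn() { show_args "$@"; }

X=' A :  B::D'
fn $X                   # default IFS: <A><:><B::D>
( IFS=' :'; fn $X )     # <A><B><><D>
```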

Field splitting by read:

        $ echo "$X" | ( IFS=' :'; read a1 a2 a3; dump "$a1" "$a2" "$a3" )
        $VAR1 = 'A';
        $VAR2 = '';
        $VAR3 = 'B::D';

        $ echo "$X" | ( IFS=' :'; read a1 a2 a3 a4; dump "$a1" "$a2" "$a3" "$a4" )
        $VAR1 = 'A';
        $VAR2 = '';
        $VAR3 = 'B';
        $VAR4 = ':D';

        $ echo "$X" | ( IFS=' :'; read a1 a2 a3 a4 a5; dump "$a1" "$a2" "$a3" "$a4" "$a5" )
        $VAR1 = 'A';
        $VAR2 = '';
        $VAR3 = 'B';
        $VAR4 = '';
        $VAR5 = 'D';
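For comparison, here is a sketch that runs the same read calls and notes
in comments what POSIX's description of read implies for each: the last
variable gets its own field plus the delimiters and fields after it, with
trailing IFS whitespace dropped. read_with is a made-up helper; bash's
4-variable output quoted earlier in the thread agrees with these comments.

```shell
# Hypothetical helper: feed the line in X to read with IFS=' :' and
# $1 variables (2 to 5), then print each variable in angle brackets.
X=' A :  B::D'
read_with() {
    printf '%s\n' "$X" | (
        IFS=' :'
        case $1 in
        2) read -r a1 a2;          set -- "$a1" "$a2" ;;
        3) read -r a1 a2 a3;       set -- "$a1" "$a2" "$a3" ;;
        4) read -r a1 a2 a3 a4;    set -- "$a1" "$a2" "$a3" "$a4" ;;
        5) read -r a1 a2 a3 a4 a5; set -- "$a1" "$a2" "$a3" "$a4" "$a5" ;;
        *) exit 1 ;;
        esac
        for v in "$@"; do printf '<%s>' "$v"; done
        printf '\n'
    )
}

# Expected from a POSIX-conforming read:
read_with 2    # <A><B::D>
read_with 3    # <A><B><:D>
read_with 4    # <A><B><><D>
read_with 5    # <A><B><><D><>
```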

It does look like read, which uses its own expansion routine, has
a bug: a2/VAR2 should be 'B' (or 'B::D') not ''.

Matthew
