On Thu, 30 Mar 2023 07:51:59 -0600 Felipe Contreras <felipe.contre...@gmail.com> wrote:
> On Thu, Mar 30, 2023 at 5:23 AM Greg Wooledge <g...@wooledge.org> wrote:
> >
> > On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> > > IFS=,
> > > str='foo,bar,,roo,'
> > > printf '"%s"\n' $str
> > >
> > > There is a discrepancy between how this is interpreted between bash
> > > and zsh: in bash the last comma doesn't generate a field and is
> > > ignored,
> >
> > ... which is correct according to POSIX (but not sensible).
> >
> > > in zsh a last empty field is generated. Initially I was going
> > > to report the bug in zsh, until I read what the POSIX specification
> > > says about field splitting [1].
> >
> > You seem to have misinterpreted whatever you read.
> >
> > https://mywiki.wooledge.org/BashPitfalls#pf47
> >
> > Unbelievable as it may seem, POSIX requires the treatment of IFS as
> > a field terminator, rather than a field separator. What this means
> > in our example is that if there's an empty field at the end of the
> > input line, it will be discarded:
> >
> > $ IFS=, read -ra fields <<< "a,b,"
> > $ declare -p fields
> > declare -a fields='([0]="a" [1]="b")'
> >
> > Where did the empty field go? It was eaten for historical reasons
> > ("because it's always been that way"). This behavior is not unique
> > to bash; all conformant shells do it.
>
> If you think in terms of terminators instead of separators, then the
> above code makes sense because if you add ',' at the end of each field
> (terminate it), you get the original string:
>
> printf '%s,' ${fields[@]}
>
> But you can't replicate 'a,b' that way, because b does not have a
> terminator. Obviously we'll want 'b' as a field, therefore one has to
> assume either 1) the end of the string is considered an implicit
> terminator, or 2) the terminator in the last field is optional.
> Neither of these two things is specified in POSIX.
>
> If we consider 1) the end of the string is considered an implicit
> terminator, then 'a' contains a valid field, but then 'a,' contains
> *two* fields. Making these terminators indistinguishable from
> separators.
>
> We can go for 2) of course, but this is not specified anywhere in
> POSIX, that's just common practice.

You may find these interesting; the second link in particular.

- https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00033.html
- https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00035.html
- http://std.dkuug.dk/JTC1/SC22/WG15/docs/rr/9945-2/9945-2-98.html

Though I was aware of these behaviours, I do find the POSIX wording to
be unclear as concerns the observations made by the second link, to say
the least. I would add that it is possible to have it both ways, so to
speak, though the means of going about it are no less confusing than
the topic at large.

$ IFS=,
$ str="a,b"
$ arr=($str""); declare -p arr
declare -a arr=([0]="a" [1]="b")
$ str="a,b,"
$ arr=($str""); declare -p arr # duly coercing an empty field that some may expect or wish for
declare -a arr=([0]="a" [1]="b" [2]="")

-- 
Kerin Millar
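
For anyone wishing to experiment with the two behaviours side by side, the transcript above can be wrapped into a small helper. This is only a sketch under stated assumptions: the function name count_fields and its "keep" argument are hypothetical, the delimiter is fixed to a comma, and the results noted in the comments are those expected from bash; other shells may differ.

```shell
# count_fields STRING [keep]
# Hypothetical helper: print how many fields result from splitting STRING
# on commas. With "keep" as the second argument, a quoted null string is
# appended to the expansion, as in arr=($str"") above, so that a trailing
# comma yields a final empty field.
count_fields() {
    # Run in a subshell so that the IFS and noglob changes do not leak
    # into the caller's environment.
    (
        IFS=,
        set -f    # disable pathname expansion of the unquoted expansion
        if [ "${2-}" = keep ]; then
            set -- $1""
        else
            set -- $1
        fi
        echo "$#"
    )
}

count_fields 'a,b'                 # 2
count_fields 'a,b,'                # 2: IFS acts as a terminator
count_fields 'a,b' keep            # 2: nothing extra is invented
count_fields 'a,b,' keep           # 3: the trailing empty field is kept
count_fields 'foo,bar,,roo,'       # 4: the inner empty field survives
count_fields 'foo,bar,,roo,' keep  # 5
```

Note that the "keep" form is not suitable for an empty STRING, where the quoted null alone would still produce one (empty) field.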