On Thu, Mar 30, 2023 at 11:22 AM Kerin Millar <k...@plushkava.net> wrote:
>
> On Thu, 30 Mar 2023 07:51:59 -0600
> Felipe Contreras <felipe.contre...@gmail.com> wrote:
>
> > On Thu, Mar 30, 2023 at 5:23 AM Greg Wooledge <g...@wooledge.org> wrote:
> > >
> > > On Thu, Mar 30, 2023 at 05:12:46AM -0600, Felipe Contreras wrote:
> > > > IFS=,
> > > > str='foo,bar,,roo,'
> > > > printf '"%s"\n' $str
> > > >
> > > > There is a discrepancy between how this is interpreted between bash
> > > > and zsh: in bash the last comma doesn't generate a field and is
> > > > ignored,
> > >
> > > ... which is correct according to POSIX (but not sensible).
> > >
> > > > in zsh a last empty field is generated. Initially I was going
> > > > to report the bug in zsh, until I read what the POSIX specification
> > > > says about field splitting [1].
> > >
> > > You seem to have misinterpreted whatever you read.
> > >
> > > https://mywiki.wooledge.org/BashPitfalls#pf47
> > >
> > > Unbelievable as it may seem, POSIX requires the treatment of IFS as
> > > a field terminator, rather than a field separator. What this means
> > > in our example is that if there's an empty field at the end of the
> > > input line, it will be discarded:
> > >
> > > $ IFS=, read -ra fields <<< "a,b,"
> > > $ declare -p fields
> > > declare -a fields='([0]="a" [1]="b")'
> > >
> > > Where did the empty field go? It was eaten for historical reasons
> > > ("because it's always been that way"). This behavior is not unique
> > > to bash; all conformant shells do it.
> >
> > If you think in terms of terminators instead of separators, then the
> > above code makes sense because if you add ',' at the end of each field
> > (terminate it), you get the original string:
> >
> > printf '%s,' ${fields[@]}
> >
> > But you can't replicate 'a,b' that way, because b does not have a
> > terminator. Obviously we'll want 'b' as a field, therefore one has to
> > assume either 1) the end of the string is considered an implicit
> > terminator, or 2) the terminator in the last field is optional.
> > Neither of these two things is specified in POSIX.
> >
> > If we consider 1) the end of the string is considered an implicit
> > terminator, then 'a' contains a valid field, but then 'a,' contains
> > *two* fields. Making these terminators indistinguishable from
> > separators.
> >
> > We can go for 2) of course, but this is not specified anywhere in
> > POSIX, that's just common practice.
>
> You may find these interesting; the second link in particular.

Indeed.

> - https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00033.html
> - https://lists.gnu.org/archive/html/bug-bash/2006-12/msg00035.html

This says precisely what I said in 1):

Chet wrote:
> Alternately, you can think of the NUL at the end of the string as an
> additional field terminator,

Except if you do that, then 'a,' has two fields since the end of the
string is an additional field terminator, as I explained.

> but one that follows the adjacency rules and doesn't create any empty
> fields.

So it's a *very special* field terminator that is mentioned nowhere in
the POSIX specification.

> - http://std.dkuug.dk/JTC1/SC22/WG15/docs/rr/9945-2/9945-2-98.html
>
> Though I was aware of these behaviours, I do find the POSIX wording to be
> unclear as concerns the observations made by the second link, to say the
> least.

So I'm not the only one who thinks it's unclear.

Not to mention the small detail that the Internal Field Separator is
not a *separator*, but a terminator (with certain exceptions).

-- 
Felipe Contreras
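
For anyone who wants to check the cases discussed in this thread, here is
a minimal sketch. The count_fields helper is made up for illustration (it
is not from the thread or from POSIX), and the script assumes bash or
another shell that performs sh-style field splitting on unquoted
expansions:

```shell
#!/bin/sh
# count_fields: hypothetical helper, named here only for illustration.
# Splits its argument on commas via an unquoted expansion and prints
# how many fields the shell produced.
count_fields() (
    # The body runs in a subshell, so IFS and set -f do not leak out.
    IFS=,
    set -f        # disable pathname expansion on the split words
    set -- $1     # unquoted on purpose: field splitting happens here
    echo "$#"
)

# The strings used as examples in the thread.
for s in 'a,b' 'a,b,' 'a,' 'foo,bar,,roo,'; do
    printf '%-16s %s\n' "'$s'" "$(count_fields "$s")"
done
```

Under bash the counts come out as 2, 2, 1 and 4: the trailing comma in
'a,b,' and 'a,' closes the last field without opening a new empty one,
while the doubled comma inside 'foo,bar,,roo,' does yield an empty field.
A shell that treated the comma as a pure separator would report one more
field for each string that ends in a comma.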