On 02/19/2011 12:24 AM, Harald Dunkel wrote:
> Package: coreutils
> Version: 8.10
>
> Hi folks,
>
> According to the man page "sort -z" and "shuf -z" are supposed to
> "end lines with 0 byte, not newline". This doesn't work. Example:

Thanks for the report, however this is not a bug. sort -z implies that
both input and output will be handled on NUL-terminator boundaries,
instead of the usual newline-terminator boundaries.

> % ( echo 1; echo 2; echo 3 ) | tac | sort -z

Up to this point, there are no NUL terminators in the input, so sort
sees only a single record, and there's nothing to sort.

> | xargs -0 -L 1 echo xxx
> xxx 3
> 2
> 1

Likewise, xargs only gets a single record, explaining why you only get a
single xxx.

> There are 3 line on input, so there should be 3 lines with "xxx" on
> output.

No, sort only saw one NUL-terminated line on input (and not even that,
since you didn't provide a NUL-terminator).

> If I omit the -z, then it works:
>
> % ( echo 1; echo 2; echo 3 ) | tac | sort | xargs -L 1 echo xxx
> xxx 1
> xxx 2
> xxx 3

Of course, because then you are using newline termination.

> Please note that sort's input stream is not zero-terminated. "tac"
> doesn't support this option.

Maybe we should modify tac to add the -z option. Would you care to
write a patch?

> sort(1) doesn't mention such an
> assumption, either. Obviously there are many more tools with
> this restriction.

Sort _did_ mention that the effect of -z is to handle lines based on NUL
termination - which implies both input and output. Sort does NOT
convert between line termination styles, nor should it - since the whole
point of NUL-terminated records is that newlines can be embedded within
a record (matching the fact that you can sort filenames with embedded
newlines). Converting line endings from newline to NUL or from NUL to
newline would give ambiguous output from sort's perspective; if you need
the conversion, then it should be done before sort's input or after
sort's output.

> Instead of mixing input and output options I would suggest to
> introduce 2 new tools "nl2zero" and "zero2nl".
> Sample implementation:
>
> % alias nl2zero='tr \\n \\0'
> % alias zero2nl='tr \\0 \\n'

Why should we add new tools, when you've already proven that a new alias
or simple shell function using existing tools (tr) can already do what
you require?

> % ( echo 1; echo 2; echo 3 ) | tac | nl2zero | sort -z | xargs -0 -L 1 echo xxx
> xxx 1
> xxx 2
> xxx 3

If anything, the only thing I've gotten from this post is that it would
be nice to teach tac about -z:

$ printf '1\0002\0003\000' | tac -z | sort -z | xargs -0 -L 1 echo xxx
xxx 1
xxx 2
xxx 3

-- 
Eric Blake [email protected] +1-801-349-2682
Libvirt virtualization library http://libvirt.org
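
The point made in the reply above (convert before sort's input or after
sort's output, rather than expecting sort -z to do it) can be sketched
as shell functions. This is only an illustration, assuming a POSIX shell
with tr, a sort that supports -z, and an xargs that supports -0. The
names nl2zero and zero2nl come from the aliases quoted in the thread and
are not coreutils tools; sort_via_nul is a hypothetical wrapper added
here, and -n 1 stands in for -L 1 as an assumption for portability.

```shell
#!/bin/sh
# Helper names follow the aliases proposed in the thread; they are
# ordinary shell functions built on tr, not new tools.
nl2zero() { tr '\n' '\0'; }   # newline-terminated -> NUL-terminated
zero2nl() { tr '\0' '\n'; }   # NUL-terminated -> newline-terminated

# Hypothetical wrapper: sort newline-terminated stdin by way of sort -z,
# converting before sort's input and again after sort's output.
sort_via_nul() { nl2zero | sort -z | zero2nl; }

# Without conversion the input contains no NUL bytes, so sort -z treats
# everything as one record and xargs -0 receives a single item.
printf '3\n2\n1\n' | sort -z | xargs -0 -n 1 echo xxx

# With conversion each input line becomes its own NUL-terminated record,
# so xargs runs echo once per record.
printf '3\n2\n1\n' | nl2zero | sort -z | xargs -0 -n 1 echo xxx

# Round trip back to newline-terminated output.
printf '3\n2\n1\n' | sort_via_nul
```

Keeping the conversion in separate functions leaves sort itself free of
any guess about which newlines are record boundaries and which are data
embedded in a record.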
