On 10/09/2013 03:06 PM, Gabriel Gaster wrote:
>> No, because POSIX requires that -n parse as many characters as
>> possible regardless of locale, unless you explicitly ask to limit
>> the sort to a specific key.
>
> That's interesting. Could you perhaps point me to that section (if you
> know it off the top of your head)? The POSIX requirement that -n parse
> as many characters regardless of locale seems to directly
> contradict the other requirement (that at least made sense to me)
> that you mentioned earlier that -n parse as many characters until
> it sees a non numeric (which is locale dependent).

http://pubs.opengroup.org/onlinepubs/9699919799/utilities/sort.html

>> -n
>> Restrict the sort key to an initial numeric string, consisting of
>> optional <blank> characters, optional minus-sign, and zero or more digits
>> with an optional radix character and thousands separators (as defined in the
>> current locale), which shall be sorted by arithmetic value. An empty digit
>> string shall be treated as zero. Leading zeros and signs on zeros shall not
>> affect ordering.
>>
>> -t char
>> Use char as the field separator character; char shall not be considered
>> to be part of a field (although it can be included in a sort key).

As I read that, I see no limit on length; '-n' (and any other sort key)
is free to snarf up characters including the field boundary that
separates what is otherwise multiple fields, unless you use -k to state
otherwise. In the absence of any other limit, how many characters get
snarfed depends on the locale definition of radix character, thousands
separator, and any other locale-specific numeric forms.

>
>> Perhaps less likely to be used in real life, but still apropos to
>> the example:
>> $ printf '1202\n2011\n' | LC_ALL=C sort --debug -t0 -s -n -k1,1
>> sort: using simple byte comparison
>> 2011
>> _
>> 1202
>> __
>> $ printf '1202\n2011\n' | LC_ALL=C sort --debug -t0 -s -n
>> sort: using simple byte comparison
>> 1202
>> ____
>> 2011
>> ____
>> And you'll get the same behavior on Solaris or BSD sort (at least,
>> assuming they don't have blatant POSIX compliance bugs). Once you
>> understand WHY the above example has two different sorts, based on
>> whether -k is used, you'll understand why we can't stop parsing -n
>> at a comma even for -t, in a non-C locale.
>>
>
> I understand why the above examples give two different sorts right
> now. I just think that, in your example, -t0 should mean that 0 is no longer
> a numeric character but a field-separator (regardless of locale) and
> therefore that sort should stop on the first line at 2.

Admittedly, that might be a nice intuitive meaning; but it's not
historically accurate, so POSIX didn't specify it as such - and we can't
change it without risking breaking someone who depends on POSIX
semantics. Without -k to stop things, the -t0 means that '0' serves as
BOTH a separator AND a numeric character - you are sorting on numbers
that span multiple fields. The only way to make numeric parsing stop at
a field boundary is to use -k to tell sort to stop its key comparison at
that boundary (or to add a new option to request something different
than POSIX, but we're reluctant to add new options to sort that would be
very corner case in their usage).

>> Rather, the lack of -k determines how far -n will parse, regardless
>> of locale; it's just that some locales let -n parse farther than
>> others.
>
> Don't you actually mean here that "the lack of -k determines how far -n will
> parse, depending on locale."

Or even: "the lack of -k has a locale-independent effect of letting -n
parse as far as possible; then -n has a locale-dependent effect of how
far that actually is".

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
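For the comma-separated case that started this thread, the -k remedy described above can be sketched as follows. This is only an illustration: the two input lines are made up, and what happens in a non-C locale is stated in a comment as an assumption, since it depends on which locales are installed and on what the locale defines as its thousands separator.

```shell
# Hypothetical comma-separated input; the first field is the intended key.
input='2,1
1,200'

# -k1,1 ends the key at the first comma, so -n compares 2 against 1
# no matter what the locale says about thousands separators.
by_first_field=$(printf '%s\n' "$input" | LC_ALL=C sort -t, -s -n -k1,1)
printf 'with -k1,1:\n%s\n' "$by_first_field"

# Without -k the key is the whole line. In the C locale -n still stops at
# the comma, so the order is the same here. Assumption: in a locale whose
# thousands separator is ',', -n may instead read these as 21 and 1200.
whole_line=$(printf '%s\n' "$input" | LC_ALL=C sort -t, -s -n)
printf 'without -k:\n%s\n' "$whole_line"
```

Both commands are pinned to LC_ALL=C so the result does not vary between machines; dropping that prefix from the second command is the quickest way to see whether your own locale lets -n read past the comma.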
