On Tuesday, October 8, 2013 at 8:48 PM, Eric Blake wrote:
> > the question in my mind remains: if a user specifies a
> > field-separator shouldn't that override the locale?
>
> No, because POSIX requires that -n parse as many characters as
> possible regardless of locale, unless you explicitly ask to limit
> the sort to a specific key.

That's interesting. Could you perhaps point me to that section (if you
know it off the top of your head)? The POSIX requirement that -n parse
as many characters as possible regardless of locale seems to directly
contradict the other requirement you mentioned earlier (which at least
made sense to me): that -n parse characters until it sees a non-numeric
one, which is locale dependent.

> Perhaps less likely to be used in real life, but still apropos to
> the example:
>
> $ printf '1202\n2011\n' | LC_ALL=C sort --debug -t0 -s -n -k1,1
> sort: using simple byte comparison
> 2011
> _
> 1202
> __
> $ printf '1202\n2011\n' | LC_ALL=C sort --debug -t0 -s -n
> sort: using simple byte comparison
> 1202
> ____
> 2011
> ____
>
> And you'll get the same behavior on Solaris or BSD sort (at least,
> assuming they don't have blatant POSIX compliance bugs). Once you
> understand WHY the above example has two different sorts, based on
> whether -k is used, you'll understand why we can't stop parsing -n
> at a comma even for -t, in a non-C locale.

I understand why the above examples give two different sorts right now.
I just think that, in your example, -t0 should mean that 0 is no longer
a numeric character but a field separator (regardless of locale), and
therefore that sort should stop at the 2 on the first line. In other
words, sort -t0 -n should output '2011\n1202' since 2 is smaller than 12.

It seems that the current rationale is to have the locale override
user-specified field separators, and then to have some other POSIX
requirement (that sort -n take as much as possible, regardless of locale
and yet depending on locale) sometimes override the locale.

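
For reference, here is a small sketch that reruns the two invocations from your example without --debug, so that only the resulting order is printed. The function names are mine and purely illustrative; the orders noted in the comments are the ones from the output you quoted.

```shell
# -k1,1 with "0" as the separator: the numeric keys are "12" and "2".
sort_by_first_field() {
  printf '1202\n2011\n' | LC_ALL=C sort -t0 -s -n -k1,1
}

# No -k: the key is the whole line, so -n reads 1202 and 2011,
# even though -t0 is still given.
sort_by_whole_line() {
  printf '1202\n2011\n' | LC_ALL=C sort -t0 -s -n
}

sort_by_first_field   # 2011 first, then 1202
sort_by_whole_line    # 1202 first, then 2011
```

The first ordering is what I would have expected from the second command as well, given that -t0 was specified.
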
> > It seems that the locale overrides specific arguments to sort (in
> > this case, field-separator=, ).
>
> Rather, the lack of -k determines how far -n will parse, regardless
> of locale; it's just that some locales let -n parse farther than
> others.
>
> --
> Eric Blake eblake redhat com +1-919-301-3266
> Libvirt virtualization library http://libvirt.org

Don't you actually mean here that "the lack of -k determines how far -n
will parse, depending on locale"?
