On 04/08/2013 08:00 AM, Eric Blake wrote:
> On 04/08/2013 01:27 AM, Ray Dillinger wrote:
>> It turns out that 'sort' is grabbing locale information now and doing a locale-aware sort.
> Yes, this behavior has been required by POSIX for more than
> 20 years,

Really. Hm. It wasn't that long ago. Oh, wait, I know what this is. We weren't using locales other than the 'C' locale on our servers until we needed a UTF-8 locale to handle non-English text, so we made that change three and a half years ago.

Okay, at least now I know when it broke and how much archived data has to be reprocessed. ... Yikes... That's going to be about a solid 300 days of CPU time by the time the reprocessing is done and the data miner gets through it. Figure all-night runs on about half our server cluster for about a month before we'll be caught up, plus a hard pull on our offsite backups.

>> There is a workaround; one can set the locale to 'C' or 'POSIX' directly in a script (or at the shell prompt) and then set it back after calling 'sort'.
> That is not just a workaround, but the POSIX-mandated
> way to get sane sorting results. Script writers have been
> doing this for years.

Sigh. Well, I'm going to reiterate that it's ugly, moves a lot of unnecessary boilerplate into scripts, can fail in too many ways, and can cause secondary failures. Nice to see you acknowledging it as "sane results" though.

Still, if your minds are made up about not doing this, I guess the other points you make about procedure are not relevant to this issue. Good to know for other stuff later though.

Thanks for considering it.

Ray
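
For anyone reading the archive later, here is a minimal sketch of the workaround discussed above, assuming a POSIX shell. It is not taken from the thread: the helper name c_sort is made up for illustration, and instead of setting the locale and restoring it afterwards, it uses a per-command assignment so that only the one 'sort' process sees LC_ALL=C.

```shell
#!/bin/sh
# Hypothetical helper (name chosen for this example only):
# sort in plain byte order no matter what locale the caller uses.
c_sort() {
    # A variable assignment placed before an external command is
    # exported to that command alone; the shell's own LC_ALL is
    # left untouched, so nothing has to be "set back" afterwards.
    LC_ALL=C sort "$@"
}

# In the C locale, uppercase ASCII letters collate before lowercase,
# so this prints A, B, a, b on separate lines.
printf '%s\n' b A a B | c_sort
```

This avoids the save-and-restore boilerplate, but it still has to be applied at every 'sort' call site (or wrapped as above), which is the maintenance cost complained about in the message.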
