Control: retitle -1 sort -u and uniq "lose" non-identical lines with some locales
Control: forwarded -1 https://debbugs.gnu.org/cgi/bugreport.cgi?bug=21916

Hey.

I recently stumbled over this issue as well, reported it upstream and
got the answer from Pádraig that this is basically not a bug.

The point is, AFAIU, that e.g. sort -u drops lines which the locale's
collation considers equal, even when they are not byte-identical.
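
A minimal sketch of the difference, assuming a POSIX shell and a UTF-8
terminal. The sample lines are made up for illustration, and whether the
locale-aware run actually merges any of them depends on the locale and
libc installed, so no result is claimed for it. Under LC_ALL=C the
comparison is byte-wise, so only truly identical lines are merged.

```shell
# Hypothetical sample input: three distinct lines plus one exact duplicate.
input=$(printf '%s\n' 'ⓐ' 'ⓑ' 'abc' 'abc')

# Locale-aware run: lines that collate as equal may be merged even if
# their bytes differ. The count depends on the current locale.
locale_count=$(printf '%s\n' "$input" | sort -u | wc -l | tr -d ' ')

# Byte-wise run: only byte-identical lines are merged.
c_count=$(printf '%s\n' "$input" | LC_ALL=C sort -u | wc -l | tr -d ' ')

echo "current locale: $locale_count unique lines, C locale: $c_count unique lines"
```

If the two counts differ, the current locale treats some non-identical
lines as equal.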

Unfortunately Unicode is so poorly defined that many different
characters collate as equal.
I'd guess that's also the reason for the characters affected in pt_BR.


Now I don't think we should close that bug, even though upstream marked
it as wontfix, because the behaviour is clearly not what people would
expect and it can easily lead to serious data loss.
Imagine you write a script that should back up files: you first find(1)
those files, then sort(1) the list with -u for whatever reason... voilà,
some files are already missing from your list.
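
As a rough sketch of one possible workaround for such a script (not
necessarily what was proposed upstream): forcing the C locale makes
sort compare bytes, so only identical paths are merged. The function
name list_files is hypothetical, and -print0 and -z assume GNU
findutils and coreutils.

```shell
# list_files DIR: print a NUL-separated, de-duplicated list of the
# regular files below DIR. LC_ALL=C forces byte-wise comparison, so
# paths that merely collate as equal are all kept.
list_files() {
    find "$1" -type f -print0 | LC_ALL=C sort -z -u
}
```

The output could then be fed to something like xargs -0, which also
keeps file names containing newlines intact.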

I wrote some ideas upstream, how this may be solved.


Cheers,
Chris.
