Control: retitle -1 sort -u and uniq "loose" non-identical lines with some locales
Control: forwarded -1 https://debbugs.gnu.org/cgi/bugreport.cgi?bug=21916
Hey.

I recently stumbled over this issue as well, reported it upstream, and got the answer from Pádraig that this is basically not a bug.

The point is, AFAIU, that e.g. -u drops those lines which are considered equal by collation, even if they are not byte-identical. Unfortunately Unicode is so poorly defined that many different things have equal collation. I'd guess that's also the reason for some characters in pt_BR.

Now I don't think we should close this bug, even though upstream marked it as wontfix, because the behaviour is clearly not what people would expect and it may well lead to more serious data loss. Imagine you write a script that should back up files: you first find(1) those files, then sort(1) the list with -u for whatever reason... and voilà, some files are already missing from your list.

I wrote some ideas upstream on how this might be solved.

Cheers,
Chris.
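P.S. For anyone hitting this in a script right now, here is a minimal sketch of one way to sidestep it. It is not one of the ideas discussed upstream, just the usual workaround: in the C locale sort compares raw bytes, so -u can only drop lines that are byte-identical. The helper name dedup_bytewise is made up for illustration.

```shell
# Hypothetical helper (not part of coreutils): de-duplicate stdin bytewise,
# independent of whatever locale the caller has set.
dedup_bytewise() {
    # LC_ALL=C forces byte-wise comparison, so -u only removes exact duplicates.
    LC_ALL=C sort -u
}

# Example: two names that differ only in one non-ASCII character,
# plus one exact duplicate. Both distinct names should survive.
printf '%s\n' 'file-①' 'file-②' 'file-①' | dedup_bytewise
```

The price is that the output order is byte order rather than the locale's order. For file lists with odd names, and assuming GNU find and sort, something like `find . -type f -print0 | LC_ALL=C sort -z -u` avoids the newline problem as well.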

