Bruno Haible wrote:
Alex J. Dam wrote:
> $ echo 'ABÇ' | tr [:upper:] [:lower:]
> abÇ
> (the last character is an uppercase cedilla)
> I expected its output to be:
> abç
What does 'locale' say in this case?
$ locale
LANG=pt_BR.UTF-8
LC_CTYPE="pt_BR.UTF-8"
LC_NUMERIC="pt_BR.UTF-8"
LC_TIME="pt_BR.UTF-8"
LC_COLLATE="pt_BR.UTF-8"
LC_MONETARY="pt_BR.UTF-8"
LC_MESSAGES="pt_BR.UTF-8"
LC_PAPER="pt_BR.UTF-8"
LC_NAME="pt_BR.UTF-8"
LC_ADDRESS="pt_BR.UTF-8"
LC_TELEPHONE="pt_BR.UTF-8"
LC_MEASUREMENT="pt_BR.UTF-8"
LC_IDENTIFICATION="pt_BR.UTF-8"
LC_ALL=pt_BR.UTF-8
$ echo 'ABÇ' | tr [:upper:] [:lower:]
abÇ
But sed and tr and other utilities just use the locale data provided on the system by glibc, among other places. These programs are driven by tables that are not part of the programs themselves. This is why locale problems are global: they affect every program on the system that uses the locale data, such as grep, sed, awk, and tr.
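To make the point about shared tables concrete, here is a minimal C sketch that asks the C library what the selected locale provides. The helper name describe_ctype is hypothetical, and the codeset string returned by nl_langinfo(CODESET) differs between systems, so the printed text is only illustrative.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: switch LC_CTYPE to `name` and print the
   character set and maximum bytes per character that the C library
   reports for it.  Returns 0 on success, -1 if the locale is not
   installed on this system. */
int describe_ctype(const char *name)
{
    assert(name != NULL);

    if (setlocale(LC_CTYPE, name) == NULL) {
        printf("%s: not available\n", name);
        return -1;
    }
    printf("%s: codeset=%s, MB_CUR_MAX=%zu\n",
           name, nl_langinfo(CODESET), (size_t)MB_CUR_MAX);
    return 0;
}
```

Calling describe_ctype("C") and then describe_ctype("pt_BR.UTF-8") from main should show the difference: the first reports one byte per character, while a UTF-8 locale, if installed, reports a larger MB_CUR_MAX. Any program that calls setlocale gets its answers from the same data.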
I tried it with different locales; all of them show the same result. Looking at the sed 4.0.7 source code, execute.c:
    /* Now do the required modifications.  First \[lu]... */
    if (type & repl_uppercase_first) {
        *start = toupper(*start);
        start++;
        type &= ~repl_uppercase_first;
    }
I'm not a Linux C programmer. start was declared as "char". sed uses toupper, not towupper. Does this have something to do with its behaviour?
I typed a simple program:
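    #include <locale.h>
    #include <stdio.h>
    #include <wctype.h>   /* declares towupper() */

    int main(void)
    {
        setlocale(LC_ALL, "pt_BR.UTF-8");

        /* Print the towupper() mapping of code points 0..255;
           a '*' marks the ones that change. */
        for (unsigned int x = 0; x <= 255; x++) {
            unsigned int y = towupper(x);
            if (x != y)
                printf("%u -> %u *\n", x, y);
            else
                printf("%u -> %u\n", x, y);
        }
        return 0;
    }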
In its output, the line 199 -> 231 * appears.
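The toupper versus towupper question can be sketched in C as follows. Code points 199 (Ç, U+00C7) and 231 (ç, U+00E7) are a case pair, but in UTF-8 the character Ç is stored as the two bytes 0xC3 0x87, so a loop that handles one char at a time never sees it as a single character. The helper names lower_bytes and lower_wide are hypothetical; this is an illustration of the two approaches, not the actual code of sed or tr, and it assumes the caller has already called setlocale.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <string.h>
#include <wchar.h>
#include <wctype.h>

/* Byte-wise lowercasing: every char goes through tolower() on its
   own, in the style of the sed excerpt.  The bytes of a multibyte
   UTF-8 sequence are looked up one at a time. */
void lower_bytes(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char)tolower((unsigned char)*s);
}

/* Character-wise lowercasing: decode each multibyte character with
   mbrtowc(), map it with towlower(), re-encode it with wcrtomb().
   Writes a NUL-terminated result into `out` (capacity `outsize`).
   Returns 0 on success, -1 on an invalid sequence or a full buffer. */
int lower_wide(const char *in, char *out, size_t outsize)
{
    mbstate_t dec, enc;
    size_t left, used = 0;

    assert(in != NULL && out != NULL);
    if (outsize == 0)
        return -1;

    memset(&dec, 0, sizeof dec);
    memset(&enc, 0, sizeof enc);
    left = strlen(in);

    while (left > 0) {
        wchar_t wc;
        char buf[MB_LEN_MAX];
        size_t n, m;

        n = mbrtowc(&wc, in, left, &dec);
        if (n == (size_t)-1 || n == (size_t)-2 || n == 0)
            return -1;               /* invalid or truncated input */

        m = wcrtomb(buf, (wchar_t)towlower((wint_t)wc), &enc);
        if (m == (size_t)-1 || used + m >= outsize)
            return -1;               /* unencodable, or no room left */

        memcpy(out + used, buf, m);
        used += m;
        in += n;
        left -= n;
    }
    out[used] = '\0';
    return 0;
}
```

With a UTF-8 locale such as pt_BR.UTF-8 selected, lower_wide should turn ABÇ into abç, because towlower maps 199 to 231. lower_bytes lowercases only A and B: as far as I know, glibc's tolower leaves bytes above 0x7F unchanged in UTF-8 locales, which would match the abÇ output shown earlier.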
Ok, as I said above, I am NOT a Linux programmer and this could be nonsense.
Alex
_______________________________________________
Bug-coreutils mailing list
[EMAIL PROTECTED]
http://mail.gnu.org/mailman/listinfo/bug-coreutils