I'm trying to use the following test to show how the sorting behavior
of strcoll() differs when the locale is set differently.
I know that German treats o + umlaut (ö) as if it were oe
(I got this example from http://java.sun.com/applets/jdk/1.1/demo/i18n/Collate),
so:
if the current locale is en_US.utf8, then "tofu" < "töne";
if the current locale is de_DE.utf8, then "tofu" > "töne",
because "tofu" > "toene".
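To make that expectation concrete, here is a minimal sketch of the "ö sorts as oe" idea. It is only an illustration: expand_o_umlaut is a hypothetical helper, not part of any library, and a real locale's collation is table-driven rather than a string rewrite. It assumes the input is UTF-8, where "ö" is the byte pair 0xC3 0xB6.

```cpp
#include <string>

// Hypothetical helper, for illustration only: rewrites the UTF-8 bytes of
// "ö" (0xC3 0xB6) as "oe". Real German collation does not work this way
// internally; this just mirrors the "treat ö as oe" rule described above.
std::string expand_o_umlaut(const std::string& s) {
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (i + 1 < s.size() &&
            static_cast<unsigned char>(s[i]) == 0xC3 &&
            static_cast<unsigned char>(s[i + 1]) == 0xB6) {
            out += "oe";
            ++i;  // skip the second byte of the two-byte sequence
        } else {
            out += s[i];
        }
    }
    return out;
}
```

With this, expand_o_umlaut("t\xC3\xB6ne") gives "toene", and a plain string comparison then puts "tofu" after it, which is the order I expect from de_DE.utf8. Comparing the raw UTF-8 bytes without the expansion puts "tofu" first instead.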
Here is my simple test code:
#include <iostream>
#include <string>
#include <clocale>
#include <cstdlib>
#include <cstring>
#include <langinfo.h>
using namespace std;

int main()
{
    char tone[] = "toene"; char tofu[] = "tofu";
    string lc;
    cout << "locale:"; cin >> lc;
    if (!setlocale(LC_ALL, lc.c_str())) {
        cerr << "Can't set the specified locale!" << endl; return 1;
    }
    // overwrite "oe" with the UTF-8 encoding of "ö", giving "töne"
    tone[1] = '\xC3'; tone[2] = '\xB6';
    // 1 if the codeset is reported as "utf8" (some systems report "UTF-8" instead)
    int utf8_mode = (strcmp(nl_langinfo(CODESET), "utf8") == 0);
    cout << "utf8_mode=" << utf8_mode << endl;
    cout << "strcoll(tone, tofu)=" << strcoll(tone, tofu) << endl;
    // number of wide characters the multibyte string converts to
    cout << "mbstowcs()=" << mbstowcs(NULL, tone, 0) << endl;
    return 0;
}
Here is the test result:
% test
locale:en_US.utf8
utf8_mode=1
strcoll(tone, tofu)=1
mbstowcs()=4
% test
locale:de_DE.utf8
utf8_mode=1
strcoll(tone, tofu)=1 // expected a negative value!
mbstowcs()=4
% test
locale:C
utf8_mode=0
strcoll(tone, tofu)=84
mbstowcs()=5 //expected
It shows that the result of strcoll() doesn't change at all when I switch
the locale between German UTF-8 and English UTF-8.
Could anyone give me a hint? Thanks a lot.
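Since strcoll() only promises the sign of its result (the 1 and the 84 above both just mean "greater than"), the comparison is easier to read if it is reduced to -1, 0, or +1. The following is a rough sketch of how that could be done for one locale at a time. collation_sign is a hypothetical helper name, and the sketch assumes that setting LC_COLLATE alone is enough for the comparison being tested.

```cpp
#include <clocale>
#include <cstring>
#include <string>

// Hypothetical helper: stores -1, 0, or +1 in `sign` according to
// strcoll(a, b) under the named locale. Returns false (leaving `sign`
// untouched) if setlocale() rejects the locale name.
bool collation_sign(const char* locale_name, const char* a, const char* b,
                    int& sign) {
    // Remember the current LC_COLLATE setting so it can be restored.
    const char* current = std::setlocale(LC_COLLATE, nullptr);
    std::string saved = current ? current : "C";

    if (!std::setlocale(LC_COLLATE, locale_name)) {
        return false;  // locale not available on this system
    }
    int r = std::strcoll(a, b);
    std::setlocale(LC_COLLATE, saved.c_str());

    sign = (r > 0) - (r < 0);  // collapse any non-zero result to its sign
    return true;
}
```

Calling it with "en_US.utf8" and then "de_DE.utf8" for the same pair of strings would show directly whether the two locales disagree, and a false return makes it clear when a locale is simply not installed. In the "C" locale it reports +1 for ("t\xC3\xB6ne", "tofu"), which matches the 84 seen above.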
