Re: [I18n]Please do not use en_US.UTF-8 outside the US
JS> I had to make up ko_KR.UTF-8 different from en_US.UTF-8 to make my
JS> transition to ko_KR.UTF-8 work as I intended.

Fair point. Of course, the long-term solution is to use font technologies that do language-dependent and contextual font and glyph substitution, either client- or server-side.

Juliusz

___
I18n mailing list
[EMAIL PROTECTED]
http://XFree86.Org/mailman/listinfo/i18n
Re: [I18n]Please do not use en_US.UTF-8 outside the US
How are you, Markus?

I would just like to point out that we never suggested that en_US.UTF-8 is the only locale you will ever need. On the contrary, we have been pointing out that each region/country should use its own Unicode locales.

Yes, it is absolutely right that globalization isn't just a matter of encoding or coded character set; Unicode alone cannot resolve every issue, even though it is absolutely a good thing to have a universal character set as widely accepted as Unicode.

With regards,

Ienup

] Date: Tue, 30 Apr 2002 21:32:39 +0100
] From: Markus Kuhn [EMAIL PROTECTED]
] Subject: [I18n]Please do not use en_US.UTF-8 outside the US
] To: [EMAIL PROTECTED]
] Cc: [EMAIL PROTECTED]
] MIME-version: 1.0
]
] As we are talking about en_US.UTF-8:
]
] General warning: Please do not use the locale name en_US.UTF-8 anywhere
] outside North America. Some older Solaris documentation suggested that
] this is the only UTF-8 locale you'll ever need, as locales don't sensibly
] change much beyond the encoding anyway. This is no longer the case
] today!
[I18n]Please do not use en_US.UTF-8 outside the US
As we are talking about en_US.UTF-8:

General warning: Please do not use the locale name en_US.UTF-8 anywhere outside North America. Some older Solaris documentation suggested that this is the only UTF-8 locale you'll ever need, as locales don't sensibly change much beyond the encoding anyway. This is no longer the case today!

An increasing number of programs of US origin are finally starting to abandon the annoying old habit of assuming Legal paper and non-metric units as default conventions everywhere, a habit that requires 95% of the world population to figure out how to reconfigure them to the standard conventions.

More recent software releases instead determine the default setting for conventions such as paper format and units of measurement with code similar to the following (feel free to copy it into your software as well):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* LC_PAPER and LC_MEASUREMENT were introduced in ISO/IEC TR 14652 */

int main(void)
{
  const char *units = "mm";
  const char *paper = "A4";
  const char *s;

  /* POSIX precedence: LC_ALL overrides LC_PAPER, which overrides LANG;
     an empty variable counts as unset. Letter paper only in US/Canada. */
  if (((s = getenv("LC_ALL"))   && *s) ||
      ((s = getenv("LC_PAPER")) && *s) ||
      ((s = getenv("LANG"))     && *s))
    if (strstr(s, "_US") || strstr(s, "_CA"))
      paper = "Letter";

  /* Same precedence for LC_MEASUREMENT; inches only in the US. */
  if (((s = getenv("LC_ALL"))         && *s) ||
      ((s = getenv("LC_MEASUREMENT")) && *s) ||
      ((s = getenv("LANG"))           && *s))
    if (strstr(s, "_US"))
      units = "inches";

  printf("Paper: %s\nUnits: %s\n", paper, units);

  return 0;
}

This leads to portable and agreeable default settings, using the standard values UNLESS you are in a locale that explicitly says that you are in North America. I think that's a very good implementation practice, but it requires that, if you explain to an international audience how to activate UTF-8 locales, you had better use a non-US/CA locale. (en_GB.UTF-8, for instance, seems like an excellent choice ... :)

Markus

--
Markus G. Kuhn, Computer Laboratory, University of Cambridge, UK
Email: mkuhn at acm.org, WWW: http://www.cl.cam.ac.uk/~mgk25/
Re: [I18n]Please do not use en_US.UTF-8 outside the US
On Tue, 30 Apr 2002, Dr Andrew C Aitchison wrote:

> On Tue, 30 Apr 2002, Markus Kuhn wrote:
>
> > As we are talking about en_US.UTF-8:
> >
> > General warning: Please do not use the locale name en_US.UTF-8 anywhere
> > outside North America.
> > [...]
> > practice, but it requires that, if you explain to an international
> > audience how to activate UTF-8 locales, you had better use a non-US/
> > CA locale. (en_GB.UTF-8, for instance, seems like an excellent choice ... :)
>
> % find xc -name '*UTF-8*' -print
> xc/nls/Compose/en_US.UTF-8.ct
>
> Given that en_US.UTF-8 is the only instance of a locale file with UTF-8
> in its name, how do I find the names of other locales which use UTF-8?

Have you looked into the glibc locale directory? Mandrake has a bunch of UTF-8 locales there, I believe. Glibc 2.2.x has supported ll_CC.UTF-8 locales for a while. If your system doesn't have them, you can just generate whatever ll_CC.UTF-8 locales you need with localedef.

As for XLC_LOCALE, you can always make one, as I wrote in my message yesterday. RedHat and Mandrake Linux may not have XLC_LOCALE files for locales other than en_US.UTF-8, but some other Linux distributions (e.g. TurboLinux) have zh_CN.UTF-8 and zh_TW.UTF-8.

BTW, the first UTF-8 locale other than en_US.UTF-8 shipped with Solaris (Solaris 7?), and with AIX 4.x as well, was ko_KR.UTF-8, IIRC.

<a bit off-topic>
Now I'm almost done switching to ko_KR.UTF-8 on my Linux box. It works more or less fine, in that I can do *more than* what I could do under ko_KR.EUC-KR. Still missing is Middle Korean support, but it seems that xterm-16x can be used to *display* Middle Korean text encoded as a sequence of U+1100 Hangul Conjoining Jamos (http://chem.skku.ac.kr/~wkpark/screenshot/2002_04_30_221718_shot.png). Vim 6.1 already supports up to two combining characters, and Middle Korean needs only 'two combining characters' *most of the time* (even modern Korean needs more than two 'combining characters' in some cases, though: http://jshin.net/i18n/uyeo.html).
Hopefully, with a little more tweaking in Vim 6.1 and some major enhancements in the Korean XIM (e.g. Ami), I'll be able to typeset Middle Korean with LaTeX sooner or later (the LaTeX side is almost ready, too).
</a bit off-topic>

Jungshik Shin