[ksh93-integration-discuss] [i18n-discuss] ksh93 "printf" builtin vs. CR #6558816("printf variants behaving incorrectly for multibyte decimal point") ...

Roland Mainz Thu, 17 Jan 2008 02:09:05 +0100

Ienup Sung wrote:
> Roland Mainz wrote at 01/14/08 15:52:
> > BTW: Is it a bug that $ LC_ALL=ar_SA.UTF-8 locale -k decimal_point #
> > returns a comma (',' ) ?
> 
> Using a comma (',' or 0x2c) as the decimal_point value of ar_SA.UTF-8 isn't
> correct; it should rather be either the Arabic decimal separator (U+066B in
> Unicode) or, in my opinion, a period ('.' or 0x2e) as in "out of no proper
> single byte character", for use with ASCII digits, or both.
> 
> I can guess, though, why there is a comma in the current locale definition:
> 
> In the majority of Arabic-speaking countries in the Middle East, the proper
> decimal point with Arabic-Indic digits would be ARABIC DECIMAL SEPARATOR
> U+066B and the proper thousands separator would be ARABIC THOUSANDS
> SEPARATOR U+066C.
> 
> U+066B looks similar to the ASCII comma, but the two are completely
> different characters. The same goes for U+066C and the ASCII apostrophe.
> 
> I *think* the locale owner,


Who is the locale owner ?

> experiencing problems mentioned in the noted
> CR 6558816 and other bugs

What are the CR #-numbers of the other bugs ?

> after the recent change of the decimal_point and
> thousands_sep values from ASCII period and ASCII comma to U+066B and U+066C,

Slightly offtopic: It seems that CDE's dtterm default font in the
en_US.UTF-8 locale doesn't have a glyph available for U+066C ... ;-(

> somehow figured that using the ASCII comma for decimal_point and the ASCII
> apostrophe for thousands_sep might be better (possibly because they look
> similar) and resorted to that compromise.
> 
> To me, the best compromise would probably have been an ASCII period for
> decimal_point and an empty string for thousands_sep.

What about extending this compromise a bit (refining my proposal from
http://mail.opensolaris.org/pipermail/ksh93-integration-discuss/2008-January/005846.html)?
The idea of multibyte characters for { decimal_point, thousands_sep, etc. }
sounds interesting, but I can also see the issue that non-multibyte-aware
applications won't like them. So:
1) Create "ar_SA.UTF-8@ascii_numeric" (uses ASCII characters for
|decimal_point| and |thousands_sep|)
2) Create "ar_SA.UTF-8@arabic_numeric" (uses multibyte characters, i.e. the
Arabic characters, for |decimal_point| and |thousands_sep|)
3) Make "ar_SA.UTF-8" an alias for "ar_SA.UTF-8@ascii_numeric" for now
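
To make the three steps a bit more concrete, here is a rough Python sketch
(illustration only, not real locale data). The locale names are written with
the usual "@" modifier syntax. The concrete values are assumptions: the ASCII
variant uses the period / empty string pair suggested above, and the Arabic
variant uses U+066B / U+066C. The table and helper names are made up for this
example.

```python
# Hypothetical numeric settings for the two proposed locale variants.
# These values are assumptions for illustration, not shipped locale data.
NUMERIC_VARIANTS = {
    "ar_SA.UTF-8@ascii_numeric": {
        "decimal_point": ".",       # ASCII period (0x2e)
        "thousands_sep": "",        # empty string, as suggested in the thread
    },
    "ar_SA.UTF-8@arabic_numeric": {
        "decimal_point": "\u066b",  # ARABIC DECIMAL SEPARATOR
        "thousands_sep": "\u066c",  # ARABIC THOUSANDS SEPARATOR
    },
}

# Step 3: the plain locale name is an alias for the ASCII variant (for now).
ALIASES = {"ar_SA.UTF-8": "ar_SA.UTF-8@ascii_numeric"}


def numeric_settings(locale_name):
    """Resolve an alias (if any) and return the numeric settings."""
    return NUMERIC_VARIANTS[ALIASES.get(locale_name, locale_name)]


def utf8_len(text):
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits_single_byte(locale_name):
    """True if every separator fits in one byte, which is what
    non-multibyte-aware applications implicitly assume."""
    settings = numeric_settings(locale_name)
    return all(utf8_len(value) <= 1 for value in settings.values())
```

With these assumed values, fits_single_byte("ar_SA.UTF-8") holds, while the
"@arabic_numeric" variant fails the check because U+066B and U+066C each take
two bytes in UTF-8. That is exactly the case a byte-oriented "printf" would
trip over.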

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
