Glenn Fowler wrote:
> On Thu, 17 Jan 2008 01:50:38 +0100 Roland Mainz wrote:
> > Glenn Fowler wrote:
> > > On Tue, 15 Jan 2008 16:57:26 -0800 (PST) Don Cragun wrote:
> > > > ... Note that in the C
> > > > Standard, "character" is a single-byte character.
> > > so absent a standard multibyte interface ast/ksh will stick with
> > > the single byte characters provided by localeconv():
> > >         struct lconv *decimal_point
> > >         struct lconv *thousands_sep
> 
> > But what happens when these data point to multibyte characters (see
> > http://mail.opensolaris.org/pipermail/ksh93-integration-discuss/2008-January/005846.html
> > for an idea to split the arabic locales into one version which uses
> > ASCII characters and a 2nd version which uses the correct arabic
> > (multibyte) characters) ? AFAIK (Don may correct me) it's the author(s)
> > of the locale data which are responsible to define this correctly and
> > the "consumer side" (e.g. libast/ksh93) should just use the strings (and
> > not just the first byte) from struct lconv char* elements...
> 
> if I understand Don correctly the C standard states that
>         struct lconv *decimal_point
>         struct lconv *thousands_sep
> each point to a character, and a character is a one byte quantity
> 
> so it doesn't matter how many bytes are pointed to, only the first
> counts for both decimal_point and thousands_sep

The standard says the behaviour is "unspecified", not "forbidden" (or
"komodo dragons will eat you if you do this"):
-- snip --
"In contexts where standards limit the decimal_point to a single
byte, the result of specifying a multi-byte operand shall be
unspecified." and "In contexts where standards limit the thousands_sep
to a single byte
-- snip --

More interesting, AFAIK, is the practical side: are all |char *| members
of |struct lconv| terminated by a '\0'? If that is true on all platforms
it shouldn't be a problem to extend the behaviour from the existing
single-byte characters to multibyte characters. AFAIK it's only an
extension of the existing standard, and it's up to the authors of the
locale data to use multibyte characters... or not?
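
To make the "use the whole string" idea concrete, here is a rough sketch of
what the consumer side could look like. It assumes the |char *| members are
'\0'-terminated strings (ISO C describes them as "pointers to strings").
The helper names current_decimal_point(), sep_first_char_len() and
match_sep() are made up for illustration and are not libast/ksh93 functions:

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: the decimal point string of the current locale. */
static const char *current_decimal_point(void)
{
    return localeconv()->decimal_point;
}

/*
 * Hypothetical helper: byte length of the first (possibly multibyte)
 * character of sep in the current LC_CTYPE locale. Returns 0 for an
 * empty string and falls back to 1 if the bytes are not a valid
 * character in this locale.
 */
static size_t sep_first_char_len(const char *sep)
{
    int n;

    assert(sep != NULL);
    if (*sep == '\0')
        return 0;
    (void)mblen(NULL, 0);           /* reset any shift state */
    n = mblen(sep, strlen(sep));    /* never look past the '\0' */
    return n > 0 ? (size_t)n : 1;
}

/*
 * Hypothetical helper: if text starts with the complete separator
 * string (all of its bytes, not just the first one), return the number
 * of bytes matched, otherwise return 0.
 */
static size_t match_sep(const char *text, const char *sep)
{
    size_t len = strlen(sep);

    return (len > 0 && strncmp(text, sep, len) == 0) ? len : 0;
}
```

A number parser would then call match_sep(p, current_decimal_point()) and
skip the returned number of bytes instead of comparing a single |char|.
For single-byte locale data this behaves exactly like the existing code.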

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
