the example code sequence is the exact sequence used by the new utf32s2wcs()

I don't know if doing the UTF-8 sequence makes solaris behave as expected,
and even if it did, what does that say about how utf32s2wcs() is coded?
how many implementations have yet more ingenious paths to do what
the current code should have done?

I'll tweak _ast_iconv to just do the right thing

On Sat, 7 Sep 2013 16:55:02 +0200 Olga Kryzhanovska wrote:
> Glenn, does it help to convert to UTF-8 and then call iconv, instead of
> converting to UTF-32LE and then calling iconv? i.e. for \u[] use "integer
> to UTF-8" and then iconv(UTF-8 to local encoding)?

> Olga
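A minimal C sketch of the route Olga suggests: encode the \u[] value as UTF-8 by hand, then let iconv(3) convert UTF-8 to the target encoding. The helper names cp_to_utf8() and cp_to_local() are hypothetical, not ast functions, and the target encoding is passed in explicitly (a real caller might take it from nl_langinfo(CODESET)). This only changes the input encoding handed to iconv; whether a given platform's UTF-8 converter substitutes '?' the same way is not something this sketch can settle.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 ("integer to UTF-8").
   Returns the byte count (1..4), or 0 for surrogates and values above U+10FFFF. */
static size_t cp_to_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp > 0x10FFFF)
        return 0;                       /* beyond the Unicode range */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: convert one code point to `tocode` by way of UTF-8.
   Returns the number of bytes written to out, or -1 on any failure.
   A positive iconv() return (non-identical conversions) is NOT treated
   as a failure here. */
static long cp_to_local(uint32_t cp, const char *tocode, char *out, size_t outsize)
{
    unsigned char utf8[4];
    size_t n = cp_to_utf8(cp, utf8);

    assert(tocode != NULL && out != NULL);
    if (n == 0)
        return -1;

    iconv_t cd = iconv_open(tocode, "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;                      /* converter not available */

    char *ip = (char *)utf8;
    size_t inleft = n;
    char *op = out;
    size_t outleft = outsize;
    size_t r = iconv(cd, &ip, &inleft, &op, &outleft);
    if (r != (size_t)-1)
        r = iconv(cd, NULL, NULL, &op, &outleft);  /* flush any shift state */
    iconv_close(cd);

    return r == (size_t)-1 ? -1 : (long)(outsize - outleft);
}
```

For example, cp_to_local(0x41, "US-ASCII", buf, sizeof buf) should write the single byte 'A'; what it does for a code point such as 0xFC is up to the platform's iconv.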

> On Sat, Sep 7, 2013 at 4:48 PM, Glenn Fowler <[email protected]> wrote:
> >
> > found the solaris iconv problem in 5 min after sleeping on it
> > the following command sequences use native commands -- no ast involved
> >
> > # u.dat is a UTF-32LE file containing <upper-case-U-umlaut><newline> #
> >
> > $ od -tx1 u.dat
> > 0000000 dc 00 00 00 0a 00 00 00
> > 0000010
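For reference, the eight bytes above are two UTF-32LE code units, least significant byte first: 0x000000DC (U+00DC) and 0x0000000A (newline). A minimal C sketch of that decoding; utf32le_decode() is a hypothetical helper for illustration, not ast's utf32s2wcs().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode UTF-32LE: each code point is 4 bytes, least significant byte first.
   Stores at most max code points in out and returns how many were decoded,
   or -1 if len is not a multiple of 4 or a value is not a valid Unicode
   scalar value (a surrogate, or above U+10FFFF). */
static long utf32le_decode(const unsigned char *buf, size_t len, uint32_t *out, size_t max)
{
    assert(buf != NULL || len == 0);
    if (len % 4 != 0)
        return -1;
    size_t n = 0;
    for (size_t i = 0; i < len && n < max; i += 4) {
        uint32_t cp = (uint32_t)buf[i]
                    | (uint32_t)buf[i + 1] << 8
                    | (uint32_t)buf[i + 2] << 16
                    | (uint32_t)buf[i + 3] << 24;
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;
        out[n++] = cp;
    }
    return (long)n;
}
```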
> >
> > # on linux.i386-64
> > $ /usr/bin/iconv -f UTF-32LE -t US-ASCII < u.dat
> > /usr/bin/iconv: illegal input sequence at position 0
> > $ echo $?
> > 1
> >
> > # on sol11.i386
> > $ /bin/iconv -f UTF-32LE -t US-ASCII < u.dat
> > ?
> > $ echo $?
> > 0
> >
> > solaris is *bad* in at least 3 ways
> > * it apparently detects a conversion error but does not issue a diagnostic
> > * it apparently detects a conversion error and substitutes '?' for "bad" bytes
> > * it apparently detects a conversion error but exits 0
> >
> > who knows what liberties other implementations may take
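In library code, the silent '?' substitution can often be caught by checking iconv(3)'s return value strictly. POSIX lets iconv() perform an implementation-defined conversion for a valid input character that has no identical character in the target codeset, and has it return the number of such non-identical conversions. A minimal sketch, assuming the implementation counts its substitutions there (one that substitutes without counting cannot be detected this way); strict_iconv() is a hypothetical name, not the actual _ast_iconv change.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Convert inlen bytes from in into out using an open descriptor cd, strictly:
   fail on any iconv() error (EILSEQ, EINVAL, E2BIG, ...) and also when
   iconv() reports non-identical (implementation-defined) conversions,
   such as a '?' substituted for an unconvertible character.
   Returns the number of bytes written, or -1 with errno set.
   On failure cd may be left mid-conversion; reset it with
   iconv(cd, NULL, NULL, NULL, NULL) before reusing it. */
static long strict_iconv(iconv_t cd, char *in, size_t inlen, char *out, size_t outsize)
{
    assert(cd != (iconv_t)-1);

    char *ip = in;
    size_t inleft = inlen;
    char *op = out;
    size_t outleft = outsize;

    size_t r = iconv(cd, &ip, &inleft, &op, &outleft);
    if (r == (size_t)-1)
        return -1;                      /* errno set by iconv() */
    if (r > 0) {
        errno = EILSEQ;                 /* lossy substitution: reject it */
        return -1;
    }
    if (iconv(cd, NULL, NULL, &op, &outleft) == (size_t)-1)
        return -1;                      /* could not emit the reset sequence */
    return (long)(outsize - outleft);
}
```

The design choice here is to treat a lossy success the same as an outright EILSEQ, so a caller sees one failure mode regardless of which liberty the platform takes.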
> >
> > I wonder if ast, in the C/POSIX locale and MB_CUR_MAX==1, should have
> > strict and non-strict conformance modes
> >
> > strict: US-ASCII: characters are 7-bit bytes; bytes with bit 0x80 set are invalid
> > non-strict: ISO-8859-1: characters are 8-bit bytes
> >
> > non-strict would match linux C locale behavior
> > strict would match whose behavior?
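A minimal C sketch of the two proposed modes for the C/POSIX locale with MB_CUR_MAX==1. The enum and function names are hypothetical, not existing ast interfaces; the mapping follows from the definitions above: every ISO-8859-1 byte value equals its Unicode code point, while strict US-ASCII rejects any byte with bit 0x80 set.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical conformance modes for the C/POSIX locale with MB_CUR_MAX==1. */
enum c_locale_mode {
    MODE_STRICT,        /* US-ASCII: only 7-bit bytes are characters */
    MODE_NONSTRICT      /* ISO-8859-1: every 8-bit byte is a character */
};

/* Map one byte to a Unicode code point under the given mode.
   Returns the code point, or -1 if the byte is invalid in that mode. */
static int32_t c_locale_byte_to_cp(unsigned char b, enum c_locale_mode mode)
{
    assert(mode == MODE_STRICT || mode == MODE_NONSTRICT);
    if ((b & 0x80) && mode == MODE_STRICT)
        return -1;                      /* bit 0x80 set: not US-ASCII */
    return (int32_t)b;                  /* ASCII and ISO-8859-1 bytes equal their code points */
}
```

With MODE_NONSTRICT the byte 0xFC maps to U+00FC; with MODE_STRICT it is rejected.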
> >
> > I believe posix gives wiggle room here for the C locale to have chars with
> > bit 0x80 set; ast in non-strict mode will simply apply that wiggle room
> > consistently across all of its os/arch implementations
> >
> > I guess what I'm really saying is that ast *will* be consistent across all implementations
> >
> > the question then is: in the C locale, is the ast behavior always strict, or is it
> > tempered by astconf("CONFORMANCE")?
> >

> -- 
>       ,   _                                    _   ,
>      { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
> .----'-/`-/     [email protected]   \-`\-'----.
>  `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
>       /\/\     Solaris/BSD//C/C++ programmer   /\/\
>       `--`                                      `--`

_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
