Glenn, would it help to convert to UTF-8 first and run iconv on that, instead of converting to UTF-32LE and then running iconv? I.e. for \u[] use "integer to UTF-8" and then iconv(UTF-8 to local encoding).
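
A minimal sketch of that two-step idea, not ast code. The helper names cp_to_utf8 and utf8_to_local are hypothetical, and the sketch assumes the caller has already called setlocale() so that nl_langinfo(CODESET) names the local encoding. It also checks the return value of iconv(), which POSIX defines as the number of non-identical conversions, so a '?' substitution could be caught if the implementation reports it there. Whether Solaris does so is an assumption that would need testing.

```c
#include <stddef.h>
#include <stdint.h>
#include <iconv.h>
#include <langinfo.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Returns the number of bytes written (1..4), or 0 if cp is a
 * surrogate or lies beyond U+10FFFF. */
size_t cp_to_utf8(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not characters */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp < 0x110000) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Hypothetical helper: convert n bytes of UTF-8 to the locale's codeset.
 * Returns the number of bytes stored in dst, or -1 on any failure,
 * including a conversion that iconv() reports as non-identical. */
long utf8_to_local(const unsigned char *src, size_t n, char *dst, size_t dstsize)
{
    iconv_t cd = iconv_open(nl_langinfo(CODESET), "UTF-8");
    if (cd == (iconv_t)-1)
        return -1;

    char *in = (char *)src;             /* iconv() takes a non-const pointer */
    char *out = dst;
    size_t inleft = n, outleft = dstsize;
    size_t r = iconv(cd, &in, &inleft, &out, &outleft);
    iconv_close(cd);

    if (r == (size_t)-1 || inleft != 0)
        return -1;                      /* hard error or unconsumed input */
    if (r != 0)
        return -1;                      /* substitution happened: treat as error */
    return (long)(out - dst);
}
```

For U+00DC, the character in u.dat, cp_to_utf8 produces the two bytes c3 9c. What utf8_to_local then returns depends on the platform's iconv and on the current locale.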
Olga

On Sat, Sep 7, 2013 at 4:48 PM, Glenn Fowler <[email protected]> wrote:
>
> found the solaris iconv problem in 5 min after sleeping on it
> the following command sequences use native commands -- no ast involved
>
> # u.dat is a UTF-32LE file containing <upper-case-u-umlaut><newline>
> #
>
> $ od -tx1 u.dat
> 0000000 dc 00 00 00 0a 00 00 00
> 0000010
>
> # on linux.i386-64
> $ /usr/bin/iconv -f UTF-32LE -t US-ASCII < u.dat
> /usr/bin/iconv: illegal input sequence at position 0
> $ echo $?
> 1
>
> # on sol11.i386
> $ /bin/iconv -f UTF-32LE -t US-ASCII < u.dat
> ?
> $ echo $?
> 0
>
> solaris is *bad* in at least 3 ways
> * it apparently detects a conversion error but does not issue a diagnostic
> * it apparently detects a conversion error and substitutes '?' for "bad" bytes
> * it apparently detects a conversion error but exits 0
>
> who knows what liberties other implementations may take
>
> I wonder if ast, in the C/POSIX locale and MB_CUR_MAX==1, should have
> strict and non-strict conformance modes
>
> strict: US-ASCII: characters are 7 bit bytes, bytes with bit 0x80 set are
> invalid
> non-strict: ISO-8859-1: characters are 8 bit bytes
>
> non-strict would match linux C locale behavior
> strict would match whose behavior?
>
> I believe posix gives wiggle room here for the C locale to have chars with
> bit 0x80 set
> ast in non-strict mode will simply apply that wiggle room consistently across
> all of its os/arch implementations
>
> I guess what I'm really saying is that ast *will* be consistent across all
> implementations
>
> the question then is: in the C locale is the ast behavior always strict or is
> it tempered by astconf("CONFORMANCE")?
>

--
      ,   _                                    _   ,
     { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
.----'-/`-/     [email protected]   \-`\-'----.
 `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
      /\/\     Solaris/BSD//C/C++ programmer   /\/\
      `--`                                      `--`
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
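
The strict and non-strict modes proposed in the quoted message could be sketched roughly as below. This is only an illustration of the distinction for the single-byte C locale, not how ast implements or would implement it. The enum and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical mode names for the two behaviors discussed above. */
enum c_locale_mode {
    C_STRICT,       /* US-ASCII: bytes with bit 0x80 set are invalid */
    C_NONSTRICT     /* ISO-8859-1 style: every 8-bit byte is a character */
};

/* Returns the index of the first byte that is invalid under the given
 * mode, or n if all n bytes are valid characters. */
size_t c_locale_first_invalid(const unsigned char *s, size_t n,
                              enum c_locale_mode mode)
{
    size_t i;

    if (mode == C_NONSTRICT)
        return n;               /* nothing is rejected in non-strict mode */
    for (i = 0; i < n; i++)
        if (s[i] & 0x80)
            return i;           /* first byte outside 7-bit US-ASCII */
    return n;
}
```

With the byte 0xdc from u.dat in the input, strict mode reports its position as invalid, while non-strict mode accepts the whole buffer.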
