Glenn, would it help to convert to UTF-8 first and then run iconv on that,
instead of going through UTF-32LE? That is, for \u[] do "integer to UTF-8"
and then iconv(UTF-8 to local encoding).
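
To make the "integer to UTF-8" step concrete, here is a rough sketch, not
ast code. The name utf8_encode is made up for illustration. The bytes it
produces would then be handed to iconv(3) with "UTF-8" as the source
encoding, and the iconv return value would still need checking, since the
target may not be able to represent the character.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of ast): encode one Unicode code point
 * as UTF-8. Writes 1 to 4 bytes into out and returns the byte count,
 * or 0 if cp is a surrogate or lies above U+10FFFF.
 */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        /* 7-bit range: one byte, identical to US-ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        /* surrogates are not valid scalar values */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        /* four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For the character in u.dat (U+00DC) this gives the two bytes c3 9c.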

Olga

On Sat, Sep 7, 2013 at 4:48 PM, Glenn Fowler <[email protected]> wrote:
>
> found the solaris iconv problem in 5 min after sleeping on it
> the following command sequences use native commands -- no ast involved
>
> # u.dat is a UTF-32LE file containing <upper-case-u-umlaut><newline> #
>
> $ od -tx1 u.dat
> 0000000 dc 00 00 00 0a 00 00 00
> 0000010
>
> # on linux.i386-64
> $ /usr/bin/iconv -f UTF-32LE -t US-ASCII < u.dat
> /usr/bin/iconv: illegal input sequence at position 0
> $ echo $?
> 1
>
> # on sol11.i386
> $ /bin/iconv -f UTF-32LE -t US-ASCII < u.dat
> ?
> $ echo $?
> 0
>
> solaris is *bad* in at least 3 ways
> * it apparently detects a conversion error but does not issue a diagnostic
> * it apparently detects a conversion error and substitutes '?' for "bad" bytes
> * it apparently detects a conversion error but exits 0
>
> who knows what liberties other implementations may take
>
> I wonder if ast, in the C/POSIX locale and MB_CUR_MAX==1, should have
> strict and non-strict conformance modes
>
> strict: US-ASCII: characters are 7 bit bytes, bytes with bit 0x80 set are 
> invalid
> non-strict: ISO-8859-1: characters are 8 bit bytes
>
> non-strict would match linux C locale behavior
> strict would match whose behavior?
>
> I believe posix gives wiggle room here for the C locale to have chars with 
> bit 0x80 set
> ast in non-strict mode will simply apply that wiggle room consistently across
> all of its os/arch implementations
>
> I guess what I'm really saying is that ast *will* be consistent across all 
> implementations
>
> the question then is: in the C locale is the ast behavior always strict or is 
> it tempered
> by astconf("CONFORMANCE")?
>
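
As I read the strict / non-strict split above, it could look roughly like
the sketch below. This is only an illustration under my own assumptions:
the enum and the function name are invented here and are not ast
interfaces, and how the mode would actually be selected is left open.

```c
#include <stddef.h>

/* Hypothetical mode switch for the C/POSIX locale with MB_CUR_MAX==1. */
enum c_locale_mode {
    C_LOCALE_STRICT,    /* US-ASCII: bytes with bit 0x80 set are invalid */
    C_LOCALE_NONSTRICT  /* ISO-8859-1: every 8 bit byte is a character */
};

/*
 * Hypothetical check (not an ast function): return the index of the
 * first byte that is invalid under the given mode, or len if every
 * byte in buf is acceptable.
 */
size_t first_invalid_byte(const unsigned char *buf, size_t len,
                          enum c_locale_mode mode)
{
    size_t i;

    /* non-strict accepts all 256 byte values, so nothing can fail */
    if (mode == C_LOCALE_NONSTRICT)
        return len;

    /* strict rejects any byte with the high bit set */
    for (i = 0; i < len; i++)
        if (buf[i] & 0x80)
            return i;
    return len;
}
```

With the bytes 'a', 0xdc, '\n' the strict mode would report index 1 and
the non-strict mode would accept all three bytes.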



-- 
      ,   _                                    _   ,
     { \/`o;====-    Olga Kryzhanovska   -====;o`\/ }
.----'-/`-/     [email protected]   \-`\-'----.
 `'-..-| /       http://twitter.com/fleyta     \ |-..-'`
      /\/\     Solaris/BSD//C/C++ programmer   /\/\
      `--`                                      `--`
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
