found the solaris iconv problem in 5 min after sleeping on it
the following command sequences use native commands -- no ast involved
# u.dat is a UTF-32LE file containing <upper-case-u-umlaut (U+00DC)><newline> #
$ od -tx1 u.dat
0000000 dc 00 00 00 0a 00 00 00
0000010
# on linux.i386-64
$ /usr/bin/iconv -f UTF-32LE -t US-ASCII < u.dat
/usr/bin/iconv: illegal input sequence at position 0
$ echo $?
1
# on sol11.i386
$ /bin/iconv -f UTF-32LE -t US-ASCII < u.dat
?
$ echo $?
0
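For comparison, the strict behavior that the linux iconv shows can be sketched in a few lines of C without any iconv library. The sketch decodes each 4-byte little-endian code unit and stops at the first code point above 0x7F, returning that unit's byte offset instead of substituting anything. `utf32le_to_ascii_strict` is a hypothetical name (not an ast or libc interface), and the sketch handles only this one encoding pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an ast or libc interface: convert UTF-32LE
 * to US-ASCII strictly. out must have room for inlen / 4 bytes.
 * Returns 0 and sets *outlen on success. On a code point above 0x7F or
 * a truncated final code unit, it sets *bad_pos to that unit's byte
 * offset and returns -1. Nothing is ever substituted. */
static int utf32le_to_ascii_strict(const unsigned char *in, size_t inlen,
                                   char *out, size_t *outlen, size_t *bad_pos)
{
    size_t o = 0;

    assert(out != NULL && outlen != NULL && bad_pos != NULL);
    for (size_t i = 0; i < inlen; i += 4) {
        if (inlen - i < 4) {             /* truncated code unit */
            *bad_pos = i;
            return -1;
        }
        uint32_t cp = (uint32_t)in[i]
                    | (uint32_t)in[i + 1] << 8
                    | (uint32_t)in[i + 2] << 16
                    | (uint32_t)in[i + 3] << 24;
        if (cp > 0x7F) {                 /* not representable in US-ASCII */
            *bad_pos = i;
            return -1;
        }
        out[o++] = (char)cp;
    }
    *outlen = o;
    return 0;
}
```

Fed the 8 bytes of u.dat, it fails with a bad position of 0. That is the same information linux reports as "illegal input sequence at position 0", and a command built on it would then exit nonzero.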
solaris is *bad* in at least 3 ways
* it apparently detects a conversion error but does not issue a diagnostic
* it apparently detects a conversion error and substitutes '?' for "bad" bytes
* it apparently detects a conversion error but exits 0
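A program that calls iconv(3) directly can at least tell these cases apart. POSIX has iconv() return (size_t)-1 with errno set to EILSEQ when an input sequence cannot be converted. Otherwise it returns the number of characters converted in a non-identical way, so a substituted '?' should show up as a nonzero return rather than passing silently. Whether Solaris's iconv() actually reports its substitutions through that count is an assumption, not something the transcript above shows. The enum and `checked_iconv` are hypothetical names. Also, some systems (historically Solaris among them) declare the inbuf argument as const char **, so the cast below may need adjusting there.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical classification of a one-shot iconv(3) conversion. */
enum conv_result {
    CONV_CLEAN,        /* every character converted identically          */
    CONV_SUBSTITUTED,  /* success, but iconv() counted non-identical
                          conversions (e.g. a '?' substitute)             */
    CONV_ILLEGAL,      /* EILSEQ or EINVAL: input could not be converted */
    CONV_OTHER_ERROR   /* iconv_open() failed, output buffer too small   */
};

static enum conv_result checked_iconv(const char *to, const char *from,
                                      const char *in, size_t inlen,
                                      char *out, size_t outsize,
                                      size_t *outlen, size_t *bad_pos)
{
    assert(out != NULL && outlen != NULL && bad_pos != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1)
        return CONV_OTHER_ERROR;

    char *ip = (char *)in;     /* POSIX/glibc prototype takes char ** */
    char *op = out;
    size_t ileft = inlen, oleft = outsize;
    enum conv_result r = CONV_CLEAN;

    size_t n = iconv(cd, &ip, &ileft, &op, &oleft);
    if (n == (size_t)-1)
        r = (errno == EILSEQ || errno == EINVAL) ? CONV_ILLEGAL
                                                  : CONV_OTHER_ERROR;
    else if (iconv(cd, NULL, NULL, &op, &oleft) == (size_t)-1)
        r = CONV_OTHER_ERROR;  /* could not write the final shift state */
    else if (n > 0)
        r = CONV_SUBSTITUTED;  /* the "succeeded but changed data" case */

    *bad_pos = inlen - ileft;  /* byte offset where conversion stopped */
    *outlen = outsize - oleft;
    iconv_close(cd);
    return r;
}
```

On linux the u-umlaut case is expected to come back as CONV_ILLEGAL. An implementation that substitutes and counts the substitution would come back as CONV_SUBSTITUTED. Either way, a caller that checks the result never mistakes the conversion for a clean one.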
who knows what liberties other implementations may take
I wonder if ast, in the C/POSIX locale and MB_CUR_MAX==1, should have
strict and non-strict conformance modes:
strict: US-ASCII: characters are 7-bit bytes; bytes with bit 0x80 set are
invalid
non-strict: ISO-8859-1: characters are 8-bit bytes
non-strict would match linux C locale behavior
strict would match whose behavior?
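With MB_CUR_MAX==1, the two proposed modes come down to one decision per byte. Here is a minimal sketch. The names (`c_locale_mode`, `c_locale_decode`, `c_locale_invalid_bytes`) are hypothetical, not existing ast interfaces, and the sketch assumes that wchar_t values in the C locale equal the ISO-8859-1 byte values. That is an implementation choice, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical mode switch, not an existing ast interface. */
enum c_locale_mode {
    C_LOCALE_STRICT,     /* US-ASCII: only bytes 0x00..0x7F are characters */
    C_LOCALE_NONSTRICT   /* ISO-8859-1: every byte is a character           */
};

/* Decode one byte as the C/POSIX locale would under the given mode.
 * Returns 0 and stores the character on success, or -1 if the byte
 * is not a valid character in this mode. */
static int c_locale_decode(enum c_locale_mode mode, unsigned char byte,
                           wchar_t *wc)
{
    assert(wc != NULL);
    if (mode == C_LOCALE_STRICT && byte >= 0x80)
        return -1;               /* bit 0x80 set: invalid in strict mode */
    *wc = (wchar_t)byte;         /* assumed: byte value == wchar_t value */
    return 0;
}

/* Count the invalid bytes in a buffer; 0 means the whole buffer is text. */
static size_t c_locale_invalid_bytes(enum c_locale_mode mode,
                                     const unsigned char *s, size_t n)
{
    size_t bad = 0;
    wchar_t wc;

    for (size_t i = 0; i < n; i++)
        if (c_locale_decode(mode, s[i], &wc) != 0)
            bad++;
    return bad;
}
```

The same two bytes (0xdc, newline) are one invalid character plus a newline in strict mode, and two valid characters in non-strict mode.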
I believe posix gives wiggle room here for the C locale to have chars with bit
0x80 set
ast in non-strict mode will simply apply that wiggle room consistently across
all of its os/arch implementations
I guess what I'm really saying is that ast *will* be consistent across all
implementations
the question then is: in the C locale, is the ast behavior always strict, or is
it tempered by astconf("CONFORMANCE")?
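The two possible answers differ only in where the mode comes from. The sketch below assumes that the CONFORMANCE value is passed in as the string astconf would report. It also assumes that "standard" means strict standard conformance and that anything else (such as the default "ast") means ast's own behavior. The function names and the enum are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, not an ast interface. */
enum c_locale_mode { C_LOCALE_STRICT, C_LOCALE_NONSTRICT };

/* Answer 1: the C locale is always strict; CONFORMANCE is ignored. */
static enum c_locale_mode mode_always_strict(const char *conformance)
{
    (void)conformance;
    return C_LOCALE_STRICT;
}

/* Answer 2: strictness is tempered by CONFORMANCE. Assumes the argument
 * is the value ast reports for CONFORMANCE: "standard" asks for strict
 * standard behavior, and anything else (e.g. "ast", or NULL if unset)
 * keeps the non-strict ISO-8859-1 reading. */
static enum c_locale_mode mode_tempered(const char *conformance)
{
    if (conformance != NULL && strcmp(conformance, "standard") == 0)
        return C_LOCALE_STRICT;
    return C_LOCALE_NONSTRICT;
}
```

Under answer 2, only a user who explicitly asked for standard conformance would get US-ASCII strictness, and everyone else would get the linux-like 8-bit behavior. Under answer 1, there would be no way to opt out.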
_______________________________________________
ast-developers mailing list
http://lists.research.att.com/mailman/listinfo/ast-developers