Glenn, I think I found a bug in either GNU iconv (iconv (GNU libc) 2.15) or
AST iconv (iconv (AT&T Research) 2011-01-11).
I'm still not sure which of the two is to blame.
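The test script at the bottom of this email converts the string "x€x"
from UTF8 to each test encoding with one iconv (index a) and back to UTF8
with another (index b), where index 0 is /usr/bin/iconv on openSUSE 12.2
and index 1 is AST iconv. In theory every test should print "x€x", but
for UTF16 two of them do not: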
# encoding=EUC-JISX0213
a=0/b=0: x€x
a=0/b=1: x€x
a=1/b=0: x€x
a=1/b=1: x€x
# encoding=GB18030
a=0/b=0: x€x
a=0/b=1: x€x
a=1/b=0: x€x
a=1/b=1: x€x
# encoding=UCS4
a=0/b=0: x€x
a=0/b=1: x€x
a=1/b=0: x€x
a=1/b=1: x€x
# encoding=UTF16
a=0/b=0: x€x
a=0/b=1: ��x� x
a=1/b=0: 겂a=1/b=1: x€x
The UTF16 output is garbled only when AST and GNU iconv are mixed
(a != b); each implementation round-trips correctly with itself.
My test script looks like this:
typeset -a iconvpaths=(
    '/usr/bin/iconv'
    '/home/lcons/bin/iconv'
)
typeset -a test_encodings=(
    'EUC-JISX0213'
    'GB18030'
    'UCS4'
    'UTF16'
)
set -o nounset
for (( tenc=0 ; tenc < ${#test_encodings[@]} ; tenc++ )) ; do
    printf '# encoding=%q\n' "${test_encodings[tenc]}"
    for (( a=0 ; a < ${#iconvpaths[@]} ; a++ )) ; do
        for (( b=0 ; b < ${#iconvpaths[@]} ; b++ )) ; do
            printf 'a=%d/b=%d: ' a b
            printf 'x\u[20ac]x\n' | \
                ${iconvpaths[a]} -f UTF8 -t "${test_encodings[tenc]}" | \
                ${iconvpaths[b]} -f "${test_encodings[tenc]}" -t UTF8
        done
    done
done
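To narrow down which side is at fault, it may help to look at the
intermediate bytes each iconv writes before the other one reads them.
The following is only a minimal sketch, assuming a POSIX shell with od,
tr and sed available; hexbytes and dump_encoded are helper names made up
for this illustration. One thing worth comparing in the dumps (a guess,
not something I have confirmed) is whether both implementations agree on
the byte order and on whether a byte order mark is written for UTF16.

```shell
# Hypothetical helper: print stdin as space-separated hex bytes on one line.
hexbytes() {
    od -An -v -tx1 | tr -d '\n' | tr -s ' ' | sed 's/^ //; s/ $//'
}

# Hypothetical helper: show what one iconv binary emits for "x€x".
#   $1 = path to an iconv binary, $2 = target encoding
# \342\202\254 is the UTF-8 encoding of U+20AC, written as portable octal escapes.
dump_encoded() {
    printf 'x\342\202\254x\n' | "$1" -f UTF8 -t "$2" | hexbytes
}

# Dump the UTF16 output of each iconv from the test script, skipping
# any path that does not exist on the machine running this.
for p in /usr/bin/iconv /home/lcons/bin/iconv ; do
    [ -x "$p" ] || continue
    printf '%s: ' "$p"
    dump_encoded "$p" UTF16
    printf '\n'
done
```

If the two lines differ, the difference shows what the reading side would
have to cope with; if they are identical, the problem is more likely in
the decoding direction.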
Lionel