[ast-users] AST or GNU iconv bug?

Lionel Cons Tue, 25 Sep 2012 22:26:51 -0700

Glenn, I think I found a bug in either GNU iconv (iconv (GNU libc) 2.15) or
AST iconv (iconv (AT&T Research) 2011-01-11).
I'm not yet sure which one is to blame.


The test script at the bottom of this email compares /usr/bin/iconv on
openSUSE 12.2 against AST iconv: it converts a test string from UTF8 to
each encoding with iconv "a" and back to UTF8 with iconv "b". In theory
every test should print "x€x", but for UTF16 some do not:
# encoding=EUC-JISX0213
a=0/b=0: x€x
a=0/b=1: x€x
a=1/b=0: x€x
a=1/b=1: x€x
# encoding=GB18030
a=0/b=0: x€x
a=0/b=1: x€x
a=1/b=0: x€x
a=1/b=1: x€x
# encoding=UCS4
a=0/b=0: x€x
a=0/b=1: x€x
a=1/b=0: x€x
a=1/b=1: x€x
# encoding=UTF16
a=0/b=0: x€x
a=0/b=1: ��x� x
a=1/b=0: 겂੸a=1/b=1: x€x

The UTF16 results are garbled whenever AST and GNU iconv are mixed
(a=0/b=1 and a=1/b=0); each iconv round-trips correctly with itself.

My test script looks like this:

typeset -a iconvpaths=(
        '/usr/bin/iconv'
        '/home/lcons/bin/iconv'
)

typeset -a test_encodings=(
        'EUC-JISX0213'
        'GB18030'
        'UCS4'
        'UTF16'
)

set -o nounset

for (( tenc=0 ; tenc < ${#test_encodings[@]} ; tenc++ )) ; do
        printf '# encoding=%q\n' "${test_encodings[tenc]}"
        for (( a=0 ; a < ${#iconvpaths[@]} ; a++ )) ; do
                for (( b=0 ; b < ${#iconvpaths[@]} ; b++ )) ; do
                        printf 'a=%d/b=%d: ' a b

                        printf 'x\u[20ac]x\n' | \
                                ${iconvpaths[a]} -f UTF8 -t "${test_encodings[tenc]}" | \
                                ${iconvpaths[b]} -f "${test_encodings[tenc]}" -t UTF8
                done
        done
done
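
One way to narrow this down might be to look at the bytes each iconv
emits for UTF16 before the other iconv reads them. The following is only
a minimal sketch, assuming od and xargs are available; dump_hex is a
hypothetical helper name and not part of the test script above:

```shell
# Hypothetical helper: print the bytes read from stdin as
# space-separated two-digit hex values.
dump_hex() {
        # od emits hex bytes with variable spacing; xargs (which runs
        # echo by default) collapses the whitespace to single spaces.
        od -An -v -tx1 | xargs
}

# The UTF8 input of the test script, written with octal escapes so it
# does not depend on printf supporting \u[...]: x, U+20AC, x, newline.
printf 'x\342\202\254x\n' | dump_hex      # prints: 78 e2 82 ac 78 0a

# Intended use, with the iconv paths from the test script
# (commented out because those paths are specific to my machine):
# printf 'x\342\202\254x\n' | /usr/bin/iconv -f UTF8 -t UTF16 | dump_hex
# printf 'x\342\202\254x\n' | /home/lcons/bin/iconv -f UTF8 -t UTF16 | dump_hex
```

If the two dumps differ, for example in byte order or in whether a
byte order mark is written, that would show what the two
implementations disagree about.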

Lionel
_______________________________________________
ast-users mailing list
[email protected]
https://mailman.research.att.com/mailman/listinfo/ast-users