On Wed, Apr 10, 2013 at 1:57 PM, Roland Mainz <[email protected]> wrote:
> On Wed, Apr 10, 2013 at 1:32 PM, Roland Mainz <[email protected]> wrote:
>> [CC:'ing Werner since this is i18n related and was only observed on
>> SuSE 12.3 Linux for now...]
>>
>> Attached (as "astksh20130409_suse123_32bit_builtin_iconv_hang1.txt.gz")
>> is a (compressed) text file which causes the AST "iconv" builtin
>> utility from ast-ksh.2013-04-09 to "hang" in an endless loop in 32bit
>> i386 builds (AMD64 64bit builds are OK... *ONLY* the 32bit builds loop
>> forever...).
>>
>> Example:
>> -- snip --
>> $ gunzip  astksh20130409_suse123_32bit_builtin_iconv_hang1.txt.gz
>> $ LC_ALL=en_US.UTF-8 ../build_i386_32bit_debug/arch/linux.i386/bin/ksh
>> -c 'builtin iconv ; iconv -f UTF-8
>> /tmp/astksh20130409_suse123_32bit_builtin_iconv_hang1.txt >/tmp/zzz2 ;
>> true'
>> <hangs forever>
>> -- snip --
>>
>> Neither 32bit nor 64bit builds trigger any valgrind hits and the gdb
>> stacktrace is not very useful either:
>> -- snip --
>> $ LC_ALL=en_US.UTF-8 gdb --args
>> ../build_i386_32bit_debug/arch/linux.i386/bin/ksh -c 'builtin iconv ;
>> iconv -f UTF-8 /tmp/astksh20130409_suse123_32bit_builtin_iconv_hang1.txt
>> >/tmp/zzz2 ; true'
>> GNU gdb (GDB) SUSE (7.5.1-2.1.1)
>> Copyright (C) 2012 Free Software Foundation, Inc.
>> [snip]
>> Reading symbols from
>> /home/test001/work/ast_ksh_20130409/build_i386_32bit_debug/arch/linux.i386/bin/ksh...done.
>> (gdb) run
>> Starting program:
>> /home/test001/work/ast_ksh_20130409/build_i386_32bit_debug/arch/linux.i386/bin/ksh
>> -c builtin\ iconv\ \;\ iconv\ -f\ UTF-8\
>> /tmp/astksh20130409_suse123_32bit_builtin_iconv_hang1.txt\
>> \>/tmp/zzz2\ \;\ true
>> Missing separate debuginfo for /lib/ld-linux.so.2
>> [snip]
>> ^C
>> Program received signal SIGINT, Interrupt.
>> 0xf7dd2447 in __gconv_transform_utf8_internal () from /lib/libc.so.6
>> (gdb) where
>> #0  0xf7dd2447 in __gconv_transform_utf8_internal () from /lib/libc.so.6
>> #1  0xf7dcd00a in __gconv () from /lib/libc.so.6
>> #2  0xf7dcc5b2 in iconv () from /lib/libc.so.6
>> #3  0x00000001 in ?? ()
>> #4  0x0821b400 in ?? ()
>> Backtrace stopped: previous frame inner to this frame (corrupt stack?)
>> -- snip --
>> (I don't know how to "fix" the "previous frame inner to this frame"
>> issue... ;-( )
>
> More data: if I force the ksh93 builtin "iconv" to read from a pipe I
> get a warning about an incomplete multibyte sequence...
> -- snip --
> $ LC_ALL=en_US.UTF-8 ../build_i386_32bit_debug/arch/linux.i386/bin/ksh
> -c 'builtin iconv ; cat
> /tmp/astksh20130409_suse123_32bit_builtin_iconv_hang1.txt | iconv -f
> UTF-8 >/tmp/zzz2 ; true'
> iconv: incomplete multibyte sequence at offset 32767 [Invalid argument]
> -- snip --
> ... it seems the issue is somehow related to "iconv" using |mmap()|
> when reading a plain file, which triggers a different codepath than
> reading from a pipe.
>
> Question is now... who is correct ? GNU "iconv" doesn't seem to print
> any warnings/errors for the input file while AST "iconv" prints a
> warning when reading from a pipe and hangs when reading via |mmap()|
> ...
> ... another issue is... why does this only happen for 32bit builds ?

The issue does happen for 64bit builds, too.

It seems it happens (for 32bit builds) when a multibyte character
straddles a 32k buffer boundary, i.e. the first part of the multibyte
character is at the end of the first buffer and the remaining bytes
are at the start of the 2nd buffer.

Here is a reduced/standalone testcase:
-- snip --
$ ksh -c 'builtin iconv ; integer i ; typeset prefix="123" ; for ((i=0
; i < 2**16 ; i++ )) ; do printf "%s\u[20ac]" "$prefix" ; done | iconv
-f UTF-8 >xxx'
-- snip --
The string length of "prefix" may have to be varied to catch sfio
buffers of a different size. I'll write a testcase for the builtin
iconv later.
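
As a rough aid for picking a prefix length, here is a small sketch of
the arithmetic. It assumes the stream is a plain repetition of
<prefix><3-byte UTF-8 character> (U+20AC encodes as 3 bytes in UTF-8)
and that the reader fills each buffer completely, so that the second
buffer starts at exactly offset 32768. Neither assumption is guaranteed
for a pipe, where reads may return fewer bytes. The function name
"straddles" is made up for this illustration and is not part of ksh or
AST.

```shell
# Hypothetical helper (not part of ksh/AST): reports whether a 3-byte
# UTF-8 character is split by a buffer boundary when the stream is a
# repetition of <prefix_len ASCII bytes><3-byte character>.
straddles() {
    boundary=$1      # offset of the first byte of the second buffer (assumed)
    prefix_len=$2    # number of ASCII bytes before each 3-byte character
    period=$((prefix_len + 3))
    rem=$((boundary % period))
    # Within each period the character occupies offsets
    # prefix_len .. prefix_len+2, so it is split when the boundary
    # falls on its 2nd or 3rd byte.
    if [ "$rem" -eq $((prefix_len + 1)) ] || [ "$rem" -eq $((prefix_len + 2)) ]; then
        echo split
    else
        echo whole
    fi
}

# Try a few prefix lengths against an assumed 32k boundary.
for len in 0 1 2 3 4 5 6; do
    printf 'prefix length %d: %s\n' "$len" "$(straddles 32768 "$len")"
done
```

Under these assumptions a 3-character prefix such as "123" would not
split the character at offset 32768, while a 2-character prefix would
put its first byte at offset 32767. This is only arithmetic on an
idealized stream; the actual sfio read sizes decide where the real
boundaries fall.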

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
