Hi!

----

ksh93r+_alpha_20060724 (some of the previous alphas were affected, too)
sometimes hangs when running the "heredoc.sh" test script in a multibyte
locale like "ja_JP.PCK". Other multibyte locales hit this problem, too,
although I have the feeling that the *.UTF-8 ones are less commonly
affected than the non-UTF-8 ones (like PCK, GB18030, EUC etc.).
I am not sure whether this is a problem in the Solaris code (CC:'ing
i18n-discuss at opensolaris.org so that Ienup can take a look at the
problem) or in the ksh93 codebase.

The symptoms are simply that the shell process hangs, consuming 100% CPU
time, and does not respond to SIGTERM/SIGHUP; it seems only SIGKILL can
end it.

I have sampled three stack traces:
-- snip --
$ pstack 18648
18648:  /home/test001/ksh93/on_build1/test1_x86/usr/src/cmd/ksh/i386/ksh ../..
 feee318e mblen (80ab038, 2) + 2d
 fef4ee03 copyto (8061160, 7d, 1) + d5
 fef510a4 varsub (8061160) + fb4
 fef4eb08 sh_machere (80ab130, 80ab038, 8066ff9) + 3b9
 fef456f2 io_heredoc (8066fcc, 8066ff9) + d0
 fef451d8 sh_redirect (8066fcc, 0) + 785
 fef6a509 sh_exec (8066fac, 4) + 2540
 fef649ab sh_subshell (8066fac, 4, 1) + 36a
 fef526bb comsubst (8061160, 1) + 56f
 fef5202c varsub (8061160) + 1f3c
 fef4f503 copyto (8061160, 0, 0) + 7d5
 fef4e4c7 sh_mactrim (8066f5d, ffffffff) + e6
 fef5424d nv_setlist (8066f54, 20200) + 581
 fef6a16c sh_exec (8066f78, 4) + 21a3
 fef34e0a exfile (fef9e9c8, 80a77c0, 3) + 6c0
 fef346d3 sh_main (2, 8047cd4, 0, feffa7c0, 8047c90, 8047d08) + 6c1
 0805074c main (2, 8047cd4, 8047ce0) + 1d
 0805069a ???????? (2, 8047d70, 8047db1, 0, 8047ddf, 8047e22)
$ pstack 18648
18648:  /home/test001/ksh93/on_build1/test1_x86/usr/src/cmd/ksh/i386/ksh ../..
 fef4edf6 copyto (8061160, 7d, 1) + c8
 fef510a4 varsub (8061160) + fb4
 fef4eb08 sh_machere (80ab130, 80ab038, 8066ff9) + 3b9
 fef456f2 io_heredoc (8066fcc, 8066ff9) + d0
 fef451d8 sh_redirect (8066fcc, 0) + 785
 fef6a509 sh_exec (8066fac, 4) + 2540
 fef649ab sh_subshell (8066fac, 4, 1) + 36a
 fef526bb comsubst (8061160, 1) + 56f
 fef5202c varsub (8061160) + 1f3c
 fef4f503 copyto (8061160, 0, 0) + 7d5
 fef4e4c7 sh_mactrim (8066f5d, ffffffff) + e6
 fef5424d nv_setlist (8066f54, 20200) + 581
 fef6a16c sh_exec (8066f78, 4) + 21a3
 fef34e0a exfile (fef9e9c8, 80a77c0, 3) + 6c0
 fef346d3 sh_main (2, 8047cd4, 0, feffa7c0, 8047c90, 8047d08) + 6c1
 0805074c main (2, 8047cd4, 8047ce0) + 1d
 0805069a ???????? (2, 8047d70, 8047db1, 0, 8047ddf, 8047e22)
$ pstack 18648
18648:  /home/test001/ksh93/on_build1/test1_x86/usr/src/cmd/ksh/i386/ksh ../..
 feb00c9f __mblen_pck (feb57a60, 80ab038, 2) + 5f
 feee318b mblen (80ab038, 2) + 2a
 fef4ee03 copyto (8061160, 7d, 1) + d5
 fef510a4 varsub (8061160) + fb4
 fef4eb08 sh_machere (80ab130, 80ab038, 8066ff9) + 3b9
 fef456f2 io_heredoc (8066fcc, 8066ff9) + d0
 fef451d8 sh_redirect (8066fcc, 0) + 785
 fef6a509 sh_exec (8066fac, 4) + 2540
 fef649ab sh_subshell (8066fac, 4, 1) + 36a
 fef526bb comsubst (8061160, 1) + 56f
 fef5202c varsub (8061160) + 1f3c
 fef4f503 copyto (8061160, 0, 0) + 7d5
 fef4e4c7 sh_mactrim (8066f5d, ffffffff) + e6
 fef5424d nv_setlist (8066f54, 20200) + 581
 fef6a16c sh_exec (8066f78, 4) + 21a3
 fef34e0a exfile (fef9e9c8, 80a77c0, 3) + 6c0
 fef346d3 sh_main (2, 8047cd4, 0, feffa7c0, 8047c90, 8047d08) + 6c1
 0805074c main (2, 8047cd4, 8047ce0) + 1d
 0805069a ???????? (2, 8047d70, 8047db1, 0, 8047ddf, 8047e22)
-- snip --

(|copyto()| is a function in libshell/common/sh/macro.c, see
http://polaris.blastwave.org/browser/on/branches/ksh93/gisburn/prototype002/m1_ast_ast_imported/usr/src/lib/libshell/common/sh/macro.c
and there have been only two commits to "macro.c" since the creation of
"ksh93-integration prototype002", see
http://polaris.blastwave.org/log/on/branches/ksh93/gisburn/prototype002/m1_ast_ast_imported/usr/src/lib/libshell/common/sh/macro.c
).

It appears that ksh is stuck in an endless loop calling |libc::mblen()|,
always with the same arguments and the same return value, and a simple
% truss -u :: -o ksh_truss.log -p 8271
confirms that:
-- snip --
/1@1:   -> libc:mblen(0x80ab038, 0x2)
/1@1:     -> methods_ja_JP.PCK:__mblen_pck(0xfeb57a60, 0x80ab038, 0x2)
/1@1:     <- methods_ja_JP.PCK:__mblen_pck() = 2
/1@1:   <- libc:mblen() = 2
/1@1:   -> libc:mblen(0x80ab038, 0x2)
/1@1:     -> methods_ja_JP.PCK:__mblen_pck(0xfeb57a60, 0x80ab038, 0x2)
/1@1:     <- methods_ja_JP.PCK:__mblen_pck() = 2
/1@1:   <- libc:mblen() = 2
/1@1:   -> libc:mblen(0x80ab038, 0x2)
/1@1:     -> methods_ja_JP.PCK:__mblen_pck(0xfeb57a60, 0x80ab038, 0x2)
/1@1:     <- methods_ja_JP.PCK:__mblen_pck() = 2
/1@1:   <- libc:mblen() = 2
/1@1:   -> libc:mblen(0x80ab038, 0x2)
/1@1:     -> methods_ja_JP.PCK:__mblen_pck(0xfeb57a60, 0x80ab038, 0x2)
/1@1:     <- methods_ja_JP.PCK:__mblen_pck() = 2
/1@1:   <- libc:mblen() = 2
-- snip --

* Steps to reproduce:
1. Pull sources and extract closed bin stuff:
$ svn checkout -r 406 svn://svn.genunix.org/on/branches/ksh93/gisburn/prototype002/m1_ast_ast_imported/usr
$ bzcat ../download/on-closed-bins-b37.i386.tar.bz2 | tar -xf -

2. Run "bldenv":
$ cd .. ; env - SHELL=$SHELL TERM=$TERM HOME=$HOME LOGNAME=$LOGNAME DISPLAY=$DISPLAY LANG=C LC_ALL=C PAGER=less MANPATH=$MANPATH /opt/onbld/bin/bldenv opensolaris.sh

3. Build it (the quick way):
$ cd test_x86/usr/src
$ time nice make setup 2>&1 | tee -a buildlog_setup.log
$ time nice dmake install >buildlog.log 2>&1
On some machines the "dmake install" hangs in
usr/src/cmd/ksh/i386/Makefile while running the "testshell" target (for
example on the Solaris Nevada_B46/AMD64 I am using right now), while
other machines (like my Intel P4M laptop or my Ultra5 with B37) just
execute it without any problems.

4. Run the tests:
% cd ${SRC}/cmd/ksh/i386 # (or AMD64, SPARC, SPARCv9)
% make testshell

Does anyone have ideas/suggestions/etc. how to track this problem down?

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)