Hi!

----

ksh93r+_alpha_20060724 (some of the previous alphas were affected, too)
sometimes hangs when running the "heredoc.sh" test script in a
multibyte locale like "ja_JP.PCK". Other multibyte locales hit this
problem, too, although I have the feeling that the *.UTF-8 ones are
less commonly affected than the non-UTF-8 ones (like PCK, GB18030, EUC
etc.). I am not sure whether this is a problem in the Solaris code
(CC:'ing i18n-discuss at opensolaris.org so that Ienup can take a look
at the problem) or in the ksh93 codebase.

The symptoms are simply that the shell process hangs, consuming 100%
CPU time, and does not respond to SIGTERM/SIGHUP; it seems only SIGKILL
can end it.

I have sampled three stack traces:
-- snip --
$ pstack 18648
18648:  /home/test001/ksh93/on_build1/test1_x86/usr/src/cmd/ksh/i386/ksh
../..
 feee318e mblen    (80ab038, 2) + 2d
 fef4ee03 copyto   (8061160, 7d, 1) + d5
 fef510a4 varsub   (8061160) + fb4
 fef4eb08 sh_machere (80ab130, 80ab038, 8066ff9) + 3b9
 fef456f2 io_heredoc (8066fcc, 8066ff9) + d0
 fef451d8 sh_redirect (8066fcc, 0) + 785
 fef6a509 sh_exec  (8066fac, 4) + 2540
 fef649ab sh_subshell (8066fac, 4, 1) + 36a
 fef526bb comsubst (8061160, 1) + 56f
 fef5202c varsub   (8061160) + 1f3c
 fef4f503 copyto   (8061160, 0, 0) + 7d5
 fef4e4c7 sh_mactrim (8066f5d, ffffffff) + e6
 fef5424d nv_setlist (8066f54, 20200) + 581
 fef6a16c sh_exec  (8066f78, 4) + 21a3
 fef34e0a exfile   (fef9e9c8, 80a77c0, 3) + 6c0
 fef346d3 sh_main  (2, 8047cd4, 0, feffa7c0, 8047c90, 8047d08) + 6c1
 0805074c main     (2, 8047cd4, 8047ce0) + 1d
 0805069a ???????? (2, 8047d70, 8047db1, 0, 8047ddf, 8047e22)
$ pstack 18648
18648:  /home/test001/ksh93/on_build1/test1_x86/usr/src/cmd/ksh/i386/ksh
../..
 fef4edf6 copyto   (8061160, 7d, 1) + c8
 fef510a4 varsub   (8061160) + fb4
 fef4eb08 sh_machere (80ab130, 80ab038, 8066ff9) + 3b9
 fef456f2 io_heredoc (8066fcc, 8066ff9) + d0
 fef451d8 sh_redirect (8066fcc, 0) + 785
 fef6a509 sh_exec  (8066fac, 4) + 2540
 fef649ab sh_subshell (8066fac, 4, 1) + 36a
 fef526bb comsubst (8061160, 1) + 56f
 fef5202c varsub   (8061160) + 1f3c
 fef4f503 copyto   (8061160, 0, 0) + 7d5
 fef4e4c7 sh_mactrim (8066f5d, ffffffff) + e6
 fef5424d nv_setlist (8066f54, 20200) + 581
 fef6a16c sh_exec  (8066f78, 4) + 21a3
 fef34e0a exfile   (fef9e9c8, 80a77c0, 3) + 6c0
 fef346d3 sh_main  (2, 8047cd4, 0, feffa7c0, 8047c90, 8047d08) + 6c1
 0805074c main     (2, 8047cd4, 8047ce0) + 1d
 0805069a ???????? (2, 8047d70, 8047db1, 0, 8047ddf, 8047e22)
$ pstack 18648
18648:  /home/test001/ksh93/on_build1/test1_x86/usr/src/cmd/ksh/i386/ksh
../..
 feb00c9f __mblen_pck (feb57a60, 80ab038, 2) + 5f
 feee318b mblen    (80ab038, 2) + 2a
 fef4ee03 copyto   (8061160, 7d, 1) + d5
 fef510a4 varsub   (8061160) + fb4
 fef4eb08 sh_machere (80ab130, 80ab038, 8066ff9) + 3b9
 fef456f2 io_heredoc (8066fcc, 8066ff9) + d0
 fef451d8 sh_redirect (8066fcc, 0) + 785
 fef6a509 sh_exec  (8066fac, 4) + 2540
 fef649ab sh_subshell (8066fac, 4, 1) + 36a
 fef526bb comsubst (8061160, 1) + 56f
 fef5202c varsub   (8061160) + 1f3c
 fef4f503 copyto   (8061160, 0, 0) + 7d5
 fef4e4c7 sh_mactrim (8066f5d, ffffffff) + e6
 fef5424d nv_setlist (8066f54, 20200) + 581
 fef6a16c sh_exec  (8066f78, 4) + 21a3
 fef34e0a exfile   (fef9e9c8, 80a77c0, 3) + 6c0
 fef346d3 sh_main  (2, 8047cd4, 0, feffa7c0, 8047c90, 8047d08) + 6c1
 0805074c main     (2, 8047cd4, 8047ce0) + 1d
 0805069a ???????? (2, 8047d70, 8047db1, 0, 8047ddf, 8047e22)
-- snip --
(|copyto()| is a function in libshell/common/sh/macro.c (see
http://polaris.blastwave.org/browser/on/branches/ksh93/gisburn/prototype002/m1_ast_ast_imported/usr/src/lib/libshell/common/sh/macro.c)
and there are only two commits to "macro.c" since the creation of
"ksh93-integration prototype002", see
http://polaris.blastwave.org/log/on/branches/ksh93/gisburn/prototype002/m1_ast_ast_imported/usr/src/lib/libshell/common/sh/macro.c
).

It appears that ksh is stuck in an endless loop calling |libc::mblen()|,
and a simple % truss -u :: -o ksh_truss.log -p 8271 # confirms that:
-- snip --
/1 at 1:   -> libc:mblen(0x80ab038, 0x2)
/1 at 1:     -> methods_ja_JP.PCK:__mblen_pck(0xfeb57a60, 0x80ab038, 0x2)
/1 at 1:     <- methods_ja_JP.PCK:__mblen_pck() = 2
/1 at 1:   <- libc:mblen() = 2
/1 at 1:   -> libc:mblen(0x80ab038, 0x2)
/1 at 1:     -> methods_ja_JP.PCK:__mblen_pck(0xfeb57a60, 0x80ab038, 0x2)
/1 at 1:     <- methods_ja_JP.PCK:__mblen_pck() = 2
/1 at 1:   <- libc:mblen() = 2
/1 at 1:   -> libc:mblen(0x80ab038, 0x2)
/1 at 1:     -> methods_ja_JP.PCK:__mblen_pck(0xfeb57a60, 0x80ab038, 0x2)
/1 at 1:     <- methods_ja_JP.PCK:__mblen_pck() = 2
/1 at 1:   <- libc:mblen() = 2
/1 at 1:   -> libc:mblen(0x80ab038, 0x2)
/1 at 1:     -> methods_ja_JP.PCK:__mblen_pck(0xfeb57a60, 0x80ab038, 0x2)
/1 at 1:     <- methods_ja_JP.PCK:__mblen_pck() = 2
/1 at 1:   <- libc:mblen() = 2
-- snip --
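
The trace above shows |mblen()| being called again and again with the
same pointer (0x80ab038) and the same length (2), returning 2 each
time. That pattern would fit a scanning loop whose position does not
move forward between calls, but that is only a guess and I have not
confirmed it in |copyto()|. For comparison, here is a minimal sketch of
a multibyte scan that cannot spin, because every iteration consumes at
least one byte. This is NOT the ksh93 code; |count_mb_chars()| is a
hypothetical helper written only for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not taken from ksh93): count the characters in
 * the first n bytes of s, decoding with the current LC_CTYPE locale.
 */
size_t count_mb_chars(const char *s, size_t n)
{
    size_t count = 0;
    size_t i = 0;

    assert(s != NULL);
    (void)mblen(NULL, 0);       /* reset any shift state before scanning */

    while (i < n) {
        int len = mblen(s + i, n - i);

        /*
         * mblen() returns -1 for an invalid or incomplete sequence and
         * 0 for a NUL byte. In both cases consume exactly one byte so
         * that the loop always makes progress.
         */
        if (len <= 0)
            len = 1;

        i += (size_t)len;       /* advance past the character just seen */
        count++;
    }
    return count;
}
```

The point of the sketch is only the "i += len" step with the "len <= 0"
fallback: as long as the position advances on every pass, a repeated
|mblen(0x80ab038, 0x2)| like the one in the truss log cannot happen.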

* Steps to reproduce:
1. Pull sources and extract closed bin stuff:
$ svn checkout -r 406 svn://svn.genunix.org/on/branches/ksh93/gisburn/prototype002/m1_ast_ast_imported/usr
$ bzcat ../download/on-closed-bins-b37.i386.tar.bz2 | tar -xf -

2. Run "bldenv":
$ cd .. ; env - SHELL=$SHELL TERM=$TERM HOME=$HOME LOGNAME=$LOGNAME \
    DISPLAY=$DISPLAY LANG=C LC_ALL=C PAGER=less MANPATH=$MANPATH \
    /opt/onbld/bin/bldenv opensolaris.sh

3. Build it (the quick way):
$ cd test_x86/usr/src
$ time nice make setup 2>&1 | tee -a buildlog_setup.log
$ time nice dmake install >buildlog.log 2>&1

In some cases the "dmake install" hangs in usr/src/cmd/ksh/i386/Makefile
while running the "testshell" target (as on the Solaris
Nevada_B46/AMD64 machine I am using right now), while other machines
(like my Intel P4M laptop or my Ultra5 with B37) just execute it
without any problems.

4. Run the tests:
% cd ${SRC}/cmd/ksh/i386 (or AMD64, SPARC, SPARCv9)
% make testshell

Does anyone have ideas or suggestions on how to track this problem down?

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
