Hi!

----

We're hitting a sporadic (and hard-to-reproduce) hang in "basic.sh"
on Solaris 11/x86/32bit/debug (built with Sun Studio 12.1).

The test output looks like this:
-- snip --
+ 
/home/test001/ksh93/ast_ksh_20130409/build_i386_32bit_debug/arch/sol11.i386/src/cmd/ksh93/ksh
./src/cmd/ksh93/tests/shtests --locale
LD_LIBRARY_PATH_64=/home/test001/ksh93/ast_ksh_20130409/build_i386_32bit_debug/arch/sol11.i386/lib:
LD_LIBRARY_PATH=/home/test001/ksh93/ast_ksh_20130409/build_i386_32bit_debug/arch/sol11.i386/lib:
LD_LIBRARY_PATH_32=/home/test001/ksh93/ast_ksh_20130409/build_i386_32bit_debug/arch/sol11.i386/lib:
LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8 VMALLOC_OPTIONS=abort
SHCOMP='/home/test001/ksh93/ast_ksh_20130409/build_i386_32bit_debug/arch/*/bin/shcomp'
./src/cmd/ksh93/tests/basic.sh
test basic(en_US.UTF-8) begins at 2013-04-12+01:48:38
test basic(en_US.UTF-8) passed at 2013-04-12+01:49:27 [ 106 tests 0 errors ]
test basic(shcomp) begins at 2013-04-12+01:49:27
-- snip --
Sending a SIGHUP to the hanging process results in:
-- snip --
tee: write error [Broken pipe]
/tmp/test001/tmp1VF5OWk.xYA/shcomp-basic.ksh: line 350: 3467: Hangup
test basic(shcomp) failed at 2013-04-12+16:14:54 with exit code 269 [ 106 tests 269 errors ]
-- snip --

Stack trace of the hang looks like this:
-- snip --
(dbx) where
  [1] __read(0x0, 0xfed60378, 0x8000, 0xfee6acc9), at 0xfee806a5
  [2] read(0x0, 0xfed60378, 0x8000), at 0xfee6ad3d
  [3] sfrd(f = 0x82669d0, buf = 0xfed60378, n = 32768U, disc = 0xfed4fcc8), line 273 in "sfrd.c"
  [4] piperead(iop = 0x82669d0, buff = 0xfed60378, size = 32768U, handle = 0xfed4fcc8), line 2355 in "io.c"
  [5] sfrd(f = 0x82669d0, buf = 0xfed60378, n = 32768U, disc = 0xfed4fcc8), line 253 in "sfrd.c"
  [6] sfmove(fr = 0x82669d0, fw = 0x8266a38, n = -1LL, rc = -1), line 169 in "sfmove.c"
  [7] b_cat(argc = 1, argv = 0xfed6b138, context = 0x82646a4), line 536 in "cat.c"
  [8] sh_exec(shp = 0x82643a8, t = 0xfed6b0c8, flags = 133), line 1343 in "xec.c"
  [9] sh_exec(shp = 0x82643a8, t = 0xfed6b0c8, flags = 133), line 2204 in "xec.c"
  [10] sh_exec(shp = 0x82643a8, t = 0xfed6b054, flags = 4), line 1865 in "xec.c"
  [11] sh_argprocsub(shp = 0x82643a8, argp = 0xfed6b048), line 832 in "args.c"
  [12] arg_expand(shp = 0x82643a8, argp = 0xfed6b048, argchain = 0x80470cc, flag = 512), line 863 in "args.c"
=>[13] sh_argbuild(shp = 0x82643a8, nargs = 0x80471b8, comptr = 0xfed6afb8, flag = 512), line 727 in "args.c"
  [14] sh_exec(shp = 0x82643a8, t = 0xfed6afb8, flags = 516), line 975 in "xec.c"
  [15] sh_exec(shp = 0x82643a8, t = 0xfed6afac, flags = 516), line 2200 in "xec.c"
  [16] sh_exec(shp = 0x82643a8, t = 0xfed6af04, flags = 4), line 2348 in "xec.c"
  [17] sh_exec(shp = 0x82643a8, t = 0xfed6af04, flags = 4), line 2204 in "xec.c"
  [18] sh_exec(shp = 0x82643a8, t = 0xfed6a800, flags = 4), line 2524 in "xec.c"
  [19] exfile(shp = 0x82643a8, iop = 0xfed5f990, fno = 11), line 588 in "main.c"
  [20] sh_main(ac = 2, av = 0x8047870, userinit = (nil)), line 360 in "main.c"
  [21] main(argc = 2, argv = 0x8047870), line 45 in "pmain.c"
-- snip --

The matching code in basic.sh looks like this:
-- snip --
   347          builtin tee 2> /dev/null
   348          for tee in "$(whence tee)" "$(whence -p tee)"
   349          do      print xxx > $tmp/file
   350                  $tee  >(sleep 1;cat > $tmp/file) <<< "hello" > /dev/null
   351                  [[ $(< $tmp/file) != hello ]] && err_exit "process substitution does not wait for >() to complete with $tee"
   352                  print yyy > $tmp/file2
   353                  $tee >(cat > $tmp/file) >(sleep 1;cat > $tmp/file2) <<< "hello" > /dev/null
   354                  [[ $(< $tmp/file2) != hello ]] && err_exit "process substitution does not wait for second of two >() to complete with $tee"
   355                  print xxx > $tmp/file
   356                  $tee  >(sleep 1;cat > $tmp/file) >(cat > $tmp/file2) <<< "hello" > /dev/null
   357                  [[ $(< $tmp/file) != hello ]] && err_exit "process substitution does not wait for first of two >() to complete with $tee"
   358          done
-- snip --

My first guess is that this _may_ be an issue with "sleep 1" on an
overloaded system, i.e. the one-second delay was not long enough. But
looking at the script code, that would explain a failed test, not a
hang. The stack trace shows the builtin "cat" of a >() process
substitution blocked in read() on fd 0, so either something went wrong
on the sending side (the "tee") or the receiving side (the "cat"
reading from the pipe) never saw the EOF/HUP condition. No clue which.
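
For reference, the general pipe behaviour behind that second guess can
be shown with a small POSIX sh sketch. This is only an illustration of
generic pipe semantics (a reader sees EOF only once every write end is
closed), not a reproduction of the ksh93 hang; the function name
demo_pipe_eof and the file names are made up for the example, and it
assumes mktemp and mkfifo are available.

```shell
#!/bin/sh
# Illustration of generic pipe semantics only, not of ksh93 internals.
# demo_pipe_eof and the file names under $dir are hypothetical.
demo_pipe_eof() {
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/fifo" || return 1

    # Reader: like frame [7] above, a "cat" blocked in read() on a pipe.
    cat "$dir/fifo" > "$dir/out" &
    reader=$!

    # Writer: open the write end on fd 3, send the data, keep fd 3 open.
    exec 3> "$dir/fifo"
    echo hello >&3
    sleep 1

    # The data has arrived, but the reader cannot see EOF yet.
    if kill -0 "$reader" 2>/dev/null; then state=blocked; else state=exited; fi

    # Closing the last write end delivers EOF and lets "cat" finish.
    exec 3>&-
    wait "$reader"

    echo "before close: $state, data: $(cat "$dir/out")"
    rm -rf "$dir"
}

demo_pipe_eof
```

If any process keeps a copy of the write end open (here fd 3), the
"cat" stays in read() indefinitely, which is the same shape as the
stack trace above.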

David/Glenn: have you seen something similar recently? If not, we'll
just leave this issue open until we hit it again. Note that this was a
32bit Solaris 11 build, and 32bit was less tested in the last 3-6
months than I wished for ;-( which means the issue might be older than
just the last 1-2 ast-ksh alpha releases.
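
To catch the hang sooner the next time it shows up, the test could be
run in a loop under a watchdog. Below is a minimal sketch in POSIX sh;
run_with_watchdog is a hypothetical helper (not part of shtests), and
the 124 return value is only a convention borrowed from GNU timeout.
It uses plain globals because POSIX sh has no "local".

```shell
#!/bin/sh
# run_with_watchdog LIMIT CMD [ARGS...]   (hypothetical helper)
# Runs CMD in the background and polls it once per second.
# Returns CMD's exit status, or 124 if CMD was still running after
# LIMIT seconds (in which case CMD is sent SIGHUP first).
run_with_watchdog() {
    limit=$1
    shift
    "$@" &
    pid=$!
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            # Looks like a hang: signal it, as was done by hand above.
            kill -HUP "$pid" 2>/dev/null
            wait "$pid" 2>/dev/null
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    # CMD finished on its own; hand back its real exit status.
    wait "$pid"
}
```

Calling this repeatedly with the test command and stopping at the
first 124 would leave a time limit instead of a 14-hour wait; for a
real run one would probably attach dbx before sending the SIGHUP.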

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
