Hi!

----

We're hitting a sporadic (and hard-to-reproduce) "hang" in "basic.sh"
on Solaris 11/x86/32bit/debug (built with Sun Studio 12.1). The test
output looks like this:
-- snip --
+ /home/test001/ksh93/ast_ksh_20130409/build_i386_32bit_debug/arch/sol11.i386/src/cmd/ksh93/ksh ./src/cmd/ksh93/tests/shtests --locale LD_LIBRARY_PATH_64=/home/test001/ksh93/ast_ksh_20130409/build_i386_32bit_debug/arch/sol11.i386/lib: LD_LIBRARY_PATH=/home/test001/ksh93/ast_ksh_20130409/build_i386_32bit_debug/arch/sol11.i386/lib: LD_LIBRARY_PATH_32=/home/test001/ksh93/ast_ksh_20130409/build_i386_32bit_debug/arch/sol11.i386/lib: LC_ALL=en_US.UTF-8 LANG=en_US.UTF-8 VMALLOC_OPTIONS=abort SHCOMP='/home/test001/ksh93/ast_ksh_20130409/build_i386_32bit_debug/arch/*/bin/shcomp' ./src/cmd/ksh93/tests/basic.sh
test basic(en_US.UTF-8) begins at 2013-04-12+01:48:38
test basic(en_US.UTF-8) passed at 2013-04-12+01:49:27 [ 106 tests 0 errors ]
test basic(shcomp) begins at 2013-04-12+01:49:27
-- snip --

... sending a SIGHUP to the hanging process results in...
-- snip --
tee: write error [Broken pipe]
/tmp/test001/tmp1VF5OWk.xYA/shcomp-basic.ksh: line 350: 3467: Hangup
test basic(shcomp) failed at 2013-04-12+16:14:54 with exit code 269 [ 106 tests 269 errors ]
-- snip --

Stack trace of the hang looks like this:
-- snip --
(dbx) where
  [1] __read(0x0, 0xfed60378, 0x8000, 0xfee6acc9), at 0xfee806a5
  [2] read(0x0, 0xfed60378, 0x8000), at 0xfee6ad3d
  [3] sfrd(f = 0x82669d0, buf = 0xfed60378, n = 32768U, disc = 0xfed4fcc8), line 273 in "sfrd.c"
  [4] piperead(iop = 0x82669d0, buff = 0xfed60378, size = 32768U, handle = 0xfed4fcc8), line 2355 in "io.c"
  [5] sfrd(f = 0x82669d0, buf = 0xfed60378, n = 32768U, disc = 0xfed4fcc8), line 253 in "sfrd.c"
  [6] sfmove(fr = 0x82669d0, fw = 0x8266a38, n = -1LL, rc = -1), line 169 in "sfmove.c"
  [7] b_cat(argc = 1, argv = 0xfed6b138, context = 0x82646a4), line 536 in "cat.c"
  [8] sh_exec(shp = 0x82643a8, t = 0xfed6b0c8, flags = 133), line 1343 in "xec.c"
  [9] sh_exec(shp = 0x82643a8, t = 0xfed6b0c8, flags = 133), line 2204 in "xec.c"
  [10] sh_exec(shp = 0x82643a8, t = 0xfed6b054, flags = 4), line 1865 in "xec.c"
  [11] sh_argprocsub(shp = 0x82643a8, argp = 0xfed6b048), line 832 in "args.c"
  [12] arg_expand(shp = 0x82643a8, argp = 0xfed6b048, argchain = 0x80470cc, flag = 512), line 863 in "args.c"
=>[13] sh_argbuild(shp = 0x82643a8, nargs = 0x80471b8, comptr = 0xfed6afb8, flag = 512), line 727 in "args.c"
  [14] sh_exec(shp = 0x82643a8, t = 0xfed6afb8, flags = 516), line 975 in "xec.c"
  [15] sh_exec(shp = 0x82643a8, t = 0xfed6afac, flags = 516), line 2200 in "xec.c"
  [16] sh_exec(shp = 0x82643a8, t = 0xfed6af04, flags = 4), line 2348 in "xec.c"
  [17] sh_exec(shp = 0x82643a8, t = 0xfed6af04, flags = 4), line 2204 in "xec.c"
  [18] sh_exec(shp = 0x82643a8, t = 0xfed6a800, flags = 4), line 2524 in "xec.c"
  [19] exfile(shp = 0x82643a8, iop = 0xfed5f990, fno = 11), line 588 in "main.c"
  [20] sh_main(ac = 2, av = 0x8047870, userinit = (nil)), line 360 in "main.c"
  [21] main(argc = 2, argv = 0x8047870), line 45 in "pmain.c"
-- snip --

The matching code in basic.sh looks like this:
-- snip --
   347  builtin tee 2> /dev/null
   348  for tee in "$(whence tee)" "$(whence -p tee)"
   349  do      print xxx > $tmp/file
   350          $tee >(sleep 1;cat > $tmp/file) <<< "hello" > /dev/null
   351          [[ $(< $tmp/file) != hello ]] && err_exit "process substitution does not wait for >() to complete with $tee"
   352          print yyy > $tmp/file2
   353          $tee >(cat > $tmp/file) >(sleep 1;cat > $tmp/file2) <<< "hello" > /dev/null
   354          [[ $(< $tmp/file2) != hello ]] && err_exit "process substitution does not wait for second of two >() to complete with $tee"
   355          print xxx > $tmp/file
   356          $tee >(sleep 1;cat > $tmp/file) >(cat > $tmp/file2) <<< "hello" > /dev/null
   357          [[ $(< $tmp/file) != hello ]] && err_exit "process substitution does not wait for first of two >() to complete with $tee"
   358  done
-- snip --

The first guess I have is that this _may_ be an issue with "sleep 1"
and an overloaded system... which means the "sleep 1" delay was not
enough... but looking at the script code... that doesn't explain the
hang. Either something went wrong on the sending side (the "tee"), or
the receiving side (the "cat" reading from the pipe, which per the
stack trace is still blocked in read() on fd 0) didn't get the HUP
condition... no clue.

Erm... David/Glenn: Have you seen something similar recently? If not,
we can just leave this issue open until we hit it again (note this was
a 32bit Solaris 11 build... and 32bit was less tested in the last 3-6
months than I wished for... ;-( (which means the issue might be older
than just the last 1-2 ast-ksh alpha releases)).

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
