Hi!

----

Below is a testcase which shows that the data returned by .sh.sig.pid
is currently (=ast-ksh.2013-04-09) not reliable when child processes
exit in rapid succession.

The testcase looks like this...
-- snip --

set -o nounset

integer i j
compound -a donejobs

function sighandler_chld
{
        donejobs+=(
                pid=${.sh.sig.pid}
        )
}

trap 'sighandler_chld' CHLD


for (( i=0 ; i < 512 ; i++ )) ; do
        {
                sleep 0.1
                exit 0
        } &
done

while ! wait ; do true ; done

sleep 0.1

#print -v donejobs
printf "# donejobs %d entries, expected %d\n" ${#donejobs[@]} i

# search for duplicate "pid" entries in the "donejobs" array
# (this should not happen unless the kernel has reused pids
# (which should not happen on modern Unix systems))
# we do this to demonstrate that there is an issue in
# ksh93's CHLD trap processing when too many child
# processes are quitting simultaneously
for (( i=0 ; i < ${#donejobs[@]} ; i++ )) ; do
        for (( j=0 ; j < ${#donejobs[@]} ; j++ )) ; do
                if (( (donejobs[$i].pid == donejobs[$j].pid)
                        && (i != j) )) ; then
                        printf 'duplicate pid=%d at %d/%d\n' \
                                donejobs[$i].pid i j
                fi
        done
done

exit 0
-- snip --

AFAIK the expected output would look like this...
-- snip --
# donejobs 512 entries, expected 512
-- snip --
... but ast-ksh.2013-04-09 reports many duplicate pids:
-- snip --
~/bin/ksh ../rtmin.sh
# donejobs 512 entries, expected 512
duplicate pid=5679 at 1/2
duplicate pid=5679 at 2/1
duplicate pid=5682 at 3/4
duplicate pid=5682 at 4/3
duplicate pid=5685 at 6/7
duplicate pid=5685 at 7/6
duplicate pid=5688 at 8/9
duplicate pid=5688 at 9/8
duplicate pid=5692 at 11/12
duplicate pid=5692 at 11/13
duplicate pid=5692 at 12/11
duplicate pid=5692 at 12/13
duplicate pid=5692 at 13/11
duplicate pid=5692 at 13/12
duplicate pid=5702 at 16/17
duplicate pid=5702 at 16/18
duplicate pid=5702 at 16/19
duplicate pid=5702 at 16/20
duplicate pid=5702 at 16/21
duplicate pid=5702 at 16/22
duplicate pid=5702 at 16/23
duplicate pid=5702 at 16/24
duplicate pid=5702 at 16/25
duplicate pid=5702 at 17/16
duplicate pid=5702 at 17/18
duplicate pid=5702 at 17/19
duplicate pid=5702 at 17/20
duplicate pid=5702 at 17/21
duplicate pid=5702 at 17/22
duplicate pid=5702 at 17/23
duplicate pid=5702 at 17/24
[snip]
-- snip --

AFAIK the issue is that the CHLD traps may be executed in a "nested"
manner instead of serially: a new CHLD arrives while the previous trap
is still being executed, and |sh_setsiginfo()| then overwrites the data
in .sh.sig underneath that still-running trap.
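
A rough sketch of the suspected overwrite, as a simulation only: plain
shell functions stand in for real signal delivery, and the names
sig_pid, seen and handler are made up for illustration (they are not
ksh93 internals). It assumes a single shared slot for the siginfo data
and a handler that reads the slot only after a nested delivery has
returned.

```shell
# Simulation only: one shared slot, as .sh.sig is assumed to behave.
sig_pid=0   # hypothetical stand-in for .sh.sig.pid
seen=""     # pids recorded by the handlers

# Hypothetical handler: $1 is the pid being reported; any further
# arguments are pids whose CHLD arrives while this handler still runs.
handler() {
    sig_pid=$1
    shift
    if [ $# -gt 0 ]; then
        handler "$@"        # nested delivery overwrites sig_pid
    fi
    seen="$seen $sig_pid"   # read happens after the nested call returned
}

handler 101 102 103
echo "recorded:$seen"       # prints: recorded: 103 103 103
```

All three handlers record the pid of the innermost delivery, which
would match the runs of identical pids in the output above.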

Question (for David):
Is there any reason we can't save the siginfo data (for all signals,
not just for CHLD) and execute the traps not in the signal handler but
before the next shell command is executed (i.e. inline with the normal
shell code flow)? If that's allowed then I can craft a patch for that
(based on some testing it seems to fix all related issues with .sh.sig
and makes signals in ksh93 fully reliable).
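
To make the idea concrete, here is a minimal sketch of the "save now,
run traps later" scheme, again as a simulation with plain shell
functions. The names pending, deliver and run_pending_traps are
hypothetical and say nothing about how an actual patch would be
structured in the C code; the sketch only assumes that delivery saves
the data into a FIFO and that the traps run one at a time afterwards.

```shell
# Simulation only: queue the siginfo data at delivery time and run the
# traps serially from the normal code flow.
pending=""  # hypothetical FIFO of saved pids (space separated)
seen=""     # pids recorded by the trap code

# "Signal delivery": only saves the data, runs no trap code.
deliver() {
    pending="$pending $1"
}

# Called before the next command: runs the trap once per saved entry,
# each time with its own copy of the data.
run_pending_traps() {
    for sig_pid in $pending; do
        seen="$seen $sig_pid"
    done
    pending=""
}

deliver 101; deliver 102; deliver 103
run_pending_traps
echo "recorded:$seen"       # prints: recorded: 101 102 103
```

Each trap invocation sees the pid that was saved for it, so no entry
is lost or duplicated even when the deliveries arrive back to back.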

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
