Are there any known issues where a SIGSTOP can trigger multiple
SIGCHLD trap calls with code=STOPPED for the same event?

We've been experiencing trouble with this kind of problem, i.e. missing
SIGCHLD state change reports when a child changes from stopped to
running or from running to stopped, on a massive scale if the number of
children exceeds a few hundred processes or if the parent process is
stalled by paging/swapping.

I can't reproduce it with a simple testcase but while searching I once
had this failure:
ksh -x -c 'builtin pids ; integer numsigchld=0 ;
trap "print -v .sh.sig;((numsigchld++))" CHLD ;
{ while true ; do kill -s STOP $(pids -f "%(pid)d") ; done } & pid=$! ;
sleep 1 ; kill -CONT $pid ; /usr/bin/sleep 1; kill -KILL $pid ;
wait $pid ; print "$?,${numsigchld}"'
+ builtin pids
+ numsigchld=0
+ typeset -li numsigchld
+ trap 'print -v .sh.sig;((numsigchld++))' CHLD
+ pid=26972
+ sleep 1
+ true
+ pids -f '%(pid)d'
+ kill -s STOP 26972
+ print -v .sh.sig
(
        typeset -r -l -i 16 addr=16#3e80000695c
        typeset -r -l -i band=0
        typeset -r code=STOPPED
        typeset -r -i errno=0
        typeset -r name=CHLD
        typeset -r -i pid=26972
        typeset -r -i signo=17
        typeset -r -i status=19
        typeset -r -i uid=231713
        value=(
                typeset -r -i int=19
                typeset -r -l -i 16 ptr=16#13
        )
)
+ ((numsigchld++))
+ kill -CONT 26972
+ true
+ pids -f '%(pid)d'
+ kill -s STOP 26972
+ print -v .sh.sig
(
        typeset -r -l -i 16 addr=16#3e80000695c
        typeset -r -l -i band=0
        typeset -r code=CONTINUED
        typeset -r -i errno=0
        typeset -r name=CHLD
        typeset -r -i pid=26972
        typeset -r -i signo=17
        typeset -r -i status=0
        typeset -r -i uid=231713
        value=(
                typeset -r -i int=0
                typeset -r -l -i 16 ptr=16#0
        )
)
+ ((numsigchld++))
+ /usr/bin/sleep 1
./arch/linux.i386-64/bin/ksh: 26972: Stopped (SIGSTOP)
+ print -v .sh.sig
(
        typeset -r -l -i 16 addr=16#3e80000695c
        typeset -r -l -i band=0
        typeset -r code=STOPPED
        typeset -r -i errno=0
        typeset -r name=CHLD
        typeset -r -i pid=26972
        typeset -r -i signo=17
        typeset -r -i status=19
        typeset -r -i uid=231713
        value=(
                typeset -r -i int=19
                typeset -r -l -i 16 ptr=16#13
        )
)
+ ((numsigchld++))
+ print -v .sh.sig
(
        typeset -r -l -i 16 addr=16#3e80000695c
        typeset -r -l -i band=0
        typeset -r code=STOPPED
        typeset -r -i errno=0
        typeset -r name=CHLD
        typeset -r -i pid=26972
        typeset -r -i signo=17
        typeset -r -i status=19
        typeset -r -i uid=231713
        value=(
                typeset -r -i int=19
                typeset -r -l -i 16 ptr=16#13
        )
)
+ ((numsigchld++))
+ kill -KILL 26972
+ print -v .sh.sig
(
        typeset -r -l -i 16 addr=16#3e80000695c
        typeset -r -l -i band=0
        typeset -r code=KILLED
        typeset -r -i errno=0
        typeset -r name=CHLD
        typeset -r -i pid=26972
        typeset -r -i signo=17
        typeset -r -i status=9
        typeset -r -i uid=231713
        value=(
                typeset -r -i int=9
                typeset -r -l -i 16 ptr=16#9
        )
)
+ ((numsigchld++))
+ wait 26972
+ print 265,5
265,5

The SIGCHLD trap was called twice with code=STOPPED for a single STOP
event, so the total count of trap calls is 5 (numsigchld=5) instead of
the expected 4 (STOPPED, CONTINUED, STOPPED, KILLED).

Ced
-- 
Cedric Blancher <[email protected]>
Institute Pasteur
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
