On Wed, Mar 27, 2013 at 11:11 AM, Irek Szczesniak <[email protected]> wrote: > On Wed, Mar 27, 2013 at 12:34 AM, Roland Mainz <[email protected]> > wrote: >> On Tue, Mar 26, 2013 at 11:14 PM, Roland Mainz <[email protected]> >> wrote: >>> While playing around with realtime signals I hacked-together a >>> "simple" testcase which shows that ksh93 (ast-ksh.2013-03-18): >>> -- snip -- >>> builtin wc >>> >>> # config >>> integer -r num_attackers=50 >>> >>> compound -a ar >>> >>> trap 'ar+=( integer value=${.sh.sig.value} pid=${.sh.sig.pid} )' RTMIN >>> >>> integer thispid=$$ >>> integer i >>> >>> for (( i=0 ; i < num_attackers ; i++ )) ; do >>> kill -q $i -RTMIN $thispid & >>> done >>> >>> # wait for all child processes >>> while ! wait ; do >>> true >>> done >>> >>> # list jobs (this list should be empty after the >>> # "wait"-loop above) >>> jobdata=${ jobs 2>&1 ; } >>> printf '%s\n' "$jobdata" >>> >>> print -v ar >>> printf '# number of array elements in ar=%d, expected %d.\n' \ >>> ${#ar[@]} num_attackers >>> printf '# job data, expected 0 lines, got %d.\n' \ >>> $(wc -l <<<"$jobdata") >>> >>> print '# done.' >>> -- snip -- >>> >>> Running this gives some weired (and variable output): >>> -- snip -- >>> $ ~/bin/ksh sigrtstorm1.sh >>> [50] + Running <command unknown> >>> [49] - Running <command unknown> >>> [48] Running <command unknown> >>> [47] Lowest priority realtime signal <command unknown> >>> [46] Running <command unknown> >>> [45] Running <command unknown> >>> [44] Running <command unknown> >>> [43] Running <command unknown> >>> [42] Running <command unknown> >>> [41] Lowest priority realtime signal <command unknown> >>> [40] Running <command unknown> >>> [39] Running <command unknown> >>> [38] Running <command unknown> >>> [37] Running <command unknown> >>> [36] Running <command unknown> >>> [35] Lowest priority realtime signal <command unknown> >>> [34] Running <command unknown> >>> [33] Running <command unknown> >>> [32] Lowest priority realtime signal <command unknown> >>> [31] Running <command unknown> >>> [30] Running <command unknown> >>> [29] Lowest priority realtime signal <command unknown> >>> [28] Running <command unknown> >>> [27] Running <command unknown> >>> [26] Lowest priority realtime signal <command unknown> >>> [25] Running <command unknown> >>> [24] Lowest priority realtime signal <command unknown> >>> [23] Running <command unknown> >>> [22] Running <command unknown> >>> [21] Running <command unknown> >>> [20] Running <command unknown> >>> [19] Lowest priority realtime signal <command unknown> >>> [18] Running <command unknown> >>> [17] Running <command unknown> >>> [16] Lowest priority realtime signal <command unknown> >>> [15] Running <command unknown> >>> [14] Running <command unknown> >>> [13] Running <command unknown> >>> [12] Running <command unknown> >>> [11] Lowest priority realtime signal <command unknown> >>> [10] Running <command unknown> >>> [9] Lowest priority realtime signal <command unknown> >>> [8] Running <command unknown> >>> [7] Running <command unknown> >>> [6] Running <command unknown> >>> [5] Running <command unknown> >>> [4] Running <command unknown> >>> [3] Lowest priority realtime signal <command unknown> >>> [2] Running <command unknown> >>> [1] Running <command unknown> >>> ( >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3261 >>> typeset -l -i value=4 >>> ) >>> ( >>> typeset -l -i pid=3261 >>> typeset -l -i value=4 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3284 >>> typeset -l -i value=19 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3316 >>> typeset -l -i value=39 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3254 >>> typeset -l -i value=0 >>> ) >>> ( >>> typeset -l -i pid=3323 >>> typeset -l -i value=44 >>> ) >>> ( >>> typeset -l -i pid=3319 >>> typeset -l -i value=41 >>> ) >>> ( >>> typeset -l -i pid=3319 >>> typeset -l -i value=41 >>> ) >>> ( >>> typeset -l -i pid=3322 >>> typeset -l -i value=43 >>> ) >>> ( >>> typeset -l -i pid=3329 >>> typeset -l -i value=49 >>> ) >>> ( >>> typeset -l -i pid=3312 >>> typeset -l -i value=36 >>> ) >>> ( >>> typeset -l -i pid=3327 >>> typeset -l -i value=47 >>> ) >>> ) >>> # number of array elements in ar=32, expected 50. >>> # job data, expected 0 lines, got 50. >>> # done. >>> -- snip -- >>> >>> AFAIK four things are wrong: >>> 1. The shell receives 50 SIGRTMIN signals... but the SIGRTMIN trap is >>> only called 32 times (the number is variable) >>> 2. It seems even after a loop of $ while ! wait ; do true ; done # the >>> child processes were not reaped... why does that happen ? >>> 3. The output of $ job -l # contains messages like "[47] Lowest >>> priority realtime signal <command unknown>" ... which at least sounds >>> wrong... >>> 4. The realtime value (yes, POSIX realtime signals can pass _values_ >>> via signals) is often 0 (see output "value=0") but this value should >>> occur only one >>> >>> Digging around in the code I found that at least part of the problem >>> is that signals arrive faster than they can be processed by the shell >>> trap... therefore I hacked-up a patch (attached as >>> "ksh93_sigrt_siginfo_queue001.diff.txt") which implements a simple >>> queue system which saves the siginfo data in a single-linked list and >>> uses that list when the matching shell trap is called (e.g. the shell >>> trap is called once for each |siginfo_chain_t| entry). >>> >>> * The good news is: Under valgrind control (which previously only >>> called the shell trap for SIGRTMIN 3-5 times for the example code >>> above) now calls the shell trap exactly 50 times >>> * The bad news is: Without valgrind the number of trap calls is not >>> exactly 50... but the number correlates exactly with the number of $ >>> job -l #-lines complaining about "[47] Lowest priority realtime >>> signal <command unknown>" (see [3] above), e.g. if 8 lines of "[47] >>> Lowest priority realtime signal <command unknown>" occur then array >>> 'ar" has exactly 42 entries... >>> >>> ... erm: David/Phong: Any idea what may go wrong ? What do you think >>> about the patch ([1]) ? >>> >>> [1]=Note the patch is not exactly what I wish for... there are two issues: >>> 1. I'd like to have the shell traps called exactly in-order in which >>> they arrive, e.g. instead of having lists per signal number to queue >>> the siginfo data there should only be one global list (the typical >>> issue Irek brought up was that if a process sends a RTMIN signal and >>> then terminates currently SIGCLD for that process child is executed >>> before the RTMIN signal is processed) >>> 2. The list mangement is not fully async-signal-safe, e.g. this code: >>> -- snip -- >>> + si = >>> (siginfo_chain_t*)shp->siginfo[sig]; >>> + shp->siginfo[sig]=NULL; >>> -- snip -- >>> ... which is used to grab the current list of queued siginfo data for >>> processing may suffer from a race condition when a signal handler is >>> called exactly for these instructions (technically async-signals can >>> interrupt any instruction). >>> A mutex is not possible (for obvious reasons) ... and the "official" >>> way to disable signals (which would mean _all_ signals for which shell >>> traps are registered if we implement a single list for all kinds of >>> siginfo data) during that time is IMO far to heavywheight... any ideas >>> what can be used (yes... I saw the discussion about ASO CAS... can >>> that be used ?) ? >> >> BTW: The patch currently doesn't cover passing SIGCHLD siginfo data to >> the shell traps (this needs to be done in |job_waitsafe()| ... but >> that may be tricky). > > Could you add.sh.status (Exit value or signal) and .sh.pid for CHLD > traps, please?
Appending to this RFE: We like to have .sh.code for CHLD traps too, with .sh.code being a STRING returning one of the CLD_* codes defined in http://pubs.opengroup.org/onlinepubs/7908799/xsh/signal.h.html (i.e. this is defined by X/OPEN and POSIX and therefore should be applicable as shell extension). Irek _______________________________________________ ast-developers mailing list [email protected] http://lists.research.att.com/mailman/listinfo/ast-developers
