On 27 March 2013 11:15, Irek Szczesniak <[email protected]> wrote:
> On Tue, Mar 26, 2013 at 11:14 PM, Roland Mainz <[email protected]>
> wrote:
>> Hi!
>>
>> ----
>>
>> While playing around with realtime signals I hacked together a
>> "simple" testcase which shows several problems in ksh93
>> (ast-ksh.2013-03-18):
>> -- snip --
>> builtin wc
>>
>> # config
>> integer -r num_attackers=50
>>
>> compound -a ar
>>
>> trap 'ar+=( integer value=${.sh.sig.value} pid=${.sh.sig.pid} )' RTMIN
>>
>> integer thispid=$$
>> integer i
>>
>> for (( i=0 ; i < num_attackers ; i++ )) ; do
>>     kill -q $i -RTMIN $thispid &
>> done
>>
>> # wait for all child processes
>> while ! wait ; do
>>     true
>> done
>>
>> # list jobs (this list should be empty after the
>> # "wait"-loop above)
>> jobdata=${ jobs 2>&1 ; }
>> printf '%s\n' "$jobdata"
>>
>> print -v ar
>> printf '# number of array elements in ar=%d, expected %d.\n' \
>>     ${#ar[@]} num_attackers
>> printf '# job data, expected 0 lines, got %d.\n' \
>>     $(wc -l <<<"$jobdata")
>>
>> print '# done.'
>> -- snip --
>>
>> Running this gives some weird (and variable) output:
>> -- snip --
>> $ ~/bin/ksh sigrtstorm1.sh
>> [50] + Running <command unknown>
>> [49] - Running <command unknown>
>> [48] Running <command unknown>
>> [47] Lowest priority realtime signal <command unknown>
>> [46] Running <command unknown>
>> [45] Running <command unknown>
>> [44] Running <command unknown>
>> [43] Running <command unknown>
>> [42] Running <command unknown>
>> [41] Lowest priority realtime signal <command unknown>
>> [40] Running <command unknown>
>> [39] Running <command unknown>
>> [38] Running <command unknown>
>> [37] Running <command unknown>
>> [36] Running <command unknown>
>> [35] Lowest priority realtime signal <command unknown>
>> [34] Running <command unknown>
>> [33] Running <command unknown>
>> [32] Lowest priority realtime signal <command unknown>
>> [31] Running <command unknown>
>> [30] Running <command unknown>
>> [29] Lowest priority realtime signal <command unknown>
>> [28] Running <command unknown>
>> [27] Running <command unknown>
>> [26] Lowest priority realtime signal <command unknown>
>> [25] Running <command unknown>
>> [24] Lowest priority realtime signal <command unknown>
>> [23] Running <command unknown>
>> [22] Running <command unknown>
>> [21] Running <command unknown>
>> [20] Running <command unknown>
>> [19] Lowest priority realtime signal <command unknown>
>> [18] Running <command unknown>
>> [17] Running <command unknown>
>> [16] Lowest priority realtime signal <command unknown>
>> [15] Running <command unknown>
>> [14] Running <command unknown>
>> [13] Running <command unknown>
>> [12] Running <command unknown>
>> [11] Lowest priority realtime signal <command unknown>
>> [10] Running <command unknown>
>> [9] Lowest priority realtime signal <command unknown>
>> [8] Running <command unknown>
>> [7] Running <command unknown>
>> [6] Running <command unknown>
>> [5] Running <command unknown>
>> [4] Running <command unknown>
>> [3] Lowest priority realtime signal <command unknown>
>> [2] Running <command unknown>
>> [1] Running <command unknown>
>> (
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3261
>>         typeset -l -i value=4
>>     )
>>     (
>>         typeset -l -i pid=3261
>>         typeset -l -i value=4
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3284
>>         typeset -l -i value=19
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3316
>>         typeset -l -i value=39
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3254
>>         typeset -l -i value=0
>>     )
>>     (
>>         typeset -l -i pid=3323
>>         typeset -l -i value=44
>>     )
>>     (
>>         typeset -l -i pid=3319
>>         typeset -l -i value=41
>>     )
>>     (
>>         typeset -l -i pid=3319
>>         typeset -l -i value=41
>>     )
>>     (
>>         typeset -l -i pid=3322
>>         typeset -l -i value=43
>>     )
>>     (
>>         typeset -l -i pid=3329
>>         typeset -l -i value=49
>>     )
>>     (
>>         typeset -l -i pid=3312
>>         typeset -l -i value=36
>>     )
>>     (
>>         typeset -l -i pid=3327
>>         typeset -l -i value=47
>>     )
>> )
>> # number of array elements in ar=32, expected 50.
>> # job data, expected 0 lines, got 50.
>> # done.
>> -- snip --
>>
>> AFAIK four things are wrong:
>> 1. The shell receives 50 SIGRTMIN signals... but the SIGRTMIN trap is
>>    only called 32 times (the number varies from run to run).
>> 2. It seems that even after a loop of $ while ! wait ; do true ; done #
>>    the child processes were not reaped... why does that happen?
>> 3. The output of $ jobs # contains messages like "[47] Lowest
>>    priority realtime signal <command unknown>"... which at least sounds
>>    wrong...
>> 4. The realtime value (yes, POSIX realtime signals can pass _values_
>>    via signals) is often 0 (see output "value=0"), but this value should
>>    occur only once (only the first child, with i=0, sends 0).
>>
>> Digging around in the code I found that at least part of the problem
>> is that signals arrive faster than the shell trap can process them...
>> therefore I hacked up a patch (attached as
>> "ksh93_sigrt_siginfo_queue001.diff.txt") which implements a simple
>> queue system. It saves the siginfo data in a singly-linked list and
>> uses that list when the matching shell trap is called (i.e. the shell
>> trap is called once for each |siginfo_chain_t| entry).
>>
>> * The good news: under valgrind (where the shell previously called the
>>   SIGRTMIN trap only 3-5 times for the example code above) the shell
>>   now calls the trap exactly 50 times.
>> * The bad news: without valgrind the trap is still not called exactly
>>   50 times... but the shortfall matches exactly the number of $ jobs
>>   #-lines complaining about "[47] Lowest priority realtime signal
>>   <command unknown>" (see item 3 above). For example, if 8 such lines
>>   occur then array 'ar' has exactly 42 entries...
>>
>> ... erm: David/Phong: Any idea what may go wrong? What do you think
>> about the patch ([1])?
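Below is a minimal, standalone C sketch of the idea behind the patch described above. The C-level handler records every siginfo_t instead of only noting that a trap is pending. This is not the ksh93 code, and everything in it is hypothetical:

* the names `sig_record`, `rt_handler` and `queue_storm`;
* the fixed-size array used in place of the patch's linked list;
* the setup of queueing SIGRTMIN to the process itself with sigqueue() while the signal is blocked. This stands in for the 50 `kill -q $i -RTMIN` attackers.

The handler only reserves a slot with a lock-free atomic_fetch_add() and copies si_value and si_pid. Signals that arrive before the "trap" gets to run are therefore kept rather than merged into one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Lock-free atomics may be used from a signal handler in C11. */
static_assert(ATOMIC_INT_LOCK_FREE == 2, "int atomics must be lock-free");

#define MAX_RECORDS 256  /* hypothetical capacity; overflow is only counted */

/* One delivery: what a trap would see as .sh.sig.value and .sh.sig.pid. */
typedef struct {
    int   value;
    pid_t pid;
} sig_record;

static sig_record records[MAX_RECORDS];
static atomic_uint next_slot;  /* slots reserved by the handler so far */
static atomic_uint dropped;    /* deliveries that did not fit */

/* SA_SIGINFO handler: a lock-free atomic plus plain stores into
   preallocated memory; no malloc(), no stdio. */
static void rt_handler(int sig, siginfo_t *si, void *uctx)
{
    (void)sig;
    (void)uctx;
    unsigned slot = atomic_fetch_add(&next_slot, 1);
    if (slot < MAX_RECORDS) {
        records[slot].value = si->si_value.sival_int;
        records[slot].pid   = si->si_pid;
    } else {
        atomic_fetch_add(&dropped, 1);
    }
}

/* Queues n SIGRTMIN signals carrying the values 0..n-1 to this process
   while the signal is blocked, then unblocks it. Returns how many
   sigqueue() calls succeeded, or -1 if the handler could not be set. */
static int queue_storm(int n)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = rt_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN, &sa, NULL) != 0)
        return -1;

    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGRTMIN);
    sigprocmask(SIG_BLOCK, &block, &old);  /* let the signals pile up */

    int sent = 0;
    for (int i = 0; i < n; i++) {
        union sigval v;
        v.sival_int = i;                    /* like "kill -q $i -RTMIN" */
        if (sigqueue(getpid(), SIGRTMIN, v) == 0)
            sent++;
    }

    /* Restoring the mask lets the queued signals be delivered; on Linux
       all of them run rt_handler before control returns here (POSIX
       itself only guarantees at least one). */
    sigprocmask(SIG_SETMASK, &old, NULL);
    return sent;
}
```

Because realtime signals are queued rather than merged, `queue_storm(50)` should leave 50 records with values 0..49. On Linux they are delivered in FIFO order, so `records[i].value == i`. sigqueue() can fail with EAGAIN once the per-user queue limit is reached, which is why the function counts successful sends.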
>>
>> [1]=Note the patch is not exactly what I wish for... there are two issues:
>> 1. I'd like to have the shell traps called in exactly the order in
>>    which the signals arrive, i.e. instead of having one list per signal
>>    number to queue the siginfo data there should be only one global
>>    list. The typical case Irek brought up: if a child process sends a
>>    RTMIN signal and then terminates, the SIGCLD trap for that child is
>>    currently executed before the RTMIN signal is processed.
>> 2. The list management is not fully async-signal-safe, e.g. this code:
>> -- snip --
>> +    si = (siginfo_chain_t*)shp->siginfo[sig];
>> +    shp->siginfo[sig]=NULL;
>> -- snip --
>> ... which is used to grab the current list of queued siginfo data for
>> processing, may suffer from a race condition when a signal handler is
>> called exactly between these two statements (technically async signals
>> can interrupt any instruction).
>> A mutex is not possible (a handler that interrupts the lock holder
>> would deadlock)... and the "official" way to disable signals (which
>> would mean blocking _all_ signals for which shell traps are registered
>> if we implement a single list for all kinds of siginfo data) during
>> that time is IMO far too heavyweight... any ideas what can be used
>> (yes... I saw the discussion about ASO CAS... can that be used?)?
>
> Roland, thanks for working on this issue. I can confirm that this
> patch makes signal delivery to traps *reliable* in ksh93 - at least to
> the level of mksh, dash and, to a lesser extent, bash.

I second that. David, can you please backport the patch to ksh93 u+
once Roland has finished it?

>
> The question now is: how can the issues with the jobs builtin be
> fixed? I think this is the problem causing the remaining signal
> 'leaks'.
>
> Irek
> _______________________________________________
> ast-developers mailing list
> [email protected]
> http://lists.research.att.com/mailman/listinfo/ast-developers

Ced
--
Cedric Blancher <[email protected]>
Institute Pasteur
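One common lock-free pattern addresses the race in point 2 of Roland's footnote and also the wish for a single global list. It is sketched here; it is neither the patch nor AST's ASO API, whose routines are not reproduced. Handlers push onto one global list with compare-and-swap. The main loop replaces the two-statement "read head, then set head to NULL" with a single atomic exchange, then reverses the detached list so traps run in the order the handlers ran. The sketch uses C11 <stdatomic.h>. The names `sig_node`, `push_siginfo`, `on_signal` and `take_all_in_order`, and the fixed node pool, are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>

/* C11 only promises that lock-free atomics work inside signal handlers. */
static_assert(ATOMIC_POINTER_LOCK_FREE == 2, "pointer atomics must be lock-free");
static_assert(ATOMIC_INT_LOCK_FREE == 2, "int atomics must be lock-free");

#define POOL_SIZE 256  /* hypothetical; nodes are preallocated because
                          malloc() is not async-signal-safe */

/* Hypothetical node type, loosely modelled on siginfo_chain_t. */
typedef struct sig_node {
    struct sig_node *next;
    int signo;
    int value;
} sig_node;

static sig_node pool[POOL_SIZE];          /* never recycled in this sketch */
static atomic_uint pool_used;
static _Atomic(sig_node *) pending_head;  /* ONE list for all signals */

/* Handler side: lock-free push (a Treiber stack). If another handler
   interrupts us between the load and the CAS, the CAS fails, `old` is
   refreshed and we retry, so no node is lost. Returns 0 if the pool is full. */
static int push_siginfo(int signo, int value)
{
    unsigned idx = atomic_fetch_add(&pool_used, 1);
    if (idx >= POOL_SIZE)
        return 0;
    sig_node *n = &pool[idx];
    n->signo = signo;
    n->value = value;
    sig_node *old = atomic_load(&pending_head);
    do {
        n->next = old;
    } while (!atomic_compare_exchange_weak(&pending_head, &old, n));
    return 1;
}

/* SA_SIGINFO handler that would be installed with sigaction() for every
   trapped signal; it only feeds the global list. */
void on_signal(int signo, siginfo_t *si, void *uctx)
{
    (void)uctx;
    push_siginfo(signo, si->si_value.sival_int);
}

/* Main-loop side (where traps run): one atomic exchange detaches the whole
   list, so a signal arriving at any point either lands in the detached
   list or in the new, empty one. The list is LIFO, so reverse it. */
static sig_node *take_all_in_order(void)
{
    sig_node *lifo = atomic_exchange(&pending_head, NULL);
    sig_node *fifo = NULL;
    while (lifo != NULL) {
        sig_node *next = lifo->next;
        lifo->next = fifo;
        fifo = lifo;
        lifo = next;
    }
    return fifo;
}
```

The main loop only ever takes the whole list and never pops single nodes, so the usual ABA problem of lock-free stacks does not arise. "Arrival order" here means the order in which the handlers ran. When several signals are pending at the same moment, the kernel decides which one is delivered first. This scheme alone may therefore not guarantee that an RTMIN sent before a child exits is seen before that child's SIGCHLD.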
