On Tue, Mar 26, 2013 at 11:14 PM, Roland Mainz <[email protected]> wrote:
> Hi!
>
> ----
>
> While playing around with realtime signals I hacked-together a
> "simple" testcase which shows that ksh93 (ast-ksh.2013-03-18):
> -- snip --
> builtin wc
>
> # config
> integer -r num_attackers=50
>
> compound -a ar
>
> trap 'ar+=( integer value=${.sh.sig.value} pid=${.sh.sig.pid} )' RTMIN
>
> integer thispid=$$
> integer i
>
> for (( i=0 ; i < num_attackers ; i++ )) ; do
>         kill -q $i -RTMIN $thispid &
> done
>
> # wait for all child processes
> while ! wait ; do
>         true
> done
>
> # list jobs (this list should be empty after the
> # "wait"-loop above)
> jobdata=${ jobs 2>&1 ; }
> printf '%s\n' "$jobdata"
>
> print -v ar
> printf '# number of array elements in ar=%d, expected %d.\n' \
>         ${#ar[@]} num_attackers
> printf '# job data, expected 0 lines, got %d.\n' \
>         $(wc -l <<<"$jobdata")
>
> print '# done.'
> -- snip --
>
> Running this gives some weired (and variable output):
> -- snip --
> $ ~/bin/ksh sigrtstorm1.sh
> [50] +  Running                 <command unknown>
> [49] -  Running                 <command unknown>
> [48]    Running                 <command unknown>
> [47]   Lowest priority realtime signal <command unknown>
> [46]    Running                 <command unknown>
> [45]    Running                 <command unknown>
> [44]    Running                 <command unknown>
> [43]    Running                 <command unknown>
> [42]    Running                 <command unknown>
> [41]   Lowest priority realtime signal <command unknown>
> [40]    Running                 <command unknown>
> [39]    Running                 <command unknown>
> [38]    Running                 <command unknown>
> [37]    Running                 <command unknown>
> [36]    Running                 <command unknown>
> [35]   Lowest priority realtime signal <command unknown>
> [34]    Running                 <command unknown>
> [33]    Running                 <command unknown>
> [32]   Lowest priority realtime signal <command unknown>
> [31]    Running                 <command unknown>
> [30]    Running                 <command unknown>
> [29]   Lowest priority realtime signal <command unknown>
> [28]    Running                 <command unknown>
> [27]    Running                 <command unknown>
> [26]   Lowest priority realtime signal <command unknown>
> [25]    Running                 <command unknown>
> [24]   Lowest priority realtime signal <command unknown>
> [23]    Running                 <command unknown>
> [22]    Running                 <command unknown>
> [21]    Running                 <command unknown>
> [20]    Running                 <command unknown>
> [19]   Lowest priority realtime signal <command unknown>
> [18]    Running                 <command unknown>
> [17]    Running                 <command unknown>
> [16]   Lowest priority realtime signal <command unknown>
> [15]    Running                 <command unknown>
> [14]    Running                 <command unknown>
> [13]    Running                 <command unknown>
> [12]    Running                 <command unknown>
> [11]   Lowest priority realtime signal <command unknown>
> [10]    Running                 <command unknown>
> [9]   Lowest priority realtime signal <command unknown>
> [8]    Running                 <command unknown>
> [7]    Running                 <command unknown>
> [6]    Running                 <command unknown>
> [5]    Running                 <command unknown>
> [4]    Running                 <command unknown>
> [3]   Lowest priority realtime signal <command unknown>
> [2]    Running                 <command unknown>
> [1]    Running                 <command unknown>
> (
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3261
>                 typeset -l -i value=4
>         )
>         (
>                 typeset -l -i pid=3261
>                 typeset -l -i value=4
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3284
>                 typeset -l -i value=19
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3316
>                 typeset -l -i value=39
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3254
>                 typeset -l -i value=0
>         )
>         (
>                 typeset -l -i pid=3323
>                 typeset -l -i value=44
>         )
>         (
>                 typeset -l -i pid=3319
>                 typeset -l -i value=41
>         )
>         (
>                 typeset -l -i pid=3319
>                 typeset -l -i value=41
>         )
>         (
>                 typeset -l -i pid=3322
>                 typeset -l -i value=43
>         )
>         (
>                 typeset -l -i pid=3329
>                 typeset -l -i value=49
>         )
>         (
>                 typeset -l -i pid=3312
>                 typeset -l -i value=36
>         )
>         (
>                 typeset -l -i pid=3327
>                 typeset -l -i value=47
>         )
> )
> # number of array elements in ar=32, expected 50.
> # job data, expected 0 lines, got 50.
> # done.
> -- snip --
>
> AFAIK four things are wrong:
> 1. The shell receives 50 SIGRTMIN signals... but the SIGRTMIN trap is
> only called 32 times (the number is variable)
> 2. It seems even after a loop of $ while ! wait ; do true ; done # the
> child processes were not reaped... why does that happen ?
> 3. The output of $ job -l # contains messages like "[47]   Lowest
> priority realtime signal <command unknown>" ... which at least sounds
> wrong...
> 4. The realtime value (yes, POSIX realtime signals can pass _values_
> via signals) is often 0 (see output "value=0") but this value should
> occur only one
>
> Digging around in the code I found that at least part of the problem
> is that signals arrive faster than they can be processed by the shell
> trap... therefore I hacked-up a patch (attached as
> "ksh93_sigrt_siginfo_queue001.diff.txt") which implements a simple
> queue system which saves the siginfo data in a single-linked list and
> uses that list when the matching shell trap is called (e.g. the shell
> trap is called once for each |siginfo_chain_t| entry).
>
> * The good news is: Under valgrind control (which previously only
> called the shell trap for SIGRTMIN 3-5 times for the example code
> above) now calls the shell trap exactly 50 times
> * The bad news is: Without valgrind the number of trap calls is not
> exactly 50... but the number correlates exactly with the number of $
> job -l #-lines complaining about "[47]   Lowest priority realtime
> signal <command unknown>" (see [3] above), e.g. if 8 lines of "[47]
> Lowest priority realtime signal <command unknown>" occur then array
> 'ar" has exactly 42 entries...
>
> ... erm: David/Phong: Any idea what may go wrong ? What do you think
> about the patch ([1]) ?
>
> [1]=Note the patch is not exactly what I wish for... there are two issues:
> 1. I'd like to have the shell traps called exactly in-order in which
> they arrive, e.g. instead of having lists per signal number to queue
> the siginfo data there should only be one global list (the typical
> issue Irek brought up was that if a process sends a RTMIN signal and
> then terminates currently SIGCLD for that process child is executed
> before the RTMIN signal is processed)
> 2. The list mangement is not fully async-signal-safe, e.g. this code:
> -- snip --
> +                                       si =
> (siginfo_chain_t*)shp->siginfo[sig];
> +                                       shp->siginfo[sig]=NULL;
> -- snip --
> ... which is used to grab the current list of queued siginfo data for
> processing may suffer from a race condition when a signal handler is
> called exactly for these instructions (technically async-signals can
> interrupt any instruction).
> A mutex is not possible (for obvious reasons) ... and the "official"
> way to disable signals (which would mean _all_ signals for which shell
> traps are registered if we implement a single list for all kinds of
> siginfo data) during that time is IMO far to heavywheight... any ideas
> what can be used (yes... I saw the discussion about ASO CAS... can
> that be used ?) ?

Roland, thanks for working on this issue. I can confirm that this
patch makes signal delivery to traps *reliable* in ksh93 - at least to
a level where mksh, dash and to a lesser extend bash are.

The question is now: How can the the issues with the jobs builtin be
fixed? I think this is the problem causing the remaining signal
'leaks'.

Irek
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers

Reply via email to