On 27 March 2013 11:15, Irek Szczesniak <[email protected]> wrote:
> On Tue, Mar 26, 2013 at 11:14 PM, Roland Mainz <[email protected]> 
> wrote:
>> Hi!
>>
>> ----
>>
>> While playing around with realtime signals I hacked-together a
>> "simple" testcase which shows that ksh93 (ast-ksh.2013-03-18):
>> -- snip --
>> builtin wc
>>
>> # config
>> integer -r num_attackers=50
>>
>> compound -a ar
>>
>> trap 'ar+=( integer value=${.sh.sig.value} pid=${.sh.sig.pid} )' RTMIN
>>
>> integer thispid=$$
>> integer i
>>
>> for (( i=0 ; i < num_attackers ; i++ )) ; do
>>         kill -q $i -RTMIN $thispid &
>> done
>>
>> # wait for all child processes
>> while ! wait ; do
>>         true
>> done
>>
>> # list jobs (this list should be empty after the
>> # "wait"-loop above)
>> jobdata=${ jobs 2>&1 ; }
>> printf '%s\n' "$jobdata"
>>
>> print -v ar
>> printf '# number of array elements in ar=%d, expected %d.\n' \
>>         ${#ar[@]} num_attackers
>> printf '# job data, expected 0 lines, got %d.\n' \
>>         $(wc -l <<<"$jobdata")
>>
>> print '# done.'
>> -- snip --
>>
>> Running this gives some weired (and variable output):
>> -- snip --
>> $ ~/bin/ksh sigrtstorm1.sh
>> [50] +  Running                 <command unknown>
>> [49] -  Running                 <command unknown>
>> [48]    Running                 <command unknown>
>> [47]   Lowest priority realtime signal <command unknown>
>> [46]    Running                 <command unknown>
>> [45]    Running                 <command unknown>
>> [44]    Running                 <command unknown>
>> [43]    Running                 <command unknown>
>> [42]    Running                 <command unknown>
>> [41]   Lowest priority realtime signal <command unknown>
>> [40]    Running                 <command unknown>
>> [39]    Running                 <command unknown>
>> [38]    Running                 <command unknown>
>> [37]    Running                 <command unknown>
>> [36]    Running                 <command unknown>
>> [35]   Lowest priority realtime signal <command unknown>
>> [34]    Running                 <command unknown>
>> [33]    Running                 <command unknown>
>> [32]   Lowest priority realtime signal <command unknown>
>> [31]    Running                 <command unknown>
>> [30]    Running                 <command unknown>
>> [29]   Lowest priority realtime signal <command unknown>
>> [28]    Running                 <command unknown>
>> [27]    Running                 <command unknown>
>> [26]   Lowest priority realtime signal <command unknown>
>> [25]    Running                 <command unknown>
>> [24]   Lowest priority realtime signal <command unknown>
>> [23]    Running                 <command unknown>
>> [22]    Running                 <command unknown>
>> [21]    Running                 <command unknown>
>> [20]    Running                 <command unknown>
>> [19]   Lowest priority realtime signal <command unknown>
>> [18]    Running                 <command unknown>
>> [17]    Running                 <command unknown>
>> [16]   Lowest priority realtime signal <command unknown>
>> [15]    Running                 <command unknown>
>> [14]    Running                 <command unknown>
>> [13]    Running                 <command unknown>
>> [12]    Running                 <command unknown>
>> [11]   Lowest priority realtime signal <command unknown>
>> [10]    Running                 <command unknown>
>> [9]   Lowest priority realtime signal <command unknown>
>> [8]    Running                 <command unknown>
>> [7]    Running                 <command unknown>
>> [6]    Running                 <command unknown>
>> [5]    Running                 <command unknown>
>> [4]    Running                 <command unknown>
>> [3]   Lowest priority realtime signal <command unknown>
>> [2]    Running                 <command unknown>
>> [1]    Running                 <command unknown>
>> (
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3261
>>                 typeset -l -i value=4
>>         )
>>         (
>>                 typeset -l -i pid=3261
>>                 typeset -l -i value=4
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3284
>>                 typeset -l -i value=19
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3316
>>                 typeset -l -i value=39
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3254
>>                 typeset -l -i value=0
>>         )
>>         (
>>                 typeset -l -i pid=3323
>>                 typeset -l -i value=44
>>         )
>>         (
>>                 typeset -l -i pid=3319
>>                 typeset -l -i value=41
>>         )
>>         (
>>                 typeset -l -i pid=3319
>>                 typeset -l -i value=41
>>         )
>>         (
>>                 typeset -l -i pid=3322
>>                 typeset -l -i value=43
>>         )
>>         (
>>                 typeset -l -i pid=3329
>>                 typeset -l -i value=49
>>         )
>>         (
>>                 typeset -l -i pid=3312
>>                 typeset -l -i value=36
>>         )
>>         (
>>                 typeset -l -i pid=3327
>>                 typeset -l -i value=47
>>         )
>> )
>> # number of array elements in ar=32, expected 50.
>> # job data, expected 0 lines, got 50.
>> # done.
>> -- snip --
>>
>> AFAIK four things are wrong:
>> 1. The shell receives 50 SIGRTMIN signals... but the SIGRTMIN trap is
>> only called 32 times (the number is variable)
>> 2. It seems even after a loop of $ while ! wait ; do true ; done # the
>> child processes were not reaped... why does that happen ?
>> 3. The output of $ job -l # contains messages like "[47]   Lowest
>> priority realtime signal <command unknown>" ... which at least sounds
>> wrong...
>> 4. The realtime value (yes, POSIX realtime signals can pass _values_
>> via signals) is often 0 (see output "value=0") but this value should
>> occur only one
>>
>> Digging around in the code I found that at least part of the problem
>> is that signals arrive faster than they can be processed by the shell
>> trap... therefore I hacked-up a patch (attached as
>> "ksh93_sigrt_siginfo_queue001.diff.txt") which implements a simple
>> queue system which saves the siginfo data in a single-linked list and
>> uses that list when the matching shell trap is called (e.g. the shell
>> trap is called once for each |siginfo_chain_t| entry).
>>
>> * The good news is: Under valgrind control (which previously only
>> called the shell trap for SIGRTMIN 3-5 times for the example code
>> above) now calls the shell trap exactly 50 times
>> * The bad news is: Without valgrind the number of trap calls is not
>> exactly 50... but the number correlates exactly with the number of $
>> job -l #-lines complaining about "[47]   Lowest priority realtime
>> signal <command unknown>" (see [3] above), e.g. if 8 lines of "[47]
>> Lowest priority realtime signal <command unknown>" occur then array
>> 'ar" has exactly 42 entries...
>>
>> ... erm: David/Phong: Any idea what may go wrong ? What do you think
>> about the patch ([1]) ?
>>
>> [1]=Note the patch is not exactly what I wish for... there are two issues:
>> 1. I'd like to have the shell traps called exactly in-order in which
>> they arrive, e.g. instead of having lists per signal number to queue
>> the siginfo data there should only be one global list (the typical
>> issue Irek brought up was that if a process sends a RTMIN signal and
>> then terminates currently SIGCLD for that process child is executed
>> before the RTMIN signal is processed)
>> 2. The list mangement is not fully async-signal-safe, e.g. this code:
>> -- snip --
>> +                                       si =
>> (siginfo_chain_t*)shp->siginfo[sig];
>> +                                       shp->siginfo[sig]=NULL;
>> -- snip --
>> ... which is used to grab the current list of queued siginfo data for
>> processing may suffer from a race condition when a signal handler is
>> called exactly for these instructions (technically async-signals can
>> interrupt any instruction).
>> A mutex is not possible (for obvious reasons) ... and the "official"
>> way to disable signals (which would mean _all_ signals for which shell
>> traps are registered if we implement a single list for all kinds of
>> siginfo data) during that time is IMO far to heavywheight... any ideas
>> what can be used (yes... I saw the discussion about ASO CAS... can
>> that be used ?) ?
>
> Roland, thanks for working on this issue. I can confirm that this
> patch makes signal delivery to traps *reliable* in ksh93 - at least to
> a level where mksh, dash and to a lesser extend bash are.

I second that.

David, can you please backport the patch to ksh93 u+ once Roland finished it?
>
> The question is now: How can the the issues with the jobs builtin be
> fixed? I think this is the problem causing the remaining signal
> 'leaks'.
>
> Irek
> _______________________________________________
> ast-developers mailing list
> [email protected]
> http://lists.research.att.com/mailman/listinfo/ast-developers


Ced
-- 
Cedric Blancher <[email protected]>
Institute Pasteur
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers

Reply via email to