On Wed, Mar 27, 2013 at 11:11 AM, Irek Szczesniak <[email protected]> wrote:
> On Wed, Mar 27, 2013 at 12:34 AM, Roland Mainz <[email protected]> 
> wrote:
>> On Tue, Mar 26, 2013 at 11:14 PM, Roland Mainz <[email protected]> 
>> wrote:
>>> While playing around with realtime signals I hacked-together a
>>> "simple" testcase which shows that ksh93 (ast-ksh.2013-03-18):
>>> -- snip --
>>> builtin wc
>>>
>>> # config
>>> integer -r num_attackers=50
>>>
>>> compound -a ar
>>>
>>> trap 'ar+=( integer value=${.sh.sig.value} pid=${.sh.sig.pid} )' RTMIN
>>>
>>> integer thispid=$$
>>> integer i
>>>
>>> for (( i=0 ; i < num_attackers ; i++ )) ; do
>>>         kill -q $i -RTMIN $thispid &
>>> done
>>>
>>> # wait for all child processes
>>> while ! wait ; do
>>>         true
>>> done
>>>
>>> # list jobs (this list should be empty after the
>>> # "wait"-loop above)
>>> jobdata=${ jobs 2>&1 ; }
>>> printf '%s\n' "$jobdata"
>>>
>>> print -v ar
>>> printf '# number of array elements in ar=%d, expected %d.\n' \
>>>         ${#ar[@]} num_attackers
>>> printf '# job data, expected 0 lines, got %d.\n' \
>>>         $(wc -l <<<"$jobdata")
>>>
>>> print '# done.'
>>> -- snip --
>>>
>>> Running this gives some weired (and variable output):
>>> -- snip --
>>> $ ~/bin/ksh sigrtstorm1.sh
>>> [50] +  Running                 <command unknown>
>>> [49] -  Running                 <command unknown>
>>> [48]    Running                 <command unknown>
>>> [47]   Lowest priority realtime signal <command unknown>
>>> [46]    Running                 <command unknown>
>>> [45]    Running                 <command unknown>
>>> [44]    Running                 <command unknown>
>>> [43]    Running                 <command unknown>
>>> [42]    Running                 <command unknown>
>>> [41]   Lowest priority realtime signal <command unknown>
>>> [40]    Running                 <command unknown>
>>> [39]    Running                 <command unknown>
>>> [38]    Running                 <command unknown>
>>> [37]    Running                 <command unknown>
>>> [36]    Running                 <command unknown>
>>> [35]   Lowest priority realtime signal <command unknown>
>>> [34]    Running                 <command unknown>
>>> [33]    Running                 <command unknown>
>>> [32]   Lowest priority realtime signal <command unknown>
>>> [31]    Running                 <command unknown>
>>> [30]    Running                 <command unknown>
>>> [29]   Lowest priority realtime signal <command unknown>
>>> [28]    Running                 <command unknown>
>>> [27]    Running                 <command unknown>
>>> [26]   Lowest priority realtime signal <command unknown>
>>> [25]    Running                 <command unknown>
>>> [24]   Lowest priority realtime signal <command unknown>
>>> [23]    Running                 <command unknown>
>>> [22]    Running                 <command unknown>
>>> [21]    Running                 <command unknown>
>>> [20]    Running                 <command unknown>
>>> [19]   Lowest priority realtime signal <command unknown>
>>> [18]    Running                 <command unknown>
>>> [17]    Running                 <command unknown>
>>> [16]   Lowest priority realtime signal <command unknown>
>>> [15]    Running                 <command unknown>
>>> [14]    Running                 <command unknown>
>>> [13]    Running                 <command unknown>
>>> [12]    Running                 <command unknown>
>>> [11]   Lowest priority realtime signal <command unknown>
>>> [10]    Running                 <command unknown>
>>> [9]   Lowest priority realtime signal <command unknown>
>>> [8]    Running                 <command unknown>
>>> [7]    Running                 <command unknown>
>>> [6]    Running                 <command unknown>
>>> [5]    Running                 <command unknown>
>>> [4]    Running                 <command unknown>
>>> [3]   Lowest priority realtime signal <command unknown>
>>> [2]    Running                 <command unknown>
>>> [1]    Running                 <command unknown>
>>> (
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3261
>>>                 typeset -l -i value=4
>>>         )
>>>         (
>>>                 typeset -l -i pid=3261
>>>                 typeset -l -i value=4
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3284
>>>                 typeset -l -i value=19
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3316
>>>                 typeset -l -i value=39
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3254
>>>                 typeset -l -i value=0
>>>         )
>>>         (
>>>                 typeset -l -i pid=3323
>>>                 typeset -l -i value=44
>>>         )
>>>         (
>>>                 typeset -l -i pid=3319
>>>                 typeset -l -i value=41
>>>         )
>>>         (
>>>                 typeset -l -i pid=3319
>>>                 typeset -l -i value=41
>>>         )
>>>         (
>>>                 typeset -l -i pid=3322
>>>                 typeset -l -i value=43
>>>         )
>>>         (
>>>                 typeset -l -i pid=3329
>>>                 typeset -l -i value=49
>>>         )
>>>         (
>>>                 typeset -l -i pid=3312
>>>                 typeset -l -i value=36
>>>         )
>>>         (
>>>                 typeset -l -i pid=3327
>>>                 typeset -l -i value=47
>>>         )
>>> )
>>> # number of array elements in ar=32, expected 50.
>>> # job data, expected 0 lines, got 50.
>>> # done.
>>> -- snip --
>>>
>>> AFAIK four things are wrong:
>>> 1. The shell receives 50 SIGRTMIN signals... but the SIGRTMIN trap is
>>> only called 32 times (the number is variable)
>>> 2. It seems even after a loop of $ while ! wait ; do true ; done # the
>>> child processes were not reaped... why does that happen ?
>>> 3. The output of $ job -l # contains messages like "[47]   Lowest
>>> priority realtime signal <command unknown>" ... which at least sounds
>>> wrong...
>>> 4. The realtime value (yes, POSIX realtime signals can pass _values_
>>> via signals) is often 0 (see output "value=0") but this value should
>>> occur only one
>>>
>>> Digging around in the code I found that at least part of the problem
>>> is that signals arrive faster than they can be processed by the shell
>>> trap... therefore I hacked-up a patch (attached as
>>> "ksh93_sigrt_siginfo_queue001.diff.txt") which implements a simple
>>> queue system which saves the siginfo data in a single-linked list and
>>> uses that list when the matching shell trap is called (e.g. the shell
>>> trap is called once for each |siginfo_chain_t| entry).
>>>
>>> * The good news is: Under valgrind control (which previously only
>>> called the shell trap for SIGRTMIN 3-5 times for the example code
>>> above) now calls the shell trap exactly 50 times
>>> * The bad news is: Without valgrind the number of trap calls is not
>>> exactly 50... but the number correlates exactly with the number of $
>>> job -l #-lines complaining about "[47]   Lowest priority realtime
>>> signal <command unknown>" (see [3] above), e.g. if 8 lines of "[47]
>>> Lowest priority realtime signal <command unknown>" occur then array
>>> 'ar" has exactly 42 entries...
>>>
>>> ... erm: David/Phong: Any idea what may go wrong ? What do you think
>>> about the patch ([1]) ?
>>>
>>> [1]=Note the patch is not exactly what I wish for... there are two issues:
>>> 1. I'd like to have the shell traps called exactly in-order in which
>>> they arrive, e.g. instead of having lists per signal number to queue
>>> the siginfo data there should only be one global list (the typical
>>> issue Irek brought up was that if a process sends a RTMIN signal and
>>> then terminates currently SIGCLD for that process child is executed
>>> before the RTMIN signal is processed)
>>> 2. The list mangement is not fully async-signal-safe, e.g. this code:
>>> -- snip --
>>> +                                       si =
>>> (siginfo_chain_t*)shp->siginfo[sig];
>>> +                                       shp->siginfo[sig]=NULL;
>>> -- snip --
>>> ... which is used to grab the current list of queued siginfo data for
>>> processing may suffer from a race condition when a signal handler is
>>> called exactly for these instructions (technically async-signals can
>>> interrupt any instruction).
>>> A mutex is not possible (for obvious reasons) ... and the "official"
>>> way to disable signals (which would mean _all_ signals for which shell
>>> traps are registered if we implement a single list for all kinds of
>>> siginfo data) during that time is IMO far to heavywheight... any ideas
>>> what can be used (yes... I saw the discussion about ASO CAS... can
>>> that be used ?) ?
>>
>> BTW: The patch currently doesn't cover passing SIGCHLD siginfo data to
>> the shell traps (this needs to be done in |job_waitsafe()| ... but
>> that may be tricky).
>
> Could you add.sh.status (Exit value or signal) and .sh.pid for CHLD
> traps, please?

Appending to this RFE:
We like to have .sh.code for CHLD traps too, with .sh.code being a
STRING returning one of the CLD_* codes defined in
http://pubs.opengroup.org/onlinepubs/7908799/xsh/signal.h.html (i.e.
this is defined by X/OPEN and POSIX and therefore should be applicable
as shell extension).

Irek
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers

Reply via email to