Re: Async processes started in functions not reliably started

Robert Elz Mon, 05 Aug 2019 15:50:43 -0700

    Date:        Mon, 05 Aug 2019 14:05:43 +0200
    From:        Steffen Nurpmeso <stef...@sdaoden.eu>
    Message-ID:  <20190805120543.bf9-u%stef...@sdaoden.eu>


  | Would be nice to have some shell support for signalling the parent
  | that the child is now functional,

The shell cannot really know - your example was not functional until
after it set up the traps.

But the shell code knows.  Something like the following might work
(untested, not even given to bash to check the syntax; and any use
of $? needs to be sanitised (value saved), as real code replacing the
comments would probably need to use it again).

In the parent:

        OK=false
        T=$(trap -p USR2)       # only needed if USR2 might be trapped already
        trap 'OK=true' USR2

        run_the_child &
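        if ! $OK && { wait $!; rc=$?; [ "$rc" -eq 0 ]; }
        then
                echo "Child failed to initialise properly!" >&2
                # and whatever else you want to do
        elif $OK
        then
                : # here the child is running, and ready
        else
                # rc was saved above: by here $? is only the status
                # of the "elif $OK" test
                echo "Failure: $rc from child" >&2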

                # either the child did exit N (N != 0) in which
                # case $? will tell us why it failed, or some
                # stray signal was delivered (and caught) by the
                # current shell ... deal with those possibilities
        fi
        case "$T" in    # if T= was needed above
        '') trap - USR2;;   # bash would have said nothing if trap was default
         *) eval "$T"    ;; # for other shells which do, or if USR2 was trapped.
        esac
        # continue with parent code, now knowing that child has init'd itself



In the child:

        trap 'whatever' SIG_I_NEED
        # any other init that is needed

        kill -s USR2 $$ # or if the parent pid is not $$, use whatever is.

        # do whatever the child is supposed to do

The wait is to pause the parent - an exit 0 from it should not happen,
and indicates that the child did exit 0 which it is not supposed to do
at this point.  The ! $OK test before the wait is in case the child
started very quickly, and the signal already arrived.   There is still
a race condition here (having the child sleep for a brief interval as
part of its init would help reduce the probability of problems from that).
Pity the shell has no way to allow scripts to block signals (ie: sigblock).

If the wait is interrupted by a signal (or if the USR2 signal happened
earlier and we skip the wait), and it was USR2 (from the child), then OK
will become true, and the child is ready to continue.   If the wait
exits for some other reason, then either some other signal was delivered
(and caught, and did not exit the shell), or the child did exit N,
indicating that some failure happened before it init'd itself.  If a
stray signal is possible, the wait should be in a loop (ie: while :; do
if wait ...) and that case should cause the loop to iterate, whereas all
the other possibilities end in break.
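
A sketch of that loop, equally hedged: run_the_child, its body, and the
variable names (PARENT, child, rc, result) are made up for illustration,
and it assumes the parent's pid really is $$.  The wait status is saved
at once, and a "kill -s 0" probe is used to tell "stray signal, child
still alive" apart from "child has exited".

```shell
#!/bin/sh
# Sketch only.  run_the_child and its body are hypothetical stand-ins.
run_the_child() {
        trap 'exit 0' TERM              # the child's own init
        kill -s USR2 "$PARENT"          # tell the parent that init is done
        sleep 1                         # placeholder for the real work
}

PARENT=$$                               # assumes the parent pid is $$
OK=false
trap 'OK=true' USR2

run_the_child &
child=$!

rc=0
while :
do
        $OK && break                    # the signal has already arrived
        wait "$child"
        rc=$?                           # save it before anything else runs
        $OK && break                    # the wait was interrupted by USR2
        # some other trapped signal interrupted the wait: go around again
        kill -s 0 "$child" 2>/dev/null && continue
        break                           # the child really did exit
done
trap - USR2

if $OK
then
        result=ready                    # child is running, and ready
else
        result="exited:$rc"             # child exited before signalling
fi
echo "$result"
```

The same race remains: if USR2 lands between the $OK test and the wait,
the parent notices only when the child finally exits.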

No temp files, named pipes, or other similar stateful mechanisms needed.
What's more, aside from the "trap -p" which is probably not going to be
needed (the script writer knows no other USR2 trap is already set) all of
this is POSIX code (even the trap -p will be in the next version).

kre

ps: the function in the example is badly named, to "reap" is to harvest
or collect, what the function given that name is actually doing is
killing other processes (the original parent collects them, not that
child) - a better name would be assassin than reaper (it isn't even the
"Grim Reaper").
