On Fri, Jun 7, 2013 at 11:05 PM, Roland Mainz <[email protected]> wrote:
> Hi!
>
> ----
>
> While testing the signal issues we stumbled over an issue with the
> "wait" builtin - it seems doing a $ while ! wait ; do true ; done #
> (the loop is necessary since "wait" can be interrupted by signals)
> doesn't wait for all child processes launched by the shell.
>
> The following testcase...
> -- snip --
> $ cat ksh_waitjobs1.sh
>
> typeset -i i
>
> for (( i=0 ; i < 256 ; i++ )) ; do
>         {
>                 sleep 10
>                 exit 0
>         } &
> done
>
> while ! wait ; do
>         true
> done
>
> jl="$(LC_ALL='C' jobs -l | fgrep Running)"
>
> if [[ "$jl" == '' ]] ; then
>         printf '# success.\n'
>         exit 0
> else
>         printf '# error: job list not empty:\n%s\n' "$jl"
>         exit 1
> fi
>
> # notreached
> -- snip --
>
> ... works flawlessly with bash4 (bash 4.2.42(1)) on SuSE 12.3/AMD64/64bit:
> -- snip --
> $ bash ksh_waitjobs1.sh
> # success.
> -- snip --
>
> ... but the same script fails with ast-ksh.2013-05-24:
> -- snip --
> $ ~/bin/ksh ksh_waitjobs1.sh
> # error: job list not empty:
> [256] + 8094     Running                 <command unknown>
> [255] - 8093     Running                 <command unknown>
> [254]   8092     Running                 <command unknown>
> [253]   8091     Running                 <command unknown>
> [252]   8090     Running                 <command unknown>
> [251]   8089     Running                 <command unknown>
> [250]   8088     Running                 <command unknown>
> [249]   8087     Running                 <command unknown>
> [snip]
> [6]   7844       Running                 <command unknown>
> [5]   7843       Running                 <command unknown>
> [4]   7842       Running                 <command unknown>
> [3]   7841       Running                 <command unknown>
> [2]   7840       Running                 <command unknown>
> [1]   7839       Running                 <command unknown>
> -- snip --
>
> Uhm... is this a bug or am I doing something wrong ?

I think I found the root cause of the issue...
... the following modification of the original script...
-- snip --
typeset -i i

for (( i=0 ; i < 256 ; i++ )) ; do
        {
                sleep 10
                exit 0
        } &
done

while ! wait ; do
        true
done

jl="$(LC_ALL='C' jobs -l | fgrep 'Running')"

if [[ "$jl" != '' ]] ; then
        printf '# error: job list not empty:\n%s\n' "$jl"

        (( i=0 ))
        while true ; do
                (( i++ ))
                jl="$(LC_ALL='C' jobs -l | fgrep 'Running')"
                [[ "$jl" == '' ]] && break
                sleep 0.0001
        done

        printf '# took %d cycles to drain the queue.\n' "$i"
fi

exit 0
-- snip --

... shows that it *ALWAYS* needs exactly one loop cycle ("# took 1
cycles to drain the queue.") until "jobs -l" reports that no jobs are
left:
-- snip --
$ ksh ksh_waitjobs1.sh
# error: job list not empty:
[256] + 13139    Running                 <command unknown>
[255] - 13138    Running                 <command unknown>
[254]   13137    Running                 <command unknown>
[253]   13136    Running                 <command unknown>
[252]   13135    Running                 <command unknown>
[251]   13134    Running                 <command unknown>
[250]   13133    Running                 <command unknown>
[249]   13132    Running                 <command unknown>
[248]   13131    Running                 <command unknown>
[247]   13130    Running                 <command unknown>
[246]   13129    Running                 <command unknown>
[245]   13128    Running                 <command unknown>
[244]   13127    Running                 <command unknown>
[243]   13126    Running                 <command unknown>
[242]   13125    Running                 <command unknown>
[241]   13124    Running                 <command unknown>
[snip]
[5]   12888      Running                 <command unknown>
[4]   12887      Running                 <command unknown>
[3]   12886      Running                 <command unknown>
[2]   12885      Running                 <command unknown>
[1]   12884      Running                 <command unknown>
# took 1 cycles to drain the queue.
-- snip --

It looks like calling $ jobs -l # doesn't check whether the state of
the child processes has changed... but something does so _after_ $
jobs -l # has printed its output (e.g. calling $ jobs -l # again or
running any external process triggers the check...).
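
One way to cross-check the observation above without relying on the
shell's job table is to record each child's PID and probe it directly.
The following is only a minimal sketch in portable POSIX shell, not
taken from the ksh93 sources; CHILD_COUNT and CHILD_SLEEP are
hypothetical knobs introduced for this example (the original testcase
uses 256 children and "sleep 10"):

```shell
# Sketch: track child PIDs ourselves instead of trusting "jobs -l".
# CHILD_COUNT and CHILD_SLEEP are hypothetical knobs for this example.
CHILD_COUNT=${CHILD_COUNT:-8}
CHILD_SLEEP=${CHILD_SLEEP:-1}

pids=''
i=0
while [ "$i" -lt "$CHILD_COUNT" ] ; do
        {
                sleep "$CHILD_SLEEP"
                exit 0
        } &
        pids="$pids $!"         # remember the PID of the child just started
        i=$((i + 1))
done

# Reap each child by PID; "wait PID" returns that child's exit status.
failed=0
for pid in $pids ; do
        wait "$pid" || failed=$((failed + 1))
done

# Probe each PID with signal 0: this only tests whether the process
# still exists and does not consult the shell's job table.
alive=0
for pid in $pids ; do
        if kill -0 "$pid" 2>/dev/null ; then
                alive=$((alive + 1))
        fi
done

printf '# children: %d, failed: %d, still alive: %d\n' \
        "$CHILD_COUNT" "$failed" "$alive"
```

If this reports no children still alive while $ jobs -l # lists them as
"Running", that would point at stale job table state rather than at
processes that are really still running.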

IMO $ jobs -l # should always reflect the latest status of the child
processes as the system reports it via the SIGCHLD handler's siginfo
structure (anyone who wants to listen to all the state changes of a
child process has to set up a CHLD trap and look at the
.sh.sig.code/.sh.sig.pid variables ...).

The original testcase with a workaround applied (and with the
"grep"+pipe chain removed so that it cannot count as an external
process) looks like this:
-- snip --
typeset -i i

for (( i=0 ; i < 256 ; i++ )) ; do
        {
                sleep 10
                exit 0
        } &
done

while ! wait ; do
        true
done

# run external process (to keep jobs -l output updated) as
# workaround for ksh93 <= ast-ksh.2013-05-24
/usr/bin/true >'/dev/null'

jl="${ LC_ALL='C' jobs -l ; }"

if [[ "$jl" != *Running* ]] ; then
        printf '# success.\n'
        exit 0
else
        printf '# error: job list not empty:\n%s\n' "$jl"
        exit 1
fi

# notreached
-- snip --

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) [email protected]
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 3992797
 (;O/ \/ \O;)
_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
