Hi Ludo!

Ludovic Courtès <[email protected]> writes:

> Hello,
>
> Maxim Cournoyer <[email protected]> skribis:
>
>> + herd -s t-socket-1651 status root
>> Started:
>>  + root
>> + herd -s t-socket-1651 stop root
>> ++ cat t-pid-1651
>> + kill 1896
>> + exit 1
>> + rm -f t-socket-1651
>> + test -f t-pid-1651
>> ++ cat t-pid-1651
>> + kill 1896
>> + rm -f t-pid-1651
>> FAIL tests/no-home.sh (exit status: 1)
>
> What happens here is that the shepherd process is still alive after
> ‘herd stop root’ has completed, contrary to what’s expected:
>
> $herd stop root
>
> if kill `cat "$pid"`
> then
>     exit 1
> fi

Yes!

[...]

> Maybe there’s a chance that the shell hasn’t processed the shepherd’s
> SIGCHLD when it evaluates the “if kill `cat "$pid"`” condition; in that
> case, the shepherd process still exists as a zombie.
>
> A more robust approach might be to use the shell’s builtin ‘wait’,
> because then I suppose the shell will be forced to process pending
> SIGCHLDs:
>
> diff --git a/tests/no-home.sh b/tests/no-home.sh
> index 85b6116..5a8c278 100644
> --- a/tests/no-home.sh
> +++ b/tests/no-home.sh
> @@ -1,5 +1,5 @@
>  # GNU Shepherd --- Make sure shepherd doesn't fail when $HOME is not 
> writable.
> -# Copyright © 2014, 2016 Ludovic Courtès <[email protected]>
> +# Copyright © 2014, 2016, 2022 Ludovic Courtès <[email protected]>
>  #
>  # This file is part of the GNU Shepherd.
>  #
> @@ -46,7 +46,4 @@ kill -0 `cat "$pid"`
>  $herd status root
>  $herd stop root
>  
> -if kill `cat "$pid"`
> -then
> -    exit 1
> -fi
> +wait `cat "$pid"`

As I wrote, I was also unable to reproduce this (but when I had a high
load of packages to build at the same time, I could get it to happen a
couple times upon retrying).  Your analysis (and the narrow window which
would allow for a failure) makes sense to me, along with the proposed
fix.

I think you should commit it and tentatively mark this bug as fixed :-).

Thank you for looking into it!

Maxim



Reply via email to