Hi Thomas,

Thanks for the investigation and links.
The variations across OSes in the status of exited vs. reaped (zombie) processes have been a
problem for quite a while (for portable apps).

The description of waitpid is focused heavily on child processes; this particular case deals with non-child processes, so I stayed with using kill(pid, 0) to determine liveness.
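
To illustrate the distinction, here is a minimal sketch in C. It is not taken from the JDK sources, and the helper names try_waitpid and pid_is_alive are made up for this example. It assumes a POSIX system: waitpid() on a pid that is not a child of the caller is expected to fail with ECHILD, while the null signal can still probe that pid.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: poll a pid with waitpid() without blocking.
 * Returns 0 if waitpid() did not fail, otherwise the errno it reported
 * (ECHILD when the pid is not a child of the calling process).
 */
static int try_waitpid(pid_t pid) {
    int status = 0;
    if (waitpid(pid, &status, WNOHANG) == -1) {
        return errno;
    }
    return 0;
}

/*
 * Hypothetical helper: probe a pid with the null signal.
 * Returns 1 if kill(pid, 0) succeeds, 0 otherwise. Any failure,
 * including EPERM, is reported as "not alive" in this sketch.
 */
static int pid_is_alive(pid_t pid) {
    return kill(pid, 0) == 0 ? 1 : 0;
}
```

For example, try_waitpid(getppid()) would normally yield ECHILD because the parent is not a child of the caller, whereas pid_is_alive(getppid()) can still answer.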

Thanks, Roger


On 7/19/2017 4:20 AM, Thomas Stüfe wrote:
Hi Roger,

On Tue, Jul 18, 2017 at 9:01 PM, Roger Riggs wrote:

    Hi Thomas,

    Yes, if there is no access to the pid, then it can't report alive
    or not, and will assume not.
    If there are access restrictions, they will also apply to the
    waitid/waitpid in the waitForProcessExit0 logic, and the answer
    will be at least consistent (and avoid a possible race
    between /proc/pid/psinfo and kill state).

    Thanks, Roger


Okay, sounds reasonable. Interestingly, while reading up on the semantics of kill(), I found:

http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html

"Existing implementations vary on the result of a kill() with pid indicating an inactive process (a terminated process that has not been waited for by its parent). Some indicate success on such a call (subject to permission checking), while others give an error of [ESRCH]. Since the definition of process lifetime in this volume of IEEE Std 1003.1-2001 covers inactive processes, the [ESRCH] error as described is inappropriate in this case. In particular, this means that an application cannot have a parent process check for termination of a particular child with kill(). (Usually this is done with the null signal; this can be done reliably with waitpid().)"

So, kill() may return success for terminated but not yet reaped processes. I did not know that.
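
A small experiment along these lines can show what a given platform does. This is only a sketch under POSIX assumptions, and the names probe, probe_result and probe_zombie are invented for illustration. It forks a child that exits at once, uses waitid() with WNOWAIT to wait for the exit without reaping, sends the null signal, then reaps with waitpid() and sends the null signal again.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Result of kill(pid, 0): 0 on success, otherwise the errno value. */
struct probe_result {
    int before_reap;   /* child has exited but has not been waited for */
    int after_reap;    /* child has been reaped with waitpid() */
};

static int probe(pid_t pid) {
    return kill(pid, 0) == 0 ? 0 : errno;
}

/* Returns 0 on success, -1 if fork/waitid/waitpid failed. */
static int probe_zombie(struct probe_result *out) {
    pid_t pid = fork();
    if (pid == -1) {
        return -1;
    }
    if (pid == 0) {
        _exit(0);                       /* child: terminate immediately */
    }

    /* Block until the child has exited, but leave it unreaped (WNOWAIT). */
    siginfo_t info;
    if (waitid(P_PID, (id_t)pid, &info, WEXITED | WNOWAIT) == -1) {
        return -1;
    }
    out->before_reap = probe(pid);      /* platform dependent: 0 or ESRCH */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    out->after_reap = probe(pid);       /* expected ESRCH, barring pid reuse */
    return 0;
}
```

The before_reap value is the one the POSIX text says implementations disagree on; the after_reap value should be ESRCH unless the pid has already been reused.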

But this does not invalidate your change, does it, if all you want to do is to force one consistent view? At least I did not find any code relying on isAlive returning false for not-yet-reaped processes.

Thanks, Thomas


    On 7/18/2017 2:53 PM, Thomas Stüfe wrote:
    Hi Roger,

    I think this may fail if you have no permission to send a signal
    to that process. In that case, kill(2) may yield EPERM and
    isAlive may return false even though the process is alive.

    But then, I am not sure if that could happen in that particular
    scenario, plus it may also mean that you do not have access to
    /proc/pid either. So, I do not know how much of an issue this
    could be.
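
For reference, the possible outcomes of the null signal can be told apart by errno. The sketch below is illustrative only: the enum and the function classify_pid are hypothetical names and are not part of the webrev. It assumes POSIX kill() semantics, where EPERM means the process exists but the caller may not signal it, and ESRCH means no such process was found.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>   /* getpid(), used in the usage note below */

/* Hypothetical classification of a kill(pid, 0) result. */
enum pid_state {
    PID_EXISTS,         /* null signal could be sent */
    PID_NOT_FOUND,      /* ESRCH: no such process */
    PID_NO_PERMISSION,  /* EPERM: process exists, caller may not signal it */
    PID_ERROR           /* invalid argument or unexpected errno */
};

static enum pid_state classify_pid(pid_t pid) {
    if (pid <= 0) {
        /* 0 and negative values address process groups, not one process. */
        return PID_ERROR;
    }
    if (kill(pid, 0) == 0) {
        return PID_EXISTS;
    }
    switch (errno) {
    case ESRCH:
        return PID_NOT_FOUND;
    case EPERM:
        return PID_NO_PERMISSION;
    default:
        return PID_ERROR;
    }
}
```

A call such as classify_pid(getpid()) should return PID_EXISTS. Whether an isAlive-style check maps PID_NO_PERMISSION to alive or to not alive is a policy choice for the caller.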

    Otherwise, the fix seems straightforward.

    Kind Regards, Thomas

    On Tue, Jul 18, 2017 at 8:46 PM, Roger Riggs wrote:

        Please review a fix for an intermittent failure in the
        ProcessHandle OnExitTest
        that fails frequently on Solaris.

        ProcessHandle.isAlive is using /proc/pid/psinfo to determine
        if a process is alive and its start time.
        However, it appears that between the process exiting and
        the reaping of its status, the
        psinfo file indicates the process is alive but kill(pid, 0)
        reports that it is not alive.
        Depending on a race, ProcessHandle.onExit may determine
        the process has exited
        but later isAlive may report it is alive.

        To have a consistent view of the process being alive,
        ProcessHandle.isAlive in its native implementation
        should use kill(pid, 0) to definitively determine if the
        process is alive.

        The original issue[1] will be kept open until it is known
        that it is resolved.

        Webrev:
        http://cr.openjdk.java.net/~rriggs/webrev-alive-solaris-8184808/

        Issue:
        https://bugs.openjdk.java.net/browse/JDK-8184808

        Thanks, Roger

        [1] https://bugs.openjdk.java.net/browse/JDK-8177932





