Hi Roger,

On Thu, Jul 20, 2017 at 4:25 PM, Roger Riggs <[email protected]> wrote:
> Hi Thomas,
>
> Thanks for the investigation and links.
> The variations, across OSes, in the status of exited vs. reaped (zombie)
> processes have been a problem for quite a while (for portable apps).
>
> The description of waitpid is focused heavily on child processes; this
> particular case is dealing with non-child processes, so I stayed with
> using kill(pid, 0) to determine liveness.
>
> Thanks, Roger

That makes sense. Thanks for clarifying.

..Thomas

> On 7/19/2017 4:20 AM, Thomas Stüfe wrote:
>
> Hi Roger,
>
> On Tue, Jul 18, 2017 at 9:01 PM, Roger Riggs <[email protected]>
> wrote:
>
>> Hi Thomas,
>>
>> Yes, if there is no access to the pid, then it can't report alive or not,
>> and assume not.
>> If there are access restrictions, they will apply to the waitid/waitpid in
>> the waitForProcessExit0 logic also, and the answer will be at least
>> consistent (and avoid a possible race between /proc/pid/psinfo and kill
>> state).
>>
>> Thanks, Roger
>
> Okay, sounds reasonable. Interestingly, while reading up on the semantics
> of kill(), I found:
>
> http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html
>
> "Existing implementations vary on the result of a kill() with pid
> indicating an inactive process (a terminated process that has not been
> waited for by its parent). Some indicate success on such a call (subject to
> permission checking), while others give an error of [ESRCH]. Since the
> definition of process lifetime in this volume of IEEE Std 1003.1-2001
> covers inactive processes, the [ESRCH] error as described is inappropriate
> in this case. In particular, this means that an application cannot have a
> parent process check for termination of a particular child with kill().
> (Usually this is done with the null signal; this can be done reliably with
> waitpid().)"
>
> So, kill() may return success for terminated but not yet reaped processes.
> I did not know that.
>
> But this does not invalidate your change, does it, if all you want to do
> is to force one consistent view? At least I did not find any code relying
> on isAlive returning false for not-yet-reaped processes.
>
> Thanks, Thomas
>
>> On 7/18/2017 2:53 PM, Thomas Stüfe wrote:
>>
>> Hi Roger,
>>
>> I think this may fail if you have no permission to send a signal to that
>> process. In that case, kill(2) may yield EPERM and isAlive may return false
>> even though the process is alive.
>>
>> But then, I am not sure if that could happen in that particular scenario,
>> plus it may also mean that you do not have access to /proc/pid either. So,
>> I do not know how much of an issue this could be.
>>
>> Otherwise, the fix seems straightforward.
>>
>> Kind Regards, Thomas
>>
>> On Tue, Jul 18, 2017 at 8:46 PM, Roger Riggs <[email protected]>
>> wrote:
>>
>>> Please review a fix for an intermittent failure in the ProcessHandle
>>> OnExitTest that fails frequently on Solaris.
>>>
>>> ProcessHandle.isAlive is using /proc/pid/psinfo to determine if a
>>> process is alive and its start time.
>>> However, it appears that between the process exiting and the reaping
>>> of its status, the psinfo file indicates the process is alive but
>>> kill(pid, 0) reports that it is not alive.
>>> Depending on a race, ProcessHandle.onExit may determine the process
>>> has exited but later isAlive may report it is alive.
>>>
>>> To have a consistent view of the process being alive,
>>> ProcessHandle.isAlive in its native implementation
>>> should use kill(pid, 0) to definitively determine if the process is alive.
>>>
>>> The original issue[1] will be kept open until it is known that it is
>>> resolved.
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~rriggs/webrev-alive-solaris-8184808/
>>>
>>> Issue:
>>> https://bugs.openjdk.java.net/browse/JDK-8184808
>>>
>>> Thanks, Roger
>>>
>>> [1] https://bugs.openjdk.java.net/browse/JDK-8177932
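
For reference, the null-signal probe discussed in this thread can be sketched roughly as below. This is only an illustration, not the code from the webrev: the names probe_pid and probe_result are hypothetical, and reporting EPERM as a separate state is an assumption made here so that the caller can decide what "no permission" should mean (the thread settles on treating it as not alive). As the POSIX text quoted above notes, some systems report success for a terminated but not yet reaped process, so PROBE_ALIVE may also cover zombies.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical result type for this sketch; not part of the JDK sources. */
typedef enum {
    PROBE_ALIVE,          /* kill(pid, 0) succeeded: the pid exists       */
    PROBE_GONE,           /* ESRCH: no such process                       */
    PROBE_NO_PERMISSION,  /* EPERM: pid exists but we may not signal it   */
    PROBE_ERROR           /* any other errno value                        */
} probe_result;

/* Probe a pid with the null signal (signal number 0). No signal is
 * delivered; only the existence and permission checks are performed. */
probe_result probe_pid(pid_t pid) {
    if (kill(pid, 0) == 0) {
        return PROBE_ALIVE;
    }
    switch (errno) {
    case ESRCH:
        return PROBE_GONE;
    case EPERM:
        return PROBE_NO_PERMISSION;
    default:
        return PROBE_ERROR;
    }
}
```

A caller that wants the "assume not alive" behaviour described above would treat only PROBE_ALIVE as alive; a caller that cares about processes owned by other users might treat PROBE_NO_PERMISSION as alive instead.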
