Hi Roger,

On Tue, Jul 18, 2017 at 9:01 PM, Roger Riggs <[email protected]> wrote:
> Hi Thomas,
>
> Yes, if there is no access to the pid, then it can't report alive or not,
> and assumes not.
> If there are access restrictions, they will also apply to the waitid/waitpid
> in the waitForProcessExit0 logic, and the answer will be at least consistent
> (and avoid a possible race between /proc/pid/psinfo and kill state).
>
> Thanks, Roger
>

Okay, sounds reasonable.

Interestingly, while reading up on the semantics of kill(), I found:

http://pubs.opengroup.org/onlinepubs/009695399/functions/kill.html

"Existing implementations vary on the result of a kill() with pid
indicating an inactive process (a terminated process that has not been
waited for by its parent). Some indicate success on such a call (subject to
permission checking), while others give an error of [ESRCH]. Since the
definition of process lifetime in this volume of IEEE Std 1003.1-2001
covers inactive processes, the [ESRCH] error as described is inappropriate
in this case. In particular, this means that an application cannot have a
parent process check for termination of a particular child with kill().
(Usually this is done with the null signal; this can be done reliably with
waitpid().)"

So, kill() may return success for terminated but not yet reaped processes.
I did not know that. But this does not invalidate your change, does it, if
all you want to do is to force one consistent view? At least I did not find
any code relying on isAlive returning false for not-yet-reaped processes.

Thanks, Thomas

> On 7/18/2017 2:53 PM, Thomas Stüfe wrote:
>
> Hi Roger,
>
> I think this may fail if you have no permission to send a signal to that
> process. In that case, kill(2) may yield EPERM and isAlive may return false
> even though the process is alive.
>
> But then, I am not sure if that could happen in that particular scenario,
> plus it may also mean that you do not have access to /proc/pid either. So,
> I do not know how much of an issue this could be.
>
> Otherwise, the fix seems straightforward.
>
> Kind Regards, Thomas
>
> On Tue, Jul 18, 2017 at 8:46 PM, Roger Riggs <[email protected]>
> wrote:
>
>> Please review a fix for an intermittent failure in the ProcessHandle
>> OnExitTest that fails frequently on Solaris.
>>
>> ProcessHandle.isAlive is using /proc/pid/psinfo to determine if a process
>> is alive and its start time.
>> However, it appears that between the process exiting and the reaping
>> of its status, the psinfo file indicates the process is alive but
>> kill(pid, 0) reports that it is not alive.
>> Depending on a race, ProcessHandle.onExit may determine the process
>> has exited but later isAlive may report it is alive.
>>
>> To have a consistent view of the process being alive,
>> ProcessHandle.isAlive in its native implementation
>> should use kill(pid, 0) to determine definitively whether the process
>> is alive.
>>
>> The original issue[1] will be kept open until it is known that it is
>> resolved.
>>
>> Webrev:
>>   http://cr.openjdk.java.net/~rriggs/webrev-alive-solaris-8184808/
>>
>> Issue:
>>   https://bugs.openjdk.java.net/browse/JDK-8184808
>>
>> Thanks, Roger
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8177932
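
For reference, the kill(pid, 0) semantics discussed in this thread can be
sketched roughly as follows. This is only an illustration and not the JDK's
native code: it uses Python's os.kill, which wraps kill(2) on POSIX systems,
and the names is_alive_by_kill, eperm_means_alive and child_state_after_wait
are made up for this example. The default treats EPERM as "not alive", as
described above; the flag shows the alternative reading. Because the result
for a terminated but unreaped child varies between implementations, the
sketch only probes the child after it has been waited for.

```python
import os
import subprocess
import sys


def is_alive_by_kill(pid: int, eperm_means_alive: bool = False) -> bool:
    """Probe a pid with the null signal (signal 0); no signal is delivered.

    Hypothetical helper for illustration, POSIX only.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        # ESRCH: no process with this pid.
        return False
    except PermissionError:
        # EPERM: the process exists, but we may not signal it.
        # Assumption: default to "not alive", as described in the thread.
        return eperm_means_alive
    return True


def child_state_after_wait() -> bool:
    """Spawn a short-lived child, reap it, then probe its pid."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()  # waitpid() is the reliable way to observe termination
    return is_alive_by_kill(child.pid)


if __name__ == "__main__":
    print("self alive:", is_alive_by_kill(os.getpid()))
    print("reaped child alive:", child_state_after_wait())
```

Between the child's exit and the wait() call, the probe may report either
result depending on the platform, which is the inconsistency the POSIX text
quoted above warns about.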
