Hi Thomas,
On 4/16/2015 3:01 PM, Thomas Stüfe wrote:
Hi Roger,
thank you for your answer!
The reason I take an interest is not just theoretical. We (SAP) use
our JVM for our test infrastructure and we had exactly the problem
allChildren() is designed to solve: killing a process tree related to
a specific test (similar to jtreg tests) in case of errors or hangs.
We have test machines running large workloads of tests in parallel and
we reach pid wraparound - depending on the OS - quite fast.
We solved this by adding process groups to Process.java and we are
very happy with this solution. We are able to quickly kill a whole
process tree, cleanly and completely, without ambiguity or risk to
other tests. Of course we had to add this support as a "sideways hack"
in order to not change the official Process.java interface. Therefore
I was hoping that with JEP 102, we would get official support for
process groups. Unfortunately, it seems the decision has already been
made and we are too late in the discussion :(
It would be interesting to see a description of what you added to/around
the API.
The reason to avoid process groups was one of simplicity and
non-interference with processes spawned by native libraries. If that
complexity can be understood, process groups/jobs could fulfill a need
in a scalable system.
At this point, I'd like to deal with it as a separate request for
enhancement.
see my other comments inline.
On Sat, Apr 11, 2015 at 8:55 PM, Roger Riggs <roger.ri...@oracle.com> wrote:
Hi Thomas,
Thanks for the comments.
On 4/11/2015 8:31 AM, Thomas Stüfe wrote:
Hi Roger,
I have a question about getChildren() and getAllChildren().
I assume the point of those functions is to implement point 4 of
JEP 102 ("The ability to deal with process trees, in particular
some means to destroy a process tree."), by returning a
collection of PIDs which are the children of the process and then
killing them?
Earlier versions included a killProcess tree method, but it was
recommended to leave the exact algorithm to kill processes to the
caller.
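A minimal sketch of what such a caller-side algorithm might look like. It assumes the ProcessHandle API as it later shipped in Java 9, where allChildren() is named descendants(); the class and method names (KillTreeSketch, destroyTree) are made up for illustration. It does nothing about the PID-recycling race discussed in this thread.

```java
import java.util.List;
import java.util.stream.Collectors;

final class KillTreeSketch {

    /**
     * Requests termination of all descendants of root, then of root itself.
     * Returns how many termination requests were accepted.
     * Must not be called with the current process as root, because
     * ProcessHandle.destroy() throws IllegalStateException in that case.
     */
    static int destroyTree(ProcessHandle root) {
        // Snapshot the descendants before signalling anything: once a
        // parent dies, its children are re-parented and no longer found.
        List<ProcessHandle> descendants =
                root.descendants().collect(Collectors.toList());
        int requested = 0;
        for (ProcessHandle h : descendants) {
            if (h.destroy()) {
                requested++;
            }
        }
        if (root.destroy()) {
            requested++;
        }
        return requested;
    }

    public static void main(String[] args) throws Exception {
        // "sleep" is only a stand-in for a real test process
        // (assumes a Unix-like system).
        Process p = new ProcessBuilder("sleep", "60").start();
        int requested = destroyTree(p.toHandle());
        p.waitFor();
        System.out.println("termination requests accepted: " + requested);
    }
}
```

The snapshot is only as current as the moment it was taken. Processes forked after the descendants() call are missed, and any handle in the list may refer to a process that has already exited.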
However, I am not sure that this can be implemented in a safe
way, at least on UNIX, because - as Martin already pointed out -
of PID recycling. I do not see how you can prevent allChildren()
from returning PIDs which may be already reaped and recycled when
you use them later. How do you prevent that?
Unless there is an extended time between getting the children and
destroying them, the pids will still be valid.
Why? A child process may be reaped the instant you are done
reading it from /proc, and its pid may have been recycled by the OS
right away and already point to another process when allChildren()
returns. If a process lives about as long as it takes the system to
reach a pid wraparound to the same pid value, its pid could be
recycled right after it is reaped, couldn't it? Sure, the longer you
wait, the higher the chance of this happening, but it may happen right
away.
As Martin said, we have had those races in the kill() code for a long
time, but children()/allChildren() could make those errors more
probable, because now more processes are involved, especially if you
use allChildren() to kill a deep process tree. And there is nothing in
the javadoc warning the user about this scenario. From time to time
you would just happen to kill an unrelated process. Those problems
are hard to debug.
The technique of caching the start time can prevent that case;
though it has AFAIK not been a problem.
How would that work? Should the user, before issuing the kill, compare
the start time of the process to be killed with the cached start time?
See Peter's email, he described it more thoroughly than I have in
previous emails.
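One way to read the start-time technique, as a rough user-level sketch rather than the implementation under discussion. It assumes the Java 9 ProcessHandle API (ProcessHandle.of, Info.startInstant); StartTimeGuardSketch, Snapshot, sameProcess and destroyIfUnchanged are hypothetical names.

```java
import java.time.Instant;
import java.util.Optional;

final class StartTimeGuardSketch {

    /** A handle paired with the start time observed when it was first seen. */
    static final class Snapshot {
        final ProcessHandle handle;
        final Optional<Instant> startedAt;

        Snapshot(ProcessHandle handle) {
            this.handle = handle;
            // May be empty if the OS does not expose the start time.
            this.startedAt = handle.info().startInstant();
        }
    }

    /** True only if a start time was cached and the current one matches it. */
    static boolean sameProcess(Optional<Instant> cached, Optional<Instant> current) {
        return cached.isPresent() && cached.equals(current);
    }

    /**
     * Re-reads the start time for the cached pid and requests termination
     * only if it still matches the value seen at snapshot time.
     */
    static boolean destroyIfUnchanged(Snapshot s) {
        Optional<Instant> current = ProcessHandle.of(s.handle.pid())
                .flatMap(h -> h.info().startInstant());
        return sameProcess(s.startedAt, current) && s.handle.destroy();
    }
}
```

A pid that was recycled after the snapshot will normally show a different start time, so the kill is skipped. A small window remains between the comparison and the signal, and the check refuses to kill at all when no start time is available.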
Note that even if your code is bulletproof, allChildren() will
also return PIDs of subprocesses which are completely unrelated
to you and Process.java - they could have been forked by some
third-party native code which just happens to run in parallel in
the same process. There, you have no control over when they get
reaped. One might already have been reaped by the time
allChildren() returns, with the same PID recycled for
another, unrelated process.
Of course, the best case is for an application to spawn and manage
its own processes and handle their proper termination.
The use cases for children/allChildren are focused on
supervisory/executive functions that monitor a running system and
can clean up even in the case of unexpected failures.
All management of processes is subject to OS limitations; if the
PID were from a completely different process tree, the ordinary
destroy/info functions would not be available unless the process
was running as a privileged OS user (same as any other native
application).
Could you explain this please? If both trees run under the same user,
why should I not be able to kill a process from a different tree?
I was considering the case of a different user; only the OS access
controls apply, so if it was the same user the processes could be
controlled.
The PH API does not provide more or less access than the OS.
Thanks, Roger
If I am right, it would not be sufficient to state "There is no
guarantee that a process is alive." - it may be alive but by now
be a different process altogether. This makes "allChildren()"
useless for many cases, because the returned information may
already be obsolete the moment the function returns.
The caching of startTime can remove the ambiguity.
Of course, I may be missing something here?
But if I got all that right and the sole purpose of allChildren()
is to be able to kill them (or otherwise signal them), why not
use process groups? Process groups would be the traditional way
on POSIX platforms to handle process trees, and they are also
available on Windows in the form of Job Objects.
Using process groups to signal sub process trees would be safe,
would not rely on PID identity, and would be more efficient. Also
way less coding. Also, it would be an old, established pattern -
process groups have been around for a long time. Also, using
process groups it is possible to break away from a group, so a
program below you which wants to run as a daemon can do so by
removing itself from the process group and thus escaping your kill.
On Windows we have Job objects, and I think there are enough
similarities to POSIX process groups to abstract them into
something platform independent.
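For illustration only, a sketch of how the process-group idea could be approximated from Java today without any change to Process.java. Everything here is an assumption, not the SAP extension described above: it relies on the external setsid(1) and kill(1) utilities being on the PATH of a Linux-like system, on setsid(1) exec'ing the command without forking so that the child's pid equals its new process group id, and on kill(1) accepting a negative pid after "--". ProcessGroupSketch and its methods are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class ProcessGroupSketch {

    /** Wraps a command so that it starts in a new session and process group. */
    static List<String> inNewGroup(List<String> command) {
        List<String> wrapped = new ArrayList<>();
        wrapped.add("setsid"); // assumption: util-linux setsid(1) is installed
        wrapped.addAll(command);
        return wrapped;
    }

    /** Builds a kill(1) invocation that sends SIGTERM to a whole process group. */
    static List<String> signalGroupCommand(long pgid) {
        // A negative pid addresses the process group; "--" ends option parsing.
        return List.of("kill", "-TERM", "--", "-" + pgid);
    }

    /** Starts the command as leader of its own process group. */
    static Process startInNewGroup(List<String> command) throws IOException {
        return new ProcessBuilder(inNewGroup(command)).inheritIO().start();
    }

    /**
     * Signals every process still in the leader's group.
     * Assumes leader.pid() equals the group id (see the assumptions above).
     * Returns the exit status of kill(1).
     */
    static int terminateGroup(Process leader) throws IOException, InterruptedException {
        return new ProcessBuilder(signalGroupCommand(leader.pid()))
                .inheritIO().start().waitFor();
    }
}
```

With this shape, a test harness would call startInNewGroup(...) once per test and terminateGroup(...) on error or hang. The signal is addressed to the group, so no list of child pids has to be collected first, and a descendant that moves itself to another group is not signalled.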
Earlier discussions of process termination and exit value reaping
considered using process groups, but it became evident that the Java
runtime needed to be very careful to not interfere with processes
that might be spawned and controlled by native libraries and that
process groups would only increase complexity and the interactions.
Thanks, Roger
Thanks! Thomas