On Fri, Apr 17, 2015 at 8:40 AM, Staffan Larsen <staffan.lar...@oracle.com>
wrote:

>
> On 16 apr 2015, at 21:01, Thomas Stüfe <thomas.stu...@gmail.com> wrote:
>
> Hi Roger,
>
> thank you for your answer!
>
> The reason I take an interest is not just theoretical. We (SAP) use our JVM
> for our test infrastructure and we had exactly the problem allChildren() is
> designed to solve: killing a process tree related to a specific test
> (similar to jtreg tests) in case of errors or hangs. We have test machines
> running large workloads of tests in parallel and we reach pid wraparound -
> depending on the OS - quite fast.
>
> We solved this by adding process groups to Process.java and we are very
> happy with this solution. We are able to quickly kill a whole process tree,
> cleanly and completely, without ambiguity or risk to other tests. Of course
> we had to add this support as a "sideways hack" in order to not change the
> official Process.java interface. Therefore I was hoping that with JEP 102,
> we would get official support for process groups. Unfortunately, it seems the
> decision has already been made and we are too late in the discussion :(
>
>
> Interestingly we are hoping to use allChildren() to kill process trees in
> jtreg - exactly the use case you are describing. I haven’t been testing the
> current approach in allChildren(), but it seems your experience indicates
> that it will not be a perfect fit for the use case. In a previous test
> framework I was involved in we also used process groups for this with good
> results. This does beg the question: if the current approach isn’t useful
> for our own testing purposes, when is it useful?
>
>
Monitoring, I guess. Like writing your own pstree. But not for anything
requiring you to send signals to those pids.
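
A minimal sketch of that monitoring use, assuming the API as it later shipped
in JDK 9 (ProcessHandle with children() and descendants(); the draft discussed
in this thread calls the latter allChildren()). The class name PsTree is made
up for illustration, and String.repeat needs Java 11. It only reads process
information and never signals a pid:

```java
final class PsTree {
    /** Renders the subtree rooted at root, one "pid command" line per process. */
    static String render(ProcessHandle root) {
        StringBuilder out = new StringBuilder();
        append(root, 0, out);
        return out.toString();
    }

    private static void append(ProcessHandle p, int depth, StringBuilder out) {
        // The command may be unavailable (permissions, or the process already exited).
        String cmd = p.info().command().orElse("?");
        out.append("  ".repeat(depth)).append(p.pid()).append(' ').append(cmd).append('\n');
        // children() is only a snapshot; a child may exit while we recurse.
        p.children().forEach(c -> append(c, depth + 1, out));
    }

    public static void main(String[] args) {
        System.out.print(render(ProcessHandle.current()));
    }
}
```

A stale entry here only means a wrong line in the output, which is why the
snapshot semantics are tolerable for monitoring.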

..Thomas


> Thanks,
> /Staffan
>
>
> see my other comments inline.
>
> On Sat, Apr 11, 2015 at 8:55 PM, Roger Riggs <roger.ri...@oracle.com>
> wrote:
>
> Hi Thomas,
>
> Thanks for the comments.
>
> On 4/11/2015 8:31 AM, Thomas Stüfe wrote:
>
> Hi Roger,
>
> I have a question about getChildren() and getAllChildren().
>
> I assume the point of those functions is to implement point 4 of JEP 102
> ("The ability to deal with process trees, in particular some means to
> destroy a process tree."), by returning a collection of PIDs which are the
> children of the process and then killing them?
>
> Earlier versions included a killProcess tree method but it was recommended
> to leave the exact algorithm to kill processes to the caller.
>
>
> However, I am not sure that this can be implemented in a safe way, at
> least on UNIX, because - as Martin already pointed out - of PID recycling.
> I do not see how you can prevent allChildren() from returning PIDs which
> may be already reaped and recycled when you use them later. How do you
> prevent that?
>
> Unless there is an extended time between getting the children and
> destroying them the pids will still be valid.
>
>
> Why? A child process may get reaped the instant you are done reading
> it from /proc, and its pid may have been recycled by the OS right away and
> already point to another process when allChildren() returns. If a
> process lives about as long as it takes the system to reach a pid
> wraparound to the same pid value, its pid could be recycled right after it
> is reaped, couldn't it? Sure, the longer you wait, the higher the chance of
> this happening, but it may happen right away.
>
> As Martin said, we have had those races in the kill() code for a long time,
> but children()/allChildren() could make those errors more probable, because
> now more processes are involved. Especially if you use allChildren to kill
> a deep process tree. And there is nothing in the javadoc warning the user
> about this scenario. You would just happen to kill an unrelated process
> from time to time. Those problems are hard to debug.
>
> The technique of caching the start time can prevent that case; though it
> has AFAIK not been a problem.
>
>
> How would that work? Should the user, before issuing the kill, compare the
> start time of the process to kill with the cached start time?
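
One way to read that question, as a rough sketch only: record the start time
when the pid is listed and compare it again just before signalling. This
assumes the API as it later shipped in JDK 9 (ProcessHandle.of(),
info().startInstant(), destroy()); GuardedKill, Snapshot, stillSame and
destroyIfSame are made-up names, and Optional.isEmpty needs Java 11.

```java
import java.time.Instant;
import java.util.Optional;

final class GuardedKill {
    /** A pid plus the start time seen when the pid was first listed. */
    static final class Snapshot {
        final long pid;
        final Optional<Instant> start;

        Snapshot(long pid, Optional<Instant> start) {
            this.pid = pid;
            this.start = start;
        }

        static Snapshot of(ProcessHandle h) {
            return new Snapshot(h.pid(), h.info().startInstant());
        }
    }

    /** True only if the pid exists and reports the same start time as before. */
    static boolean stillSame(Snapshot s) {
        if (s.start.isEmpty()) {
            return false; // no start time recorded: identity cannot be confirmed
        }
        return ProcessHandle.of(s.pid)
                .map(h -> s.start.equals(h.info().startInstant()))
                .orElse(false);
    }

    /** Requests termination only if the pid still looks like the same process. */
    static boolean destroyIfSame(Snapshot s) {
        // The comparison narrows the race window but does not close it: the pid
        // could still be recycled between this check and the signal itself.
        return stillSame(s)
                && ProcessHandle.of(s.pid).map(ProcessHandle::destroy).orElse(false);
    }
}
```

The check and the signal remain two separate steps in this sketch, so the
recycling concern raised above gets smaller but does not go away. Note also
that destroy() refuses to act on the current process.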
>
> Note that even if your code is bulletproof, allChildren() will also
> return PIDs of sub processes which are completely unrelated to you and
> Process.java - they could have been forked by some third party native code
> which just happens to run in parallel in the same process. There, you have
> no control over when they get reaped. One might already have been reaped by
> the time allChildren() returns, and now the same PID got recycled as
> another, unrelated process.
>
> Of course, the best case is for an application to spawn and manage its own
> processes and handle their proper termination.
> The use cases for children/allChildren are focused on supervisory/executive
> functions that monitor a running system and can clean up even in the case of
> unexpected failures.
>
> All management of processes is subject to OS limitations; if the PID were
> from a completely different process tree, the ordinary destroy/info
> functions would not be available unless the process was running as a
> privileged OS user (same as any other native application).
>
>
> Could you explain this please? If both trees run under the same user, why
> should I not be able to kill a process from a different tree?
>
> If I am right, it would not be sufficient to state "There is no guarantee
> that a process is alive." - it may be alive but by now be a different
> process altogether. This makes "allChildren()" useless for many cases,
> because the returned information may already be obsolete the moment the
> function returns.
>
> The caching of startTime can remove the ambiguity.
>
>
>
> Of course, I may be missing something here?
>
> But if I got all that right and the sole purpose of allChildren() is to
> be able to kill them (or otherwise signal them), why not use process
> groups? Process groups would be the traditional way on POSIX platforms to
> handle process trees, and they are also available on Windows in the form of
> Job Objects.
>
> Using process groups to signal sub process trees would be safe, would
> not rely on PID identity, and would be more efficient. Also way less
> coding. Also, it would be an old, established pattern - process groups have
> been around for a long time. Also, using process groups it is possible to
> break away from a group, so a program below you which wants to run as a
> daemon can do so by removing itself from the process group and thus escaping
> your kill.
>
> On Windows we have Job objects, and I think there are enough
> similarities to POSIX process groups to abstract them into something
> platform independent.
>
> Earlier discussions of process termination and exit value reaping
> considered using process groups but it became evident that the Java runtime
> needed to be very careful to not interfere with processes that might be
> spawned and controlled by native libraries and that process groups would
> only increase complexity and the interactions.
>
>
> Thanks, Roger
>
>
> Thanks! Thomas
>
>
>
