On Fri, Apr 17, 2015 at 8:40 AM, Staffan Larsen <staffan.lar...@oracle.com> wrote:
>
> On 16 apr 2015, at 21:01, Thomas Stüfe <thomas.stu...@gmail.com> wrote:
>
> Hi Roger,
>
> thank you for your answer!
>
> The reason I take an interest is not just theoretical. We (SAP) use our JVM for our test infrastructure and we had exactly the problem allChildren() is designed to solve: killing a process tree related to a specific test (similar to jtreg tests) in case of errors or hangs. We have test machines running large workloads of tests in parallel and we reach pid wraparound - depending on the OS - quite fast.
>
> We solved this by adding process groups to Process.java and we are very happy with this solution. We are able to quickly kill a whole process tree, cleanly and completely, without ambiguity or risk to other tests. Of course we had to add this support as a "sideways hack" in order to not change the official Process.java interface. Therefore I was hoping that with JEP 102, we would get official support for process groups. Unfortunately, it seems the decision has already been made and we are too late in the discussion :(
>
> Interestingly we are hoping to use allChildren() to kill process trees in jtreg - exactly the use case you are describing. I haven’t been testing the current approach in allChildren(), but it seems your experience indicates that it will not be a perfect fit for the use case. In a previous test framework I was involved in we also used process groups for this with good results. This does beg the question: if the current approach isn’t useful for our own testing purposes, when is it useful?

Monitoring, I guess. Like writing your own pstree. But not for anything requiring you to send signals to those pids.

..Thomas

> Thanks,
> /Staffan
>
> see my other comments inline.
>
> On Sat, Apr 11, 2015 at 8:55 PM, Roger Riggs <roger.ri...@oracle.com> wrote:
>
> Hi Thomas,
>
> Thanks for the comments.
>
> On 4/11/2015 8:31 AM, Thomas Stüfe wrote:
>
> Hi Roger,
>
> I have a question about getChildren() and getAllChildren().
>
> I assume the point of those functions is to implement point 4 of JEP 102 ("The ability to deal with process trees, in particular some means to destroy a process tree."), by returning a collection of PIDs which are the children of the process and then killing them?
>
> Earlier versions included a killProcess tree method but it was recommended to leave the exact algorithm to kill processes to the caller.
>
> However, I am not sure that this can be implemented in a safe way, at least on UNIX, because - as Martin already pointed out - of PID recycling. I do not see how you can prevent allChildren() from returning PIDs which may be already reaped and recycled when you use them later. How do you prevent that?
>
> Unless there is an extended time between getting the children and destroying them the pids will still be valid.
>
> Why? A child process may get reaped the instant you are done reading it from /proc, and its pid may have been recycled by the OS right away and already be pointing to another process when allChildren() returns. If a process lives about as long as it takes the system to reach a pid wraparound to the same pid value, its pid could be recycled right after it is reaped, couldn't it? Sure, the longer you wait, the higher the chance of this happening, but it may happen right away.
>
> As Martin said, we have had those races in the kill() code for a long time, but children()/allChildren() could make those errors more probable, because now more processes are involved. Especially if you use allChildren to kill a deep process tree. And there is nothing in the javadoc warning the user about this scenario. You would just happen from time to time to kill an unrelated process. Those problems are hard to debug.
>
> The technique of caching the start time can prevent that case; though it has AFAIK not been a problem.
>
> How would that work? User should, before issuing the kill, compare start time of process to kill with cached start time?
>
> Note that even if your coding is bulletproof, allChildren() will also return PIDs of sub processes which are completely unrelated to you and Process.java - they could have been forked by some third party native code which just happens to run in parallel in the same process. There, you have no control over when it gets reaped. It might already have been reaped by the time allChildren() returns, and now the same PID got recycled as another, unrelated process.
>
> Of course, the best case is for an application to spawn and manage its own processes and handle their proper termination. The use cases for children/allChildren are focused on supervisory/executive functions that monitor a running system and can clean up even in the case of unexpected failures.
>
> All management of processes is subject to OS limitations; if the PID were from a completely different process tree, the ordinary destroy/info functions would not be available unless the process was running as a privileged OS user (same as any other native application).
>
> Could you explain this please? If both trees run under the same user, why should I not be able to kill a process from a different tree?
>
> If I am right, it would not be sufficient to state "There is no guarantee that a process is alive." - it may be alive but by now be a different process altogether. This makes "allChildren()" useless for many cases, because the returned information may already be obsolete the moment the function returns.
>
> The caching of startTime can remove the ambiguity.
>
> Of course I may be missing something here?
>
> But if I got all that right and the sole purpose of allChildren() is to be able to kill them (or otherwise signal them), why not use process groups? Process groups would be the traditional way on POSIX platforms to handle process trees, and they are also available on Windows in the form of Job Objects.
>
> Using process groups to signal sub process trees would be safe, would not rely on PID identity, and would be more efficient. Also way less coding. Also, it would be an old, established pattern - process groups have been around for a long time. Also, using process groups it is possible to break away from a group, so a program below you which wants to run as a daemon can do so by removing itself from the process group and thus escaping your kill.
>
> On Windows we have Job Objects, and I think there are enough similarities to POSIX process groups to abstract them into something platform independent.
>
> Earlier discussions of process termination and exit value reaping considered using process groups but it became evident that the Java runtime needed to be very careful to not interfere with processes that might be spawned and controlled by native libraries and that process groups would only increase complexity and the interactions.
>
> Thanks, Roger
>
> Thanks! Thomas
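
The "cache the start time, compare before the kill" idea discussed in this thread could look roughly like the sketch below. It is an illustration only, not code from the JEP 102 work: it assumes the ProcessHandle API as available in Java 9 or later, and the class and method names (StartTimeGuard, Snapshot, stillSameProcess, destroyIfSame) are made up for this example. Note that a window remains between the check and the signal, so this narrows the race described above but does not close it.

```java
import java.time.Instant;
import java.util.Optional;

// Sketch only: pairs a pid with the start time seen at snapshot time and
// refuses to signal the pid later if the start time no longer matches.
// All names in this class are hypothetical.
final class StartTimeGuard {

    /** A pid plus the start time observed when the snapshot was taken. */
    static final class Snapshot {
        final long pid;
        final Optional<Instant> start;

        Snapshot(long pid, Optional<Instant> start) {
            this.pid = pid;
            this.start = start;
        }
    }

    /** Records pid and start time; the start time may be unavailable. */
    static Snapshot snapshot(ProcessHandle handle) {
        return new Snapshot(handle.pid(), handle.info().startInstant());
    }

    /** True only if the pid exists now and reports the cached start time. */
    static boolean stillSameProcess(Snapshot cached) {
        if (!cached.start.isPresent()) {
            return false; // nothing to compare against, so assume the worst
        }
        Optional<ProcessHandle> now = ProcessHandle.of(cached.pid);
        return now.isPresent()
                && cached.start.equals(now.get().info().startInstant());
    }

    /** Requests termination only when the identity check passes. */
    static boolean destroyIfSame(Snapshot cached) {
        // The pid can still be recycled between the check and destroy().
        return stillSameProcess(cached)
                && ProcessHandle.of(cached.pid)
                        .map(ProcessHandle::destroy)
                        .orElse(false);
    }

    public static void main(String[] args) {
        Snapshot self = snapshot(ProcessHandle.current());
        System.out.println("same process: " + stillSameProcess(self));

        // Same pid, deliberately wrong start time: models a recycled pid.
        Snapshot stale = new Snapshot(self.pid, Optional.of(Instant.EPOCH));
        System.out.println("stale snapshot accepted: " + stillSameProcess(stale));
    }
}
```

A supervisor would take snapshots while walking the tree and call destroyIfSame on each one later. Process groups, as argued above, avoid relying on pid identity altogether; this sketch does not attempt that.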