Hi Roger,

On Fri, Apr 17, 2015 at 7:05 PM, Roger Riggs <roger.ri...@oracle.com> wrote:
> Hi Thomas,
>
> On 4/16/2015 3:01 PM, Thomas Stüfe wrote:
>
> Hi Roger,
>
> thank you for your answer!
>
> The reason I take an interest is not just theoretical. We (SAP) use our
> JVM for our test infrastructure and we had exactly the problem
> allChildren() is designed to solve: killing a process tree related to a
> specific test (similar to jtreg tests) in case of errors or hangs. We have
> test machines running large workloads of tests in parallel and we reach pid
> wraparound - depending on the OS - quite fast.
>
> We solved this by adding process groups to Process.java and we are very
> happy with this solution. We are able to quickly kill a whole process tree,
> cleanly and completely, without ambiguity or risk to other tests. Of course
> we had to add this support as a "sideways hack" in order to not change the
> official Process.java interface. Therefore I was hoping that with JEP 102,
> we would get official support for process groups. Unfortunately, it seems
> the decision has already been made and we are too late in the discussion :(
>
> It would be interesting to see a description of what you added to/around
> the API.
>

Very simple really: all we did was add a flag to Runtime.exec - ultimately
exposed via ProcessBuilder - to make the child process the leader of a new
process group. This flag just triggered a setpgid() call between fork() and
exec() in the child process, which created a new process group with the
child process as leader. Now you could kill the whole tree with kill(-pid).

On Windows we implemented it with Jobs.

It was all simple because we never aimed to bring process groups with all
their features to the JDK; we just needed a way to kill a tree of child
processes, which is a rather specific problem.

> The reason to avoid them was one of simplicity and non-interference with
> processes spawned by native libraries.
>

See, that I don't understand: you still interfere with them by returning all
child pids - be they spawned by Java or by native libs. Or do you mean you
offload responsibility to the caller, who should then decide whether to kill
the child pids indiscriminately or be more careful?

> If that complexity can be understood process groups/jobs
> could fulfill a need in a scalable system.
>

I think process groups could be added to the API if they are well documented
(which admittedly will be difficult in a platform-neutral way). Basically,
process groups are a tool like any other, and the caller must think before
using them, as with every other tool.

> At this point, I'd like to deal with it as a separate request for
> enhancement.
>

Sure! Thanks for listening.

Kind Regards, Thomas

>
> see my other comments inline.
>
> On Sat, Apr 11, 2015 at 8:55 PM, Roger Riggs <roger.ri...@oracle.com>
> wrote:
>
>> Hi Thomas,
>>
>> Thanks for the comments.
>>
>> On 4/11/2015 8:31 AM, Thomas Stüfe wrote:
>>
>> Hi Roger,
>>
>> I have a question about getChildren() and getAllChildren().
>>
>> I assume the point of those functions is to implement point 4 of JEP 102
>> ("The ability to deal with process trees, in particular some means to
>> destroy a process tree."), by returning a collection of PIDs which are the
>> children of the process and then killing them?
>>
>> Earlier versions included a killProcess tree method but it was
>> recommended to leave the exact algorithm to kill processes to the caller.
>>
>>
>> However, I am not sure that this can be implemented in a safe way, at
>> least on UNIX, because - as Martin already pointed out - of PID recycling.
>> I do not see how you can prevent allChildren() from returning PIDs which
>> may be already reaped and recycled when you use them later. How do you
>> prevent that?
>>
>> Unless there is an extended time between getting the children and
>> destroying them the pids will still be valid.
>>
>
> Why?
> A child process may be getting reaped the instant you are done
> reading it from /proc, and its pid may have been recycled by the OS right
> away and already be pointing to another process when allChildren() returns.
> If a process lives about as long as it takes the system to reach a pid
> wraparound to the same pid value, its pid could be recycled right after it
> is reaped, right? Sure, the longer you wait, the higher the chance of this
> happening, but it may happen right away.
>
> As Martin said, we have had those races in the kill() code for a long time,
> but children()/allChildren() could make those errors more probable, because
> now more processes are involved. Especially if you use allChildren to kill
> a deep process tree. And there is nothing in the javadoc warning the user
> about this scenario. You would just happen from time to time to kill an
> unrelated process. Those problems are hard to debug.
>
>> The technique of caching the start time can prevent that case; though it
>> has AFAIK not been a problem.
>>
>
> How would that work? Should the user, before issuing the kill, compare the
> start time of the process to kill with the cached start time?
>
> See Peter's email, he described it more thoroughly than I have in previous
> emails.
>
>> Note even if your coding is bulletproof, that allChildren() will also
>> return PIDs of sub processes which are completely unrelated to you and
>> Process.java - they could have been forked by some third party native code
>> which just happens to run in parallel in the same process. There, you have
>> no control about when it gets reaped. It might already have been reaped by
>> the time allChildren() returns, and now the same PID got recycled as
>> another, unrelated process.
>>
>> Of course, the best case is for an application to spawn and manage its
>> own processes and handle their proper termination.
>> The use cases for children/allChildren are focused on
>> supervisory/executive functions that monitor a running system and can
>> clean up even in the case of unexpected failures.
>>
>> All management of processes is subject to OS limitations; if the PID
>> were from a completely different process tree, the ordinary destroy/info
>> functions would not be available unless the process was running as a
>> privileged OS user (same as any other native application).
>>
>
> Could you explain this please? If both trees run under the same user,
> why should I not be able to kill a process from a different tree?
>
> I was considering the case of a different user; only the OS access
> controls apply, so if it was the same user the processes could be
> controlled. The PH API does not provide more or less access than the OS.
>
> Thanks, Roger
>
>
>> If I am right, it would not be sufficient to state "There is no
>> guarantee that a process is alive." - it may be alive but by now be a
>> different process altogether. This makes "allChildren()" useless for many
>> cases, because the returned information may already be obsolete the moment
>> the function returns.
>>
>> The caching of startTime can remove the ambiguity.
>>
>
>>
>> Of course I may be missing something here?
>>
>> But if I got all that right and the sole purpose of allChildren() is to
>> be able to kill them (or otherwise signal them), why not use process
>> groups? Process groups would be the traditional way on POSIX platforms to
>> handle process trees, and they are also available on Windows in the form of
>> Job Objects.
>>
>> Using process groups to signal sub process trees would be safe, would
>> not rely on PID identity, and would be more efficient. Also way less
>> coding. Also, it would be an old, established pattern - process groups have
>> been around for a long time.
>> Also, using process groups it is possible to
>> break away from a group, so a program below you which wants to run as a
>> daemon can do so by removing itself from the process group and thus
>> escaping your kill.
>>
>> On Windows we have Job objects, and I think there are enough
>> similarities to POSIX process groups to abstract them into something
>> platform independent.
>>
>> Earlier discussions of process termination and exit value reaping
>> considered using process groups but it became evident that the Java
>> runtime needed to be very careful to not interfere with processes that
>> might be spawned and controlled by native libraries and that process
>> groups would only increase complexity and the interactions.
>>
>
>> Thanks, Roger
>>
>>
> Thanks! Thomas
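
For illustration only, the start-time check discussed in this thread (cache
the start time when the children are listed, compare it again just before
the kill) might look roughly like the sketch below. It assumes the
ProcessHandle API as it later shipped in Java 9, where descendants() appears
to correspond to the allChildren() discussed here. The class StartTimeGuard
and its helpers Snapshot, snapshotDescendants and destroyIfUnchanged are
hypothetical names, not part of the JDK or of any proposal in this thread.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch, not JDK code: guard a kill with a cached start time.
public class StartTimeGuard {

    /** A process handle plus the start time observed when it was listed. */
    static final class Snapshot {
        final ProcessHandle handle;
        final Optional<Instant> startTime;

        Snapshot(ProcessHandle handle) {
            this.handle = handle;
            // May be empty if the OS does not expose the start time.
            this.startTime = handle.info().startInstant();
        }

        /**
         * True only if the pid still names a live process whose start time
         * matches the cached one. Without a cached start time we cannot
         * tell, so we answer false and err on the side of not killing.
         */
        boolean stillSameProcess() {
            if (!startTime.isPresent() || !handle.isAlive()) {
                return false;
            }
            // Look the pid up again and compare start times.
            Optional<ProcessHandle> now = ProcessHandle.of(handle.pid());
            return now.isPresent()
                    && startTime.equals(now.get().info().startInstant());
        }
    }

    /** Lists all descendants of root together with their start times. */
    static List<Snapshot> snapshotDescendants(ProcessHandle root) {
        List<Snapshot> result = new ArrayList<>();
        root.descendants().forEach(h -> result.add(new Snapshot(h)));
        return result;
    }

    /**
     * Requests termination only for snapshots that still look like the same
     * process. This narrows the race window but does not close it: the pid
     * could still be recycled between the check and destroy().
     */
    static int destroyIfUnchanged(List<Snapshot> snapshots) {
        int signalled = 0;
        for (Snapshot s : snapshots) {
            if (s.stillSameProcess() && s.handle.destroy()) {
                signalled++;
            }
        }
        return signalled;
    }
}
```

A supervisor would call snapshotDescendants(root) once and later pass the
result to destroyIfUnchanged(). The check only reduces the chance of
signalling a recycled pid; the process-group approach described above avoids
depending on the identity of each individual pid.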