Re: RFR 9: 8077350 Process API Updates Implementation Review

Roger Riggs Fri, 17 Apr 2015 10:07:38 -0700

Hi Thomas,

On 4/16/2015 3:01 PM, Thomas Stüfe wrote:

Hi Roger,
thank you for your answer!
The reason I take an interest is not just theoretical. We (SAP) use our JVM for our test infrastructure, and we had exactly the problem allChildren() is designed to solve: killing the process tree belonging to a specific test (similar to jtreg tests) in case of errors or hangs. We have test machines running large workloads of tests in parallel, and we reach pid wraparound quite fast, depending on the OS.
We solved this by adding process groups to Process.java, and we are very happy with this solution. We are able to quickly kill a whole process tree, cleanly and completely, without ambiguity or risk to other tests. Of course, we had to add this support as a "sideways hack" in order not to change the official Process.java interface. Therefore I was hoping that with JEP 102 we would get official support for process groups. Unfortunately, it seems the decision has already been made and we are too late in the discussion :(

It would be interesting to see a description of what you added to/around the API. The reason to avoid process groups was one of simplicity and non-interference with processes spawned by native libraries. If that complexity can be understood, process groups/jobs could fulfill a need in a scalable system.

At this point, I'd like to deal with it as a separate request for enhancement.

See my other comments inline.
On Sat, Apr 11, 2015 at 8:55 PM, Roger Riggs <roger.ri...@oracle.com> wrote:
    Hi Thomas,

    Thanks for the comments.

    On 4/11/2015 8:31 AM, Thomas Stüfe wrote:
    Hi Roger,

    I have a question about getChildren() and getAllChildren().

    I assume the point of those functions is to implement point 4 of
    JEP 102 ("The ability to deal with process trees, in particular
    some means to destroy a process tree."), by returning a
    collection of PIDs which are the children of the process and then
    killing them?
    Earlier versions included a method to kill a process tree, but it
    was recommended to leave the exact algorithm for killing processes
    to the caller.
    However, I am not sure that this can be implemented in a safe
    way, at least on UNIX, because, as Martin already pointed out,
    of PID recycling. I do not see how you can prevent allChildren()
    from returning PIDs which may already have been reaped and
    recycled by the time you use them. How do you prevent that?
    Unless there is an extended time between getting the children and
    destroying them, the pids will still be valid.
Why? A child process may be reaped the instant you are done reading it from /proc, and its pid may be recycled by the OS right away and already point to another process when allChildren() returns. If a process lives about as long as it takes the system to reach a pid wraparound back to the same pid value, its pid could be recycled right after it is reaped, right? Sure, the longer you wait, the higher the chance of this happening, but it may happen right away.
As Martin said, we have had those races in the kill() code for a long time, but children()/allChildren() could make those errors more probable, because now more processes are involved, especially if you use allChildren() to kill a deep process tree. And there is nothing in the javadoc warning the user about this scenario. You would just happen, from time to time, to kill an unrelated process. Those problems are hard to debug.
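For reference, the caller-side pattern being discussed looks roughly like the sketch below. It assumes the names that were eventually released in JDK 9 (ProcessHandle.descendants() in place of allChildren(), plus Process.toHandle()); TreeKiller and destroyTree are hypothetical names. It acts on a snapshot of pids, so it is exposed to exactly the recycling race described above.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: a naive, caller-side "kill the whole tree". */
final class TreeKiller {

    /**
     * Requests termination of root and of every process in a snapshot of its
     * descendants. Returns how many destroy() calls reported success.
     * NOTE: a pid in the snapshot may be reaped and recycled before destroy()
     * reaches it; this sketch does nothing to detect that.
     * root must not be ProcessHandle.current() (destroy() would throw).
     */
    static int destroyTree(ProcessHandle root) {
        // Snapshot first: once root dies, its children are re-parented and
        // no longer show up as its descendants.
        List<ProcessHandle> snapshot = root.descendants().collect(Collectors.toList());

        int requested = 0;
        // Stop the root first so it cannot fork replacements while we work.
        if (root.destroy()) {
            requested++;
        }
        for (ProcessHandle ph : snapshot) {
            if (ph.destroy()) {
                requested++;
            }
        }
        return requested;
    }

    public static void main(String[] args) throws Exception {
        // Usage: start a long-running child and tear it down (POSIX `sleep` assumed).
        Process p = new ProcessBuilder("sleep", "30").start();
        int n = destroyTree(p.toHandle());
        System.out.println("destroy requested for " + n + " process(es)");
        System.out.println("exit code: " + p.waitFor());
    }
}
```

Destroying the root before its descendants is a design choice of this sketch, not something the API prescribes; the opposite order avoids nothing with respect to pid reuse.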
    The technique of caching the start time can prevent that case,
    though AFAIK it has not been a problem.
How would that work? Should the user, before issuing the kill, compare the start time of the process to kill with the cached start time?

See Peter's email; he described it more thoroughly than I have in previous emails.
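As I understand the start-time technique, it amounts to something like the following minimal sketch: remember each pid together with its start time and, immediately before signalling, look the pid up again and compare. It assumes the JDK 9 ProcessHandle.Info.startInstant() accessor; PidSnapshot is a hypothetical helper, and treating an unknown start time as "not the same process" is a conservative choice of this sketch. It narrows the race rather than closing it: the pid can still be recycled between the check and destroy(), and if the OS reports start times coarsely, a recycled pid could still match.

```java
import java.time.Instant;
import java.util.Optional;

/** Hypothetical helper: a pid remembered together with its start time. */
final class PidSnapshot {
    private final long pid;
    private final Instant start; // null if the platform did not report one

    private PidSnapshot(long pid, Instant start) {
        this.pid = pid;
        this.start = start;
    }

    /** Records the pid and start time of a process as seen right now. */
    static PidSnapshot of(ProcessHandle ph) {
        return new PidSnapshot(ph.pid(), ph.info().startInstant().orElse(null));
    }

    /** Returns a handle only if the pid still names the process that was recorded. */
    Optional<ProcessHandle> resolve() {
        if (start == null) {
            return Optional.empty(); // identity cannot be verified, so refuse
        }
        return ProcessHandle.of(pid)
                .filter(ph -> ph.info().startInstant().map(start::equals).orElse(false));
    }

    /** Requests termination only if the start time still matches. */
    boolean destroyIfSame() {
        return resolve().map(ProcessHandle::destroy).orElse(false);
    }

    public static void main(String[] args) throws Exception {
        Process p = new ProcessBuilder("sleep", "30").start(); // POSIX `sleep` assumed
        PidSnapshot snap = PidSnapshot.of(p.toHandle());
        System.out.println("destroy requested: " + snap.destroyIfSame());
        p.waitFor();
        // Once the child has exited and been reaped, the snapshot no longer
        // resolves; a reused pid would carry a different start time.
        System.out.println("resolves after exit: " + snap.resolve().isPresent());
    }
}
```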

    Note that, even if your code is bulletproof, allChildren() will
    also return PIDs of subprocesses which are completely unrelated
    to you and Process.java; they could have been forked by some
    third-party native code which just happens to run in parallel in
    the same process. There, you have no control over when they get
    reaped. A child might already have been reaped by the time
    allChildren() returns, and its PID recycled for another,
    unrelated process.

    Of course, the best case is for an application to spawn and manage
    its own processes and handle their proper termination.
    The use cases for children/allChildren are focused on
    supervisory/executive functions that monitor a running system
    and can clean up even in the case of unexpected failures.

    All management of processes is subject to OS limitations. If the
    PID were from a completely different process tree, the ordinary
    destroy/info functions would not be available unless the process
    was running as a privileged OS user (same as any other native
    application).

Could you explain this please? If both trees run under the same user, why should I not be able to kill a process from a different tree?

I was considering the case of a different user; only the OS access controls apply, so if it was the same user, the processes could be controlled.
The PH API does not provide more or less access than the OS.

Thanks, Roger

    If I am right, it would not be sufficient to state "There is no
    guarantee that a process is alive." - it may be alive but by now
    be a different process altogether. This makes "allChildren()"
    useless for many cases, because the returned information may
    already be obsolete the moment the function returns.

    The caching of startTime can remove the ambiguity.


    Of course, I may be missing something here?

    But if I got all that right and the sole purpose of allChildren()
    is to be able to kill them (or otherwise signal them), why not
    use process groups? Process groups would be the traditional way
    on POSIX platforms to handle process trees, and they are also
    available on Windows in the form of Job Objects.

    Using process groups to signal subprocess trees would be safe,
    would not rely on PID identity, and would be more efficient. It
    would also mean far less code. Also, it would be an old, established
    pattern; process groups have been around for a long time. And with
    process groups it is possible to break away from a group, so a
    program below you which wants to run as a daemon can do so by
    removing itself from the process group and thus escaping your kill.

    On Windows we have Job objects, and I think there are enough
    similarities to POSIX process groups to abstract them into
    something platform independent.

    Earlier discussions of process termination and exit value reaping
    considered
    using process groups but it became evident that the Java runtime
    needed to
    be very careful to not interfere with processes that might be
    spawned and
    controlled by native libraries and that process groups would only
    increase
    complexity and the interactions.


    Thanks, Roger


Thanks! Thomas
