Re: RFR 9: 8078099 : (process) ProcessHandle should uniquely identify processes

Roger Riggs Tue, 07 Jul 2015 06:54:20 -0700

Hi Peter,

On 7/7/2015 7:31 AM, Peter Levart wrote:

Hi Roger,
On 07/06/2015 04:47 PM, Roger Riggs wrote:
Hi Peter,

Thanks for reviewing the new implementation.
The idea of pre-allocating the buffer based on the size of the arrays passed in for output
will not work as desired.
The same native code entry point is used whether it is a query for all processes or just the immediate children. In either case, the buffer for sysctl needs to be large enough to accommodate all of the OS threads; it's not a function of the
output array size.
Ah, I see now. You optionally filter the entries returned by sysctl based on the passed-in parent pid.

Yes, it was expected that the direct children would be the most common use case.

Doing the filtering in native code reduces the amount of allocation/reallocation that is needed.

With respect to the buffer allocation on OS X, I might see the point of a retry
when sysctl returns ENOMEM;
I don't think it's particularly worthwhile to add the native code to retry in the case
of ENOMEM. The OS X doc for sysctl does indicate it tries to round up
to avoid a subsequent failure.
The ProcessHandle API already cautions that the list of processes is dynamic and that processes may start or terminate concurrently with the call to children/allChildren. So if sysctl returns ENOMEM, indicating that not all the processes fit, the ones that do fit within the allocated buffer are valid at that point in time. If there are extras that do not fit in the buffer, they are likely to be 'newer' processes and can be considered to have been started 'after' the children/allChildren invocation.
If that's the case, then it might be OK to just silently ignore it. But OTOH it would be surprising if some already long-running pid was skipped because of that.

Yes, there is some risk there (without knowing/depending on the OS algorithm for copying).

What about designing the getProcessPids0 API to always return all processes and do the filtering in Java? At least current implementations wouldn't become less optimal because of that (you always retrieve all processes from the OS and just return the filtered list to Java).

Only that it gathers more information than needed and has to process more of it.

The parent pids are not always needed.
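For illustration only, filtering in Java over parallel arrays might look roughly like the sketch below. The class and method names are hypothetical and are not taken from the webrev; it assumes the native call has already filled pids and ppids and reported how many entries are valid.

```java
import java.util.stream.IntStream;

// Hypothetical helper for illustration; not the JDK implementation.
final class ChildFilterSketch {

    /**
     * Returns the pids whose parent is parentPid.
     * pids and ppids are parallel arrays: ppids[i] is the parent of pids[i].
     * Only the first count entries are treated as valid.
     */
    static long[] childrenOf(long parentPid, long[] pids, long[] ppids, int count) {
        return IntStream.range(0, count)
                .filter(i -> ppids[i] == parentPid)   // keep direct children only
                .mapToLong(i -> pids[i])              // map the index back to the pid
                .toArray();
    }
}
```

With pids {10, 11, 12, 13} and ppids {1, 10, 10, 11}, childrenOf(10, ...) yields {11, 12}. The cost Roger mentions is visible here: the ppids array has to be populated for every process even when the caller wants all processes and never looks at it.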

Some alternatives (I'm sure you have already considered and rejected them because of the added complexity):
An alternative could be designing the API around returning a single long[] that you allocate in native code - with (pid, ppid, stime) placed into the array as consecutive triplets: [pid1, ppid1, stime1, pid2, ppid2, stime2, ...], but you would have to deal with allocation and re-allocation of the resulting array in native code, which might get complicated.
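As a rough sketch of how the Java side could read such a triplet layout (the class, constant, and accessor names here are assumptions made for illustration, not part of any proposed API):

```java
// Hypothetical accessors for a flat [pid, ppid, stime, pid, ppid, stime, ...] array.
final class TripletLayoutSketch {

    // Number of long slots per process entry: pid, ppid, stime.
    static final int STRIDE = 3;

    /** Number of complete entries in the flat array. */
    static int entryCount(long[] flat) {
        if (flat.length % STRIDE != 0) {
            throw new IllegalArgumentException("length is not a multiple of " + STRIDE);
        }
        return flat.length / STRIDE;
    }

    static long pid(long[] flat, int i)   { return flat[i * STRIDE]; }
    static long ppid(long[] flat, int i)  { return flat[i * STRIDE + 1]; }
    static long stime(long[] flat, int i) { return flat[i * STRIDE + 2]; }
}
```

The fixed stride keeps indexing simple, but every entry carries a ppid slot whether or not the caller needs parent pids.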

The Java level return type is a Stream of ProcessHandles; with the current parallel arrays the stream can lazily construct the ProcessHandle instances when they are requested. The stream is driven by a sequence of indexes into the arrays. The parallel arrays are more straightforward to index in both native and Java code. Initially, I preferred to do the allocation/reallocation in Java (and Java doesn't handle multiple returns easily).
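A minimal sketch of the index-driven, lazy construction described above. The Handle class is a hypothetical stand-in for the real ProcessHandle implementation, and the CREATED counter exists only to make the laziness observable.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import java.util.stream.Stream;

// Hypothetical illustration; not the JDK's ProcessHandleImpl.
final class LazyHandleStreamSketch {

    // Counts constructed handles so that lazy construction can be observed.
    static final AtomicInteger CREATED = new AtomicInteger();

    static final class Handle {
        final long pid;
        final long startTime;

        Handle(long pid, long startTime) {
            this.pid = pid;
            this.startTime = startTime;
            CREATED.incrementAndGet();
        }
    }

    /** A stream driven by indexes into the parallel arrays; handles are built on demand. */
    static Stream<Handle> handles(long[] pids, long[] stimes, int count) {
        return IntStream.range(0, count)
                .mapToObj(i -> new Handle(pids[i], stimes[i]));
    }
}
```

Calling handles(...).findFirst() on a three-entry input constructs a single Handle, since the remaining indexes are never pulled through the pipeline.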

Intermixing the values in a native allocated array would work also, though to keep the Java code consistent it would need to reserve space for the parent pid even if it was not used. In the case of a bad estimate on the number of processes, it adds an extra Java-Native call.

At this point, optimizing it is lower priority than some other issues.


Another alternative: using a private class:

static class ProcEntry {
    long pid, ppid, stime;
    ProcEntry next;
}

...and building a linked-list of ProcEntries in native code...

Yes, but it does a lot more allocation whether or not the object needs to be returned from the Stream in children(), allChildren(), etc.

So I propose to change the error checking to consider ENOMEM as a non-exceptional
return and return the processes available.
It should be non-exceptional in any case. I just wonder if it is OK that this "lack of full-information" is not communicated to the method caller and acted upon.

In the worst case, a high rate of processes being created/deleted, any heuristic for retries might not be able to keep up, so there would need to be a limit; and at/after the limit, the same condition would be true.

A simpler approach would be to just double or triple the size reported from sysctl. Though that seems redundant since sysctl asserts it takes potential changes into account already.
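To make the trade-off concrete, here is a small Java model of a bounded retry with a growing capacity. It does not call sysctl; the query function is a hypothetical stand-in that reports how many entries the OS currently has for a given capacity, and a result larger than the capacity plays the role of ENOMEM.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical model of a bounded retry heuristic; not native sysctl code.
final class BoundedRetrySketch {

    /**
     * Grows the capacity until the query reports that everything fit,
     * or gives up after maxAttempts. Returns the capacity that sufficed,
     * or -1 when the limit was reached and only a partial result is available.
     */
    static int fitCapacity(IntUnaryOperator query, int initialCapacity, int maxAttempts) {
        int capacity = initialCapacity;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            int needed = query.applyAsInt(capacity);
            if (needed <= capacity) {
                return capacity;                           // all entries fit
            }
            capacity = Math.max(needed, capacity * 2);     // at least double
        }
        return -1;                                         // limit reached
    }
}
```

If the process count keeps outrunning the buffer, as in the worst case described above, the loop ends at the limit with the same truncated result a single call would have produced.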


Thanks, Roger

Regards, Peter
Thanks, Roger



On 7/2/2015 3:32 AM, Peter Levart wrote:
Hi Roger,

I looked at the code briefly and have the following comments:

For ProcessHandleImpl_macosx.c, in method getProcessPids0:

    // Get buffer size needed to read all processes
    int mib[4] = {CTL_KERN, KERN_PROC, KERN_PROC_ALL, 0};
    if (sysctl(mib, 4, NULL, &bufSize, NULL, 0) < 0) {
        JNU_ThrowByNameWithLastError(env,
            "java/lang/RuntimeException", "sysctl failed");
        return -1;
    }

    // Allocate buffer big enough for all processes
    void *buffer = malloc(bufSize);
    if (buffer == NULL) {
        JNU_ThrowOutOfMemoryError(env, "malloc failed");
        return -1;
    }

    // Read process info for all processes
    if (sysctl(mib, 4, buffer, &bufSize, NULL, 0) < 0) {
        JNU_ThrowByNameWithLastError(env,
            "java/lang/RuntimeException", "sysctl failed");
        free(buffer);
        return -1;
    }
... the 1st call to sysctl is used to measure the size of the needed buffer and the 2nd call fills in the buffer. The documentation for sysctl says:
"The information is copied into the buffer specified by oldp. The size of the buffer is given by the location specified by oldlenp before the call, and that location gives the amount of data copied after a successful call and after a call that returns with the error code ENOMEM. If the amount of data available is greater than the size of the buffer supplied, the call supplies as much data as fits in the buffer provided and returns with the error code ENOMEM. If the old value is not desired, oldp and oldlenp should be set to NULL.
The size of the available data can be determined by calling sysctl() with the NULL argument for oldp. The size of the available data will be returned in the location pointed to by oldlenp. For some operations, the amount of space may change often. For these operations, the system attempts to round up so that the returned size is large enough for a call to return the data shortly thereafter."
So while not very probable, it can happen that you get ENOMEM from the 2nd call because of an underestimated buffer size. Would it be better to retry (re)allocation of the buffer and the 2nd call in a loop, with the new estimate returned from the previous call, while the error is ENOMEM?
Another suggestion: what if the buffer size estimate was not computed by a call to sysctl with a NULL buffer, but was taken from the passed-in resulting array size(s)? In case the user passes in arrays of sufficient size, you can avoid the double invocation of sysctl. Also, if ENOMEM happens, you can just return the result obtained and the new estimate - no looping in native code.
The UNIX, Solaris and Windows variants look good.
Regards, Peter

On 06/22/2015 05:10 PM, Roger Riggs wrote:
Please review changes to the ProcessHandle implementation to uniquely identify processes based on the start time provided by the OS. It addresses the issue
of PID reuse.
This is the implementation of the ProcessHandle.equals() spec change in 8129344 : (process) ProcessHandle instances should define equals and be value-based
Webrev:
http://cr.openjdk.java.net/~rriggs/webrev-starttime-8078099/

Issue:
https://bugs.openjdk.java.net/browse/JDK-8078099
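
As a rough illustration of the idea (not the code in the webrev), identity based on both the pid and the OS-reported start time could be sketched as below. The class name and fields are hypothetical.

```java
// Hypothetical value-based key; the real ProcessHandle implementation may differ.
final class PidStartKeySketch {

    private final long pid;
    private final long startTime;   // start time as reported by the OS

    PidStartKeySketch(long pid, long startTime) {
        this.pid = pid;
        this.startTime = startTime;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof PidStartKeySketch)) {
            return false;
        }
        PidStartKeySketch other = (PidStartKeySketch) o;
        // A reused pid has a different start time, so it compares unequal.
        return pid == other.pid && startTime == other.startTime;
    }

    @Override
    public int hashCode() {
        return 31 * Long.hashCode(pid) + Long.hashCode(startTime);
    }
}
```

Two keys with the same pid but different start times are unequal, which is the property that guards against PID reuse.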

Thanks, Roger
