Hi Peter,
On 7/7/2015 7:31 AM, Peter Levart wrote:
Hi Roger,
On 07/06/2015 04:47 PM, Roger Riggs wrote:
Hi Peter,
Thanks for reviewing the new implementation.
The idea to pre-allocate the buffer based on the size of the output
arrays passed for output
will not work as desired.
The same native code entry point is used whether it is a query for
all processes or
just the immediate children. In either case, the buffer for sysctl
needs to be
large enough to accommodate all of the OS threads; its not a function
of the
output array size.
Ah, I see now. You optionally filter the entries returned by sysctl
based on passed-in parent pid.
yes, it was expected that the direct children was the most common use case.
Doing the filtering in native reduces the amount of
allocation/reallocation that is needed.
With respect to the buffer allocation on OS X, I might see the point
of a retry
when sysctl returns ENOMEM;
I don't think its particularly worthwhile to add the native code to
retry in the case
of ENOMEM. The OSX doc for sysctrl does indicate it tries to round up
to avoid a subsequent failure.
The ProcessHandle API already cautions that the list of processes is
dynamic
and that processes may started or terminate concurrently with the
call to children/allChildren.
So if the sysctl returns ENOMEM indicating not all the processes fit;
the ones
that do fit within in the allocated buffer are valid at that point in
time.
If there are extras that do not fit in the buffer they are likely to
be 'newer'
processes and can be considered to have been started 'after' the
children/allChildren invocation.
If that's the case, then it might be OK to just silently ignore it.
But OTOH it would be surprising if some already long running pid was
skipped because of that.
Yes, there is some risk there (without knowing/depending on the OS
algorithm for copying).
What about designing the getProcessPids0 API to always return all
processes and do the filtering in Java? At least current
implementations wouldn't become less optimal because of that (you
always retrieve all processes from the OS and just return the filtered
list to Java).
Only that it gathers more information than needed and has to process
more of it.
The parent pids are not always needed.
Some alternatives (I'm sure you have already considered and rejected
because of added complexity):
An alternative could be designing the API around returning a single
long[] that you allocate in native code - with (pid, ppid, stime)
placed into array as consecutive triplets: [pid1, ppid1, stime1, pid2,
ppid2, stime2, ...], but you would have to deal with allocation and
re-allocation of the resulting array in native code which might get
complicated.
The Java level return type is a Stream of ProcessHandles; with the
current parallel arrays
the stream can lazily construct the ProcessHandle instances when they
are requested. The stream
is driven by a sequence of indexes into the arrays. The parallel arrays
are more straight forward
to index in both native and java code. Initially, I preferred to do
the allocation/reallocation in Java
(and Java doesn't handle multiple returns easily).
Intermixing the values in a native allocated array would work also,
though to keep the java
code consistent it would need to reserve space for the parent pid even
if it was not used.
In the case of a bad estimate on the number of processes, it adds an
extra Java-Native call.
At this point, optimizing it is lower priority than some other issues.
Another alternative: using a private class:
static class ProcEntry {
long pid, ppid, stime;
ProcEntry next;
}
...and building a linked-list of ProcEntries in native code...
Yes, but it does a lot more allocation whether or not the object is
needed to be returned
from the Stream in children(), allChildren, etc.
So I propose to change the error checking to consider ENOMEM as a
non-exceptional
return and return the processes available.
It should be non-exceptional in any case. I just wonder if it is OK
that this "lack of full-information" is not communicated to the method
caller and acted upon.
In the worst case, a high rate of processes being created/deleted, any
heuristic
for retries might not be able to keep up, so there would need to be a
limit; and at/after
the limit, the same condition would be true.
A simplier approach would be to just double or triple the size reported
from sysctl.
Though that seems redundant since sysctl asserts it takes potential
changes into account already.
Thanks, Roger
Regards, Peter
Thanks, Roger
On 7/2/2015 3:32 AM, Peter Levart wrote:
Hi Roger,
I looked at the code briefly and have the following comments:
For ProcessHandleImpl_macosx.c, in method getProcessPids0:
224 // Get buffer size needed to read all processes
225 int mib[4] = {CTL_KERN, KERN_PROC, KERN_PROC_ALL, 0};
226 if (sysctl(mib, 4, NULL, &bufSize, NULL, 0) < 0) {
227 JNU_ThrowByNameWithLastError(env,
228 "java/lang/RuntimeException", "sysctl failed");
229 return -1;
230 }
231
232 // Allocate buffer big enough for all processes
233 void *buffer = malloc(bufSize);
234 if (buffer == NULL) {
235 JNU_ThrowOutOfMemoryError(env, "malloc failed");
236 return -1;
237 }
238
239 // Read process info for all processes
240 if (sysctl(mib, 4, buffer, &bufSize, NULL, 0) < 0) {
241 JNU_ThrowByNameWithLastError(env,
242 "java/lang/RuntimeException", "sysctl failed");
243 free(buffer);
244 return -1;
245 }
... the 1st call to sysctl is used to measure the size of the needed
buffer and the 2nd call fills-in the buffer. The documentation for
sysctl says:
" The information is copied into the buffer specified by oldp.
The size of the buffer is given by the
location specified by oldlenp before the call, and that
location gives the amount of data copied after
a successful call and after a call that returns with the error
code ENOMEM. If the amount of data
available is greater than the size of the buffer supplied, the
call supplies as much data as fits in
the buffer provided and returns with the error code ENOMEM. If
the old value is not desired, oldp and
oldlenp should be set to NULL.
The size of the available data can be determined by calling
sysctl() with the NULL argument for oldp.
The size of the available data will be returned in the location
pointed to by oldlenp. For some opera-tions, operations,
tions, the amount of space may change often. For these
operations, the system attempts to round up so
that the returned size is large enough for a call to return the
data shortly thereafter."
So while not very probable, it can happen that you get ENOMEM from
2nd call because of underestimated buffer size. Would it be better
to retry (re)allocation of buffer and 2nd call in a loop with new
estimation returned from previous call while the error is ENOMEM?
Another suggestion: What would it be if the buffer size estimation
was not computed by a call to sysctl with NULL buffer, but was taken
from the passed-in resulting array size(s). In case the user
passes-in arrays of sufficient size, you can avoid double invocation
of sysctl. Also, if ENOMEM happens, you can just return the result
obtained and the new estimation - no looping in native code. The
UNIX, Solaris and Windows variants look good.
Regards, Peter
On 06/22/2015 05:10 PM, Roger Riggs wrote:
Please review changes to ProcessHandle implementation to uniquely
identify
processes based on the start time provided by the OS. It addresses
the issue
of PID reuse.
This is the implementation of the ProcessHandle.equals() spec
change in
8129344 : (process) ProcessHandle instances should define
equals and be value-based
Webrev:
http://cr.openjdk.java.net/~rriggs/webrev-starttime-8078099/
Issue:
https://bugs.openjdk.java.net/browse/JDK-8078099
Thanks, Roger