Hi Roger,
On 04/04/2016 11:50 PM, Roger Riggs wrote:
Hi Peter,
Stepping back just a bit.
Right, let's clear up.
The old Cleaner running on the Reference processing thread had a few
(2) very well controlled
functions, reference processing and deallocating DirectByteBuffers.
Maybe we can't do too much
better than that.
...yes, at the beginning, until it was (re)used for other purposes too,
in: java.lang.ProcessImpl,
java.lang.invoke.MethodHandleNatives.CallSiteContext,
jdk.internal.perf.Perf, sun.nio.ch.IOVecWrapper, sun.nio.fs.NativeBuffer
and sun.java2d.marlin.OffHeapArray.
But those other usages have been converted to use new
java.lang.ref.Cleaner so old cleaner is now back to basics -
DirectByteBuffers. And with that, DirectByteBuffers allocating threads
only help ReferenceHandler thread enqueue References and execute
DirectByteBuffer deallocators, which is an improvement.
But should we keep that status quo? It's nothing wrong with it as it is,
except I think we can do better.
The old worst case performance/latency wise was the reference
processing thread
did the work and the allocating thread did very little synchronizing
and just did the retries.
The number of retries was exactly the same as the number of References
helped to be enqueued or in case of Cleaner(s), executed:
// retry while helping enqueue pending Reference objects
// which includes executing pending Cleaner(s) which includes
// Cleaner(s) that free direct buffer memory
while (jlra.tryHandlePendingReference()) {
if (tryReserveMemory(size, cap)) {
return;
}
}
If the share of pending References that are also Cleaners was high,
chances were higher that not much helping was needed as one cleaned
DBB.Deallocator could unreserve enough memory for next reservation
attempt to succeed. So allocating thread helped only until it succeeded
in reserving the native memory leaving the rest of work to another
allocating request/thread or to ReferenceHandler.
In the best case, all the real work was done by the allocating thread,
if the interactions with GC
work out perfectly. But it was still the case that the buffer
alloc/dealloc throughput was
met with the division of work separating the reference processing
thread and the allocating thread.
Yes, whichever thread was quicker. If ReferenceHandler thread had been
waking up from wait() for a long time, allocating thread could have
already processed all the References before ReferenceHandler finally
started to look around. If there were lots of new pending referenced
discovered, ReferenceHandler thread could finally join the party and
fight for the same lock...
The function that can only be provided by CleanerImpl / Reference
processing thread state is
knowledge that the cleaning queue is empty.
...and that the discovered pending references have actually been
enqueued before that...
The helping functions were/are a bit troublesome because of mixing
execution
environments of the thread allocating direct buffers and the
cleanables and it seemed that
more than a little complexity was needed to compensate.
I totally agree.
If the bottleneck in processing is between the reference processing
and cleanup
then it should be ok (based on previous comments) for the CleanerImpl
to help with
reference processing (after it has an empty queue and before it blocks
waiting or in every loop).
Though if you already tried this combination, I don't recall the results.
I don't thing there is a problem because of any bottleneck. And if there
was a bottleneck we would only have a problem with
allocation/deallocation throughput and not with OOME(s). The problem is
because reference discovery is not triggered as a result of native
memory reservation approaching or reaching the limit. There is no heap
memory pressure from DirectByteBuffer(s) because they are small objects.
So a mechanism must be in place that triggers reference discovery and
waits for discovered references to be processed before failing the
native memory allocation. A mechanism that tries to simulate what
happens with GC when there is heap memory pressure. GC guarantees that
full-GC is executed and heap allocation retried after that before
finally giving up with OOME. We need a mechanism that attempts to do the
same for direct memory. Throughput is a nice property but we are not
directly seeking its improvement. We just not want to make things much
worse.
Helping the Cleaner thread to process cleanup functions is the easiest
way to wait for cleanup functions to be processed and for queue to
drain. Simply because of ReferenceQueue API. If you poll() next
Reference from the queue and get null, you know the queue is empty, but
if you get something, you have to execute it and not just ignore it.
Maybe we could patch into the ReferenceQueue implementation and extend
its API with an internal method that would not return next Reference but
just information that ReferenceHandler thread has done so or that the
queue is empty. I'll think about it.
As you pointed out it would be more efficient if the allocating thread
could be aware
when it was known there was nothing ready to cleanup so it can retry
and invoke GC or
throw out of memory if appropriate.
Adding a method that returned the count of completed cleaning cycles
(or similar)
to CleanerImpl could exist with a minimal of coupling and still provide
the information needed without commingling the execution threads.
I'll think about how to surface this functionality in the CleanerImpl
most elegantly. The functionality of providing only the counter of
cleaning cycles as a getter might not be most appropriate. What we also
need is some mechanism to wait and be woken up to retry reservation only
at appropriate points in time otherwise allocating threads could just
spin eating CPU time. So my latest attempt was to encapsulate the entire
retry logic inside ExtendedCleaner with ByteBuffer/Bits only providing
allocation function to this logic, which in my view of API is pretty
decoupled and general.
I don't see the need to change Cleaner to an interface to be able to
provide
an additional method on CleanerImpl or a subclass and a factory method
could
provide for a clean and very targeted interface to Bits/Direct buffer.
I would like this to be an instance method so it would naturally pertain
to a particular Cleaner instance. Or it could be a static method that
takes a Cleaner instance. One of my previous webrevs did have such
method on the CleanerImpl, but I was advised to move it to Cleaner as a
package-private method and expose it via SharedSecrets to internal code.
I feel such "camouflage" is very awkward now that we have modules and
other mechanisms exist. So I thought it would be most elegant to make
Cleaner an interface so it can be extended with an internal interface to
communicate intent in a type-safe and auto-discoverable way. The change
to make it interface:
http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.part2.1.rev01/
...actually simplifies implementation (33 lines removed in total) and
could be seen as an improvement in itself.
Are you afraid that if Cleaner was an interface, others would attempt to
make implementations of it? Now that we have default methods on
interfaces it is easy to compatibly extend the API even if it is an
interface so that no 3rd party implementations are immediately broken.
Are you thinking of security implications when some code is handed a
Cleaner instance that it doesn't trust? I don't think there is a utility
for Cleaner instances to be passed from untrusted to trusted code, do you?
In the end it doesn't really matter. We can do it one way or the other.
I just feel that using an interface is cleaner.
I'm sorry I haven't had time to try out concretely what I have in mind.
Please correct or remind me of missing salient considerations.
The bottom line is that we need a mechanism that:
- triggers reference discovery when native memory limit is approached or
reached
- retires native memory reservation at appropriate time slots until
succeeding or until all pending references have been processed and
Cleanables executed at which time native memory reservation can fail
with OOME.
- if possible, doesn't execute cleanup functions by the allocating
thread but just waits for system threads to do the job.
- when triggered, does not make native memory allocation a bottleneck.
I think that what I did in my latest webrevs with ReferenceHandler
thread is an improvement in minimizing contended synchronization and
interference of allocating thread(s) with Reference enqueue-ing. But
interaction of allocating thread(s) with Cleaner background thread could
be improved and I have a couple of ideas to explore.
Thanks, Roger
Regards, Peter
On 4/2/2016 7:24 AM, Peter Levart wrote:
Hi Roger,
Thanks for looking at the patch.
On 04/02/2016 01:31 AM, Roger Riggs wrote:
Hi Peter,
I overlooked the introduction of another nested class (Task) to
handle the cleanup.
But there are too many changes to see which ones solve a single
problem.
Sorry to make more work, but I think we need to go back to the
minimum necessary
change to make progress on this. Omit all of the little cleanups
until the end
or do them first and separately.
Thanks, Roger
No Problem. I understand. So let's proceed in stages. Since part1 is
already pushed, I'll call part2 stages with names: part2.1, part2.2,
... and I'll start counting webrev revisions from 01 again, so webrev
names will be in the form: webrev.part2.1.rev01. Each part will be an
incremental change to the previous one.
part2.1: This is preparation work to be able to have an extended
java.lang.ref.Cleaner type for internal use. Since
java.lang.ref.Cleaner is a final class, I propose to make it an
interface instead:
http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.part2.1.rev01/
This is a source-compatible change and it also simplifies
implementation (no injection of Cleaner.impl access function into
CleanerImpl needed any more). What used to be java.lang.ref.Cleaner
is renamed to jdk.internal.ref.CleanerImpl. What used to be
jdk.internal.ref.CleanerImpl is now a nested static class
jdk.internal.ref.CleanerImpl.Task (because it implements Runnable).
Otherwise nothing has changed in the overall architecture of the
Cleaner except that public-facing API is now an interface instead of
a final class. This allows specifying internal extension interface
and internal extension implementation.
CleanerTest passes with this change.
So what do you think?
Regards, Peter
On 4/1/16 5:51 PM, Roger Riggs wrote:
Hi Peter,
Thanks for the diffs to look at.
Two observations on the changes.
- The Cleaner instance was intentionally and necessarily different
than the CleanerImpl to enable
the CleanerImpl and its thread to terminate if the Cleaner is not
longer referenced.
Folding them into a single object breaks that.
Perhaps it is not too bad for ExtendedCleaner to subclass
CleanerImpl with the cleanup helper/supplier behavior
and expose itself to Bits. There will be fewer moving parts. There
is no need for two factory methods for
ExtendedCleaner unless you are going to use a separate ThreadFactory.
- The Deallocator (and now Allocator) nested classes are identical,
and there is a separate copy for each
type derived from the Direct-X-template. But it may not be worth
fixing until the rest of it is settled to avoid
more moving parts.
I don't have an opinion on the code changes in Reference, that's
different kettle of fish.
More next week.
Have a good weekend, Roger
On 4/1/2016 12:46 PM, Peter Levart wrote:
On 04/01/2016 06:08 PM, Peter Levart wrote:
On 04/01/2016 05:18 PM, Peter Levart wrote:
@Roger:
...
About entanglement between nio Bits and
ExtendedCleaner.retryWhileHelpingClean(). It is the same level
of entanglement as between the DirectByteBuffer constructor and
Cleaner.register(). In both occasions an action is provided to
the Cleaner. Cleaner.register() takes a cleanup action and
ExtendedCleaner.retryWhileHelpingClean() takes a retriable
"allocating" or "reservation" action. "allocation" or
"reservation" is the opposite of cleanup. Both methods are
encapsulated in the same object because those two functions must
be coordinated. So I think that collocating them together makes
sense. What do you think?
...to illustrate what I mean, here's a variant that totally
untangles Bits from Cleaner and moves the whole Cleaner
interaction into the DirectByteBuffer itself:
http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.13.part2/
Notice the symmetry between Cleaner.retryWhileHelpingClean :
Cleaner.register and Allocator : Deallocator ?
Regards, Peter
And here's also a diff between webrev.12.part2 and webrev.13.part2:
http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.diff.12to13.part2/
Regards, Peter