Hi Roger,

On 04/04/2016 11:50 PM, Roger Riggs wrote:
Hi Peter,

Stepping back just a bit.

Right, let's clear up.


The old Cleaner running on the Reference processing thread had a few (2) very well controlled functions, reference processing and deallocating DirectByteBuffers. Maybe we can't do too much
better than that.

...yes, at the beginning, until it was (re)used for other purposes too, in: java.lang.ProcessImpl, java.lang.invoke.MethodHandleNatives.CallSiteContext, jdk.internal.perf.Perf, sun.nio.ch.IOVecWrapper, sun.nio.fs.NativeBuffer and sun.java2d.marlin.OffHeapArray.

But those other usages have been converted to use new java.lang.ref.Cleaner so old cleaner is now back to basics - DirectByteBuffers. And with that, DirectByteBuffers allocating threads only help ReferenceHandler thread enqueue References and execute DirectByteBuffer deallocators, which is an improvement.

But should we keep that status quo? There's nothing wrong with it as it is, except I think we can do better.


The old worst case performance/latency wise was the reference processing thread did the work and the allocating thread did very little synchronizing and just did the retries.

The number of retries was exactly the same as the number of References helped to be enqueued or in case of Cleaner(s), executed:

        // retry while helping enqueue pending Reference objects
        // which includes executing pending Cleaner(s) which includes
        // Cleaner(s) that free direct buffer memory
        while (jlra.tryHandlePendingReference()) {
            if (tryReserveMemory(size, cap)) {
                return;
            }
        }

If the share of pending References that are also Cleaners was high, chances were higher that not much helping was needed as one cleaned DBB.Deallocator could unreserve enough memory for next reservation attempt to succeed. So allocating thread helped only until it succeeded in reserving the native memory leaving the rest of work to another allocating request/thread or to ReferenceHandler.
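To make the "help only until the reservation succeeds" behaviour concrete, here is a minimal sketch. It is not the JDK source: ReservationSketch, its limit field and the helpOne callback are hypothetical stand-ins for Bits.tryReserveMemory() and jlra.tryHandlePendingReference(), and the compare-and-set accounting is only an assumption about how such a reservation could be tracked.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the reservation logic discussed above; not the JDK source.
final class ReservationSketch {
    private final long maxMemory;                       // assumed limit on reserved bytes
    private final AtomicLong reserved = new AtomicLong();

    ReservationSketch(long maxMemory) {
        this.maxMemory = maxMemory;
    }

    // Try once to reserve 'size' bytes; succeeds only while the limit is respected.
    boolean tryReserve(long size) {
        long used;
        while (size <= maxMemory - (used = reserved.get())) {
            if (reserved.compareAndSet(used, used + size)) {
                return true;
            }
        }
        return false;
    }

    // Called by a deallocator when native memory is given back.
    void unreserve(long size) {
        reserved.addAndGet(-size);
    }

    long reserved() {
        return reserved.get();
    }

    // Retry the reservation after each helped item. 'helpOne' is assumed to process
    // one pending reference and to return false when nothing was pending.
    boolean reserveWhileHelping(long size, BooleanSupplier helpOne) {
        if (tryReserve(size)) {
            return true;
        }
        while (helpOne.getAsBoolean()) {
            if (tryReserve(size)) {
                return true;   // stop helping; the rest is left to other threads
            }
        }
        return false;          // nothing left to help with and still no room
    }
}
```

With a limit of 100 bytes and 80 already reserved, a request for 50 would keep helping until some helped item unreserves enough memory, and would leave any remaining pending items untouched.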

In the best case, all the real work was done by the allocating thread, if the interactions with GC work out perfectly. But it was still the case that the buffer alloc/dealloc throughput was met with the division of work separating the reference processing thread and the allocating thread.

Yes, whichever thread was quicker. If ReferenceHandler thread had been waking up from wait() for a long time, allocating thread could have already processed all the References before ReferenceHandler finally started to look around. If there were lots of new pending references discovered, ReferenceHandler thread could finally join the party and fight for the same lock...


The function that can only be provided by CleanerImpl / Reference processing thread state is
knowledge that the cleaning queue is empty.

...and that the discovered pending references have actually been enqueued before that...

The helping functions were/are a bit troublesome because of mixing execution environments of the thread allocating direct buffers and the cleanables and it seemed that
more than a little complexity was needed to compensate.

I totally agree.


If the bottleneck in processing is between the reference processing and cleanup then it should be ok (based on previous comments) for the CleanerImpl to help with reference processing (after it has an empty queue and before it blocks waiting or in every loop).
Though if you already tried this combination, I don't recall the results.

I don't think there is a problem because of any bottleneck. And if there was a bottleneck we would only have a problem with allocation/deallocation throughput and not with OOME(s). The problem is because reference discovery is not triggered as a result of native memory reservation approaching or reaching the limit. There is no heap memory pressure from DirectByteBuffer(s) because they are small objects. So a mechanism must be in place that triggers reference discovery and waits for discovered references to be processed before failing the native memory allocation. A mechanism that tries to simulate what happens with GC when there is heap memory pressure. GC guarantees that full-GC is executed and heap allocation retried after that before finally giving up with OOME. We need a mechanism that attempts to do the same for direct memory. Throughput is a nice property but we are not directly seeking its improvement. We just don't want to make things much worse.

Helping the Cleaner thread to process cleanup functions is the easiest way to wait for cleanup functions to be processed and for queue to drain. Simply because of ReferenceQueue API. If you poll() next Reference from the queue and get null, you know the queue is empty, but if you get something, you have to execute it and not just ignore it. Maybe we could patch into the ReferenceQueue implementation and extend its API with an internal method that would not return next Reference but just information that ReferenceHandler thread has done so or that the queue is empty. I'll think about it.
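A small sketch of that ReferenceQueue property, using only the public java.lang.ref API. PollDrainSketch and CleanupRef are hypothetical names for illustration and do not mirror CleanerImpl; the point is just that a non-null result from poll() has already been removed from the queue, so the caller has to run it.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

final class PollDrainSketch {
    // Hypothetical reference type that carries its own cleanup action.
    static final class CleanupRef extends PhantomReference<Object> {
        private final Runnable action;

        CleanupRef(Object referent, ReferenceQueue<Object> queue, Runnable action) {
            super(referent, queue);
            this.action = action;
        }

        void clean() {
            action.run();
        }
    }

    // Help with one queued reference. Returns false only when poll() saw an empty queue.
    static boolean helpOne(ReferenceQueue<Object> queue) {
        Reference<?> ref = queue.poll();
        if (ref == null) {
            return false;               // queue observed empty: nothing left to help with
        }
        if (ref instanceof CleanupRef) {
            ((CleanupRef) ref).clean(); // already dequeued, so it must be executed here
        }
        return true;
    }
}
```

A caller that only wants to know whether the queue is empty cannot use poll() for that without also taking responsibility for whatever it returns, which is why an internal "is it drained yet" method would be a different API.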


As you pointed out it would be more efficient if the allocating thread could be aware when it was known there was nothing ready to cleanup so it can retry and invoke GC or
throw out of memory if appropriate.
Adding a method that returned the count of completed cleaning cycles (or similar)
to CleanerImpl could exist with a minimum of coupling and still provide
the information needed without commingling the execution threads.

I'll think about how to surface this functionality in the CleanerImpl most elegantly. The functionality of providing only the counter of cleaning cycles as a getter might not be most appropriate. What we also need is some mechanism to wait and be woken up to retry reservation only at appropriate points in time otherwise allocating threads could just spin eating CPU time. So my latest attempt was to encapsulate the entire retry logic inside ExtendedCleaner with ByteBuffer/Bits only providing allocation function to this logic, which in my view of API is pretty decoupled and general.
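For discussion, a counter of cleaning cycles combined with a way to wait for the next cycle could look roughly like the sketch below. CleaningCycles and all of its methods are hypothetical and are not part of CleanerImpl or of any webrev; plain monitor wait/notifyAll is assumed only to keep the sketch short.

```java
// Hypothetical sketch: a cycle counter that waiting threads can block on.
final class CleaningCycles {
    private long completed;   // number of cleaning cycles finished so far

    // Called by the (hypothetical) cleaner thread each time it has drained its queue.
    synchronized void cycleCompleted() {
        completed++;
        notifyAll();          // wake allocating threads so they can retry reservation
    }

    synchronized long completed() {
        return completed;
    }

    // Block until the count exceeds 'seen' or the timeout elapses.
    // Returns the latest count; the caller compares it with 'seen'.
    synchronized long awaitNext(long seen, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        try {
            while (completed <= seen) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    break;    // timed out without a new cycle
                }
                wait(remainingNanos / 1_000_000L + 1);  // +1 avoids wait(0), which never times out
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();         // preserve interrupt status for the caller
        }
        return completed;
    }
}
```

An allocating thread would read completed(), fail a reservation attempt, then call awaitNext() with the value it read and retry only when the returned count is larger, instead of spinning.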


I don't see the need to change Cleaner to an interface to be able to provide an additional method on CleanerImpl or a subclass and a factory method could
provide for a clean and very targeted interface to Bits/Direct buffer.

I would like this to be an instance method so it would naturally pertain to a particular Cleaner instance. Or it could be a static method that takes a Cleaner instance. One of my previous webrevs did have such method on the CleanerImpl, but I was advised to move it to Cleaner as a package-private method and expose it via SharedSecrets to internal code. I feel such "camouflage" is very awkward now that we have modules and other mechanisms exist. So I thought it would be most elegant to make Cleaner an interface so it can be extended with an internal interface to communicate intent in a type-safe and auto-discoverable way. The change to make it interface:

http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.part2.1.rev01/

...actually simplifies implementation (33 lines removed in total) and could be seen as an improvement in itself.

Are you afraid that if Cleaner was an interface, others would attempt to make implementations of it? Now that we have default methods on interfaces it is easy to compatibly extend the API even if it is an interface so that no 3rd party implementations are immediately broken. Are you thinking of security implications when some code is handed a Cleaner instance that it doesn't trust? I don't think there is a utility for Cleaner instances to be passed from untrusted to trusted code, do you?
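To illustrate the compatibility argument with made-up types (SimpleCleaner, LegacyCleaner, InternalCleaner and InternalCleanerImpl are hypothetical and are not the proposed Cleaner API): a default method added later does not break an existing implementation, and an internal sub-interface lets internal code discover the extra capability in a type-safe way.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical public-facing interface.
interface SimpleCleaner {
    void register(Runnable cleanupAction);

    // Imagined as added in a later release; existing implementations still compile
    // because a default body is supplied.
    default String describe() {
        return "cleaner:" + getClass().getSimpleName();
    }
}

// A third-party implementation written before describe() existed.
final class LegacyCleaner implements SimpleCleaner {
    final List<Runnable> actions = new ArrayList<>();

    @Override
    public void register(Runnable cleanupAction) {
        actions.add(cleanupAction);
    }
}

// Hypothetical internal extension interface, visible only to internal code.
interface InternalCleaner extends SimpleCleaner {
    // Runs all registered actions now; returns how many were run.
    int cleanNow();
}

final class InternalCleanerImpl implements InternalCleaner {
    private final List<Runnable> actions = new ArrayList<>();

    @Override
    public void register(Runnable cleanupAction) {
        actions.add(cleanupAction);
    }

    @Override
    public int cleanNow() {
        int n = actions.size();
        actions.forEach(Runnable::run);
        actions.clear();
        return n;
    }
}
```

Internal code handed a SimpleCleaner can test for InternalCleaner with instanceof and use the extended method, without any SharedSecrets-style indirection.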

In the end it doesn't really matter. We can do it one way or the other. I just feel that using an interface is cleaner.


I'm sorry I haven't had time to try out concretely what I have in mind.
Please correct or remind me of missing salient considerations.

The bottom line is that we need a mechanism that:

- triggers reference discovery when native memory limit is approached or reached
- retries native memory reservation at appropriate time slots until succeeding or until all pending references have been processed and Cleanables executed, at which time native memory reservation can fail with OOME.
- if possible, doesn't execute cleanup functions by the allocating thread but just waits for system threads to do the job.
- when triggered, does not make native memory allocation a bottleneck.

I think that what I did in my latest webrevs with ReferenceHandler thread is an improvement in minimizing contended synchronization and interference of allocating thread(s) with Reference enqueue-ing. But interaction of allocating thread(s) with Cleaner background thread could be improved and I have a couple of ideas to explore.


Thanks, Roger


Regards, Peter


On 4/2/2016 7:24 AM, Peter Levart wrote:
Hi Roger,

Thanks for looking at the patch.

On 04/02/2016 01:31 AM, Roger Riggs wrote:
Hi Peter,

I overlooked the introduction of another nested class (Task) to handle the cleanup. But there are too many changes to see which ones solve a single problem.

Sorry to make more work, but I think we need to go back to the minimum necessary change to make progress on this. Omit all of the little cleanups until the end
or do them first and separately.

Thanks, Roger

No Problem. I understand. So let's proceed in stages. Since part1 is already pushed, I'll call part2 stages with names: part2.1, part2.2, ... and I'll start counting webrev revisions from 01 again, so webrev names will be in the form: webrev.part2.1.rev01. Each part will be an incremental change to the previous one.

part2.1: This is preparation work to be able to have an extended java.lang.ref.Cleaner type for internal use. Since java.lang.ref.Cleaner is a final class, I propose to make it an interface instead:

http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.part2.1.rev01/

This is a source-compatible change and it also simplifies implementation (no injection of Cleaner.impl access function into CleanerImpl needed any more). What used to be java.lang.ref.Cleaner is renamed to jdk.internal.ref.CleanerImpl. What used to be jdk.internal.ref.CleanerImpl is now a nested static class jdk.internal.ref.CleanerImpl.Task (because it implements Runnable). Otherwise nothing has changed in the overall architecture of the Cleaner except that public-facing API is now an interface instead of a final class. This allows specifying internal extension interface and internal extension implementation.

CleanerTest passes with this change.

So what do you think?

Regards, Peter





On 4/1/16 5:51 PM, Roger Riggs wrote:
Hi Peter,

Thanks for the diffs to look at.

Two observations on the changes.

- The Cleaner instance was intentionally and necessarily different than the CleanerImpl to enable the CleanerImpl and its thread to terminate if the Cleaner is no longer referenced.
Folding them into a single object breaks that.

Perhaps it is not too bad for ExtendedCleaner to subclass CleanerImpl with the cleanup helper/supplier behavior and expose itself to Bits. There will be fewer moving parts. There is no need for two factory methods for
ExtendedCleaner unless you are going to use a separate ThreadFactory.

- The Deallocator (and now Allocator) nested classes are identical, and there is a separate copy for each type derived from the Direct-X-template. But it may not be worth fixing until the rest of it is settled to avoid
more moving parts.

I don't have an opinion on the code changes in Reference, that's a different kettle of fish.

More next week.

Have a good weekend, Roger


On 4/1/2016 12:46 PM, Peter Levart wrote:


On 04/01/2016 06:08 PM, Peter Levart wrote:


On 04/01/2016 05:18 PM, Peter Levart wrote:
@Roger:

...

About entanglement between nio Bits and ExtendedCleaner.retryWhileHelpingClean(). It is the same level of entanglement as between the DirectByteBuffer constructor and Cleaner.register(). In both occasions an action is provided to the Cleaner. Cleaner.register() takes a cleanup action and ExtendedCleaner.retryWhileHelpingClean() takes a retriable "allocating" or "reservation" action. "allocation" or "reservation" is the opposite of cleanup. Both methods are encapsulated in the same object because those two functions must be coordinated. So I think that collocating them together makes sense. What do you think?

...to illustrate what I mean, here's a variant that totally untangles Bits from Cleaner and moves the whole Cleaner interaction into the DirectByteBuffer itself:

http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.13.part2/

Notice the symmetry between Cleaner.retryWhileHelpingClean : Cleaner.register and Allocator : Deallocator ?


Regards, Peter


And here's also a diff between webrev.12.part2 and webrev.13.part2:

http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.diff.12to13.part2/

Regards, Peter





