Re: RFR: JDK-8149925 We don't need jdk.internal.ref.Cleaner any more

Peter Levart Tue, 05 Apr 2016 07:43:24 -0700

Hi Roger,

On 04/04/2016 11:50 PM, Roger Riggs wrote:

Hi Peter,


Stepping back just a bit.


Right, let's clear up.

The old Cleaner running on the Reference processing thread had a few(2) very well controlledfunctions, reference processing and deallocating DirectByteBuffers.Maybe we can't do too much
better than that.

...yes, at the beginning, until it was (re)used for other purposes too,in: java.lang.ProcessImpl,java.lang.invoke.MethodHandleNatives.CallSiteContext,jdk.internal.perf.Perf, sun.nio.ch.IOVecWrapper, sun.nio.fs.NativeBufferand sun.java2d.marlin.OffHeapArray.

But those other usages have been converted to use newjava.lang.ref.Cleaner so old cleaner is now back to basics -DirectByteBuffers. And with that, DirectByteBuffers allocating threadsonly help ReferenceHandler thread enqueue References and executeDirectByteBuffer deallocators, which is an improvement.

But should we keep that status quo? It's nothing wrong with it as it is,except I think we can do better.

The old worst case performance/latency wise was the referenceprocessing threaddid the work and the allocating thread did very little synchronizingand just did the retries.

The number of retries was exactly the same as the number of Referenceshelped to be enqueued or in case of Cleaner(s), executed:


        // retry while helping enqueue pending Reference objects
        // which includes executing pending Cleaner(s) which includes
        // Cleaner(s) that free direct buffer memory
        while (jlra.tryHandlePendingReference()) {
            if (tryReserveMemory(size, cap)) {
                return;
            }
        }

If the share of pending References that are also Cleaners was high,chances were higher that not much helping was needed as one cleanedDBB.Deallocator could unreserve enough memory for next reservationattempt to succeed. So allocating thread helped only until it succeededin reserving the native memory leaving the rest of work to anotherallocating request/thread or to ReferenceHandler.

In the best case, all the real work was done by the allocating thread,if the interactions with GCwork out perfectly. But it was still the case that the bufferalloc/dealloc throughput wasmet with the division of work separating the reference processingthread and the allocating thread.

Yes, whichever thread was quicker. If ReferenceHandler thread had beenwaking up from wait() for a long time, allocating thread could havealready processed all the References before ReferenceHandler finallystarted to look around. If there were lots of new pending referenceddiscovered, ReferenceHandler thread could finally join the party andfight for the same lock...

The function that can only be provided by CleanerImpl / Referenceprocessing thread state is
knowledge that the cleaning  queue is empty.

...and that the discovered pending references have actually beenenqueued before that...

The helping functions were/are a bit troublesome because of mixingexecutionenvironments of the thread allocating direct buffers and thecleanables and it seemed that
more than a little complexity was needed to compensate.


I totally agree.

If the bottleneck in processing is between the reference processingand cleanupthen it should be ok (based on previous comments) for the CleanerImplto help withreference processing (after it has an empty queue and before it blockswaiting or in every loop).
Though if you already tried this combination, I don't recall the results.

I don't thing there is a problem because of any bottleneck. And if therewas a bottleneck we would only have a problem withallocation/deallocation throughput and not with OOME(s). The problem isbecause reference discovery is not triggered as a result of nativememory reservation approaching or reaching the limit. There is no heapmemory pressure from DirectByteBuffer(s) because they are small objects.So a mechanism must be in place that triggers reference discovery andwaits for discovered references to be processed before failing thenative memory allocation. A mechanism that tries to simulate whathappens with GC when there is heap memory pressure. GC guarantees thatfull-GC is executed and heap allocation retried after that beforefinally giving up with OOME. We need a mechanism that attempts to do thesame for direct memory. Throughput is a nice property but we are notdirectly seeking its improvement. We just not want to make things muchworse.

Helping the Cleaner thread to process cleanup functions is the easiestway to wait for cleanup functions to be processed and for queue todrain. Simply because of ReferenceQueue API. If you poll() nextReference from the queue and get null, you know the queue is empty, butif you get something, you have to execute it and not just ignore it.Maybe we could patch into the ReferenceQueue implementation and extendits API with an internal method that would not return next Reference butjust information that ReferenceHandler thread has done so or that thequeue is empty. I'll think about it.

As you pointed out it would be more efficient if the allocating threadcould be awarewhen it was known there was nothing ready to cleanup so it can retryand invoke GC or
throw out of memory if appropriate.
Adding a method that returned the count of completed cleaning cycles(or similar)
to CleanerImpl could exist with a minimal of coupling and still provide
the information needed without commingling the execution threads.

I'll think about how to surface this functionality in the CleanerImplmost elegantly. The functionality of providing only the counter ofcleaning cycles as a getter might not be most appropriate. What we alsoneed is some mechanism to wait and be woken up to retry reservation onlyat appropriate points in time otherwise allocating threads could justspin eating CPU time. So my latest attempt was to encapsulate the entireretry logic inside ExtendedCleaner with ByteBuffer/Bits only providingallocation function to this logic, which in my view of API is prettydecoupled and general.

I don't see the need to change Cleaner to an interface to be able toprovidean additional method on CleanerImpl or a subclass and a factory methodcould
provide for a clean and very targeted interface to Bits/Direct buffer.

I would like this to be an instance method so it would naturally pertainto a particular Cleaner instance. Or it could be a static method thattakes a Cleaner instance. One of my previous webrevs did have suchmethod on the CleanerImpl, but I was advised to move it to Cleaner as apackage-private method and expose it via SharedSecrets to internal code.I feel such "camouflage" is very awkward now that we have modules andother mechanisms exist. So I thought it would be most elegant to makeCleaner an interface so it can be extended with an internal interface tocommunicate intent in a type-safe and auto-discoverable way. The changeto make it interface:


http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.part2.1.rev01/

...actually simplifies implementation (33 lines removed in total) andcould be seen as an improvement in itself.

Are you afraid that if Cleaner was an interface, others would attempt tomake implementations of it? Now that we have default methods oninterfaces it is easy to compatibly extend the API even if it is aninterface so that no 3rd party implementations are immediately broken.Are you thinking of security implications when some code is handed aCleaner instance that it doesn't trust? I don't think there is a utilityfor Cleaner instances to be passed from untrusted to trusted code, do you?

In the end it doesn't really matter. We can do it one way or the other.I just feel that using an interface is cleaner.


I'm sorry I haven't had time to try out concretely what I have in mind.
Please correct or remind me of missing salient considerations.


The bottom line is that we need a mechanism that:

- triggers reference discovery when native memory limit is approached orreached- retires native memory reservation at appropriate time slots untilsucceeding or until all pending references have been processed andCleanables executed at which time native memory reservation can failwith OOME.- if possible, doesn't execute cleanup functions by the allocatingthread but just waits for system threads to do the job.

- when triggered, does not make native memory allocation a bottleneck.

I think that what I did in my latest webrevs with ReferenceHandlerthread is an improvement in minimizing contended synchronization andinterference of allocating thread(s) with Reference enqueue-ing. Butinteraction of allocating thread(s) with Cleaner background thread couldbe improved and I have a couple of ideas to explore.


Thanks, Roger


Regards, Peter

On 4/2/2016 7:24 AM, Peter Levart wrote:
Hi Roger,

Thanks for looking at the patch.

On 04/02/2016 01:31 AM, Roger Riggs wrote:
Hi Peter,
I overlooked the introduction of another nested class (Task) tohandle the cleanup.But there are too many changes to see which ones solve a singleproblem.
Sorry to make more work, but I think we need to go back to theminimum necessarychange to make progress on this. Omit all of the little cleanupsuntil the end
or do them first and separately.

Thanks, Roger
No Problem. I understand. So let's proceed in stages. Since part1 isalready pushed, I'll call part2 stages with names: part2.1, part2.2,... and I'll start counting webrev revisions from 01 again, so webrevnames will be in the form: webrev.part2.1.rev01. Each part will be anincremental change to the previous one.
part2.1: This is preparation work to be able to have an extendedjava.lang.ref.Cleaner type for internal use. Sincejava.lang.ref.Cleaner is a final class, I propose to make it aninterface instead:
http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.part2.1.rev01/
This is a source-compatible change and it also simplifiesimplementation (no injection of Cleaner.impl access function intoCleanerImpl needed any more). What used to be java.lang.ref.Cleaneris renamed to jdk.internal.ref.CleanerImpl. What used to bejdk.internal.ref.CleanerImpl is now a nested static classjdk.internal.ref.CleanerImpl.Task (because it implements Runnable).Otherwise nothing has changed in the overall architecture of theCleaner except that public-facing API is now an interface instead ofa final class. This allows specifying internal extension interfaceand internal extension implementation.
CleanerTest passes with this change.

So what do you think?

Regards, Peter
On 4/1/16 5:51 PM, Roger Riggs wrote:
Hi Peter,

Thanks for the diffs to look at.

Two observations on the changes.
- The Cleaner instance was intentionally and necessarily differentthan the CleanerImpl to enablethe CleanerImpl and its thread to terminate if the Cleaner is notlonger referenced.
Folding them into a single object breaks that.
Perhaps it is not too bad for ExtendedCleaner to subclassCleanerImpl with the cleanup helper/supplier behaviorand expose itself to Bits. There will be fewer moving parts. Thereis no need for two factory methods for
ExtendedCleaner unless you are going to use  a separate ThreadFactory.
- The Deallocator (and now Allocator) nested classes are identical,and there is a separate copy for eachtype derived from the Direct-X-template. But it may not be worthfixing until the rest of it is settled to avoid
more moving parts.
I don't have an opinion on the code changes in Reference, that'sdifferent kettle of fish.
More next week.

Have a good weekend, Roger


On 4/1/2016 12:46 PM, Peter Levart wrote:
On 04/01/2016 06:08 PM, Peter Levart wrote:
On 04/01/2016 05:18 PM, Peter Levart wrote:
@Roger:

...
About entanglement between nio Bits andExtendedCleaner.retryWhileHelpingClean(). It is the same levelof entanglement as between the DirectByteBuffer constructor andCleaner.register(). In both occasions an action is provided tothe Cleaner. Cleaner.register() takes a cleanup action andExtendedCleaner.retryWhileHelpingClean() takes a retriable"allocating" or "reservation" action. "allocation" or"reservation" is the opposite of cleanup. Both methods areencapsulated in the same object because those two functions mustbe coordinated. So I think that collocating them together makessense. What do you think?
...to illustrate what I mean, here's a variant that totallyuntangles Bits from Cleaner and moves the whole Cleanerinteraction into the DirectByteBuffer itself:
http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.13.part2/
Notice the symmetry between Cleaner.retryWhileHelpingClean :Cleaner.register and Allocator : Deallocator ?
Regards, Peter
And here's also a diff between webrev.12.part2 and webrev.13.part2:
http://cr.openjdk.java.net/~plevart/jdk9-dev/removeInternalCleaner/webrev.diff.12to13.part2/
Regards, Peter

Re: RFR: JDK-8149925 We don't need jdk.internal.ref.Cleaner any more

Reply via email to