Hi Peter,

Interesting email. I think it is a thoughtful contribution, and these are great responses to the concerns and questions raised. I hope it receives the consideration it deserves.
Kind regards,
Kirk

On May 31, 2015, at 9:32 PM, Peter Levart <peter.lev...@gmail.com> wrote:

> Hi,
>
> Thanks for the views and opinions. I'll try to address them in-line...
>
> On 05/29/2015 04:18 AM, David Holmes wrote:
>> Hi Peter,
>>
>> I guess I'm very concerned about the premise that finalization should scale to millions of objects and be performed highly concurrently. To me that's sending the wrong message about finalization. It also isn't the most effective use of CPU resources - most people would want to do useful work on most CPUs most of the time.
>>
>> Cheers,
>> David
>
> @David
>
> Ok, fair enough. It shouldn't be necessary to scale finalization to millions of objects and perform it concurrently. Normal programs don't need this. But there is a diagnostic command being developed at this moment that displays the finalization queue. The utility of such a command, as I understand it, is precisely to show when the finalization thread cannot cope and Finalizer(s) accumulate. So there must be some hypothetical programs that (ab)use finalization or are buggy (deadlock) so that the queue builds up. To diagnose this, a diagnostic command is helpful. To fix it, one has to fix the code. But what if the problem is not so much the allocation/death rate of finalizable instances as the heavy code of the finalize() methods of those instances? I agree that such programs have a smell and should be rewritten not to use finalization but other means of cleanup, such as multiple threads removing WeakReferences from a queue, or something completely different and not based on Reference(s). But wouldn't it be nice if one could simply set a system property for the max. number of threads processing Finalizer(s)?
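[Editorial note: the queue-based cleanup alternative Peter mentions above (threads draining a ReferenceQueue instead of relying on finalize()) can be sketched roughly as follows. This is an illustration, not code from the prototype; the class names and the cleanup callback are invented.]

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Sketch: cleanup via a dedicated thread draining a ReferenceQueue,
// as an alternative to finalize().
class QueueCleanupSketch {
    static final ReferenceQueue<Object> QUEUE = new ReferenceQueue<>();

    // A Reference subclass carrying the cleanup action for its referent.
    static class CleanupRef extends WeakReference<Object> {
        final Runnable cleanup;
        CleanupRef(Object referent, Runnable cleanup) {
            super(referent, QUEUE);
            this.cleanup = cleanup;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread cleaner = new Thread(() -> {
            try {
                while (true) {
                    // remove() blocks until the GC enqueues a cleared reference
                    CleanupRef ref = (CleanupRef) QUEUE.remove();
                    ref.cleanup.run();
                }
            } catch (InterruptedException ignored) {
                // exit on interrupt
            }
        }, "cleanup-thread");
        cleaner.setDaemon(true);
        cleaner.start();

        Object resource = new Object();
        CleanupRef ref = new CleanupRef(resource, () -> System.out.println("resource cleaned"));
        resource = null;   // drop the only strong reference to the referent
        System.gc();       // request collection; enqueueing is not guaranteed
        Thread.sleep(200); // give the cleaner thread a chance to run
    }
}
```

More cleaner threads can be started on the same queue if one thread cannot keep up, which is exactly the multi-threaded variant described above.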
>
> I have prepared an improved variant of the prototype that employs a single ReferenceHandler thread and adds a ForkJoinPool that by default has a single worker thread, which replaces the single finalization thread. So by default, no more threads are used than currently. If one wants, (s)he can increase the concurrency of finalization with a system property.
>
> I have also improved the benchmarks, which now focus on CPU overhead when processing references at more typical rates, rather than on maximum throughput. They show that all changes taken together practically halve the CPU time overhead of finalization processing, so the freed CPU time can be used for more useful work. I have also benchmarked the typical asynchronous WeakReference processing scenario where one thread removes enqueued WeakReferences from the queue. Results show about a 25% decrease in CPU time overhead.
>
> Why does the prototype reduce overhead more for finalization than for WeakReference processing? The main improvement in the change is the use of multiple doubly-linked lists for registration of Finalizer(s) and the use of a lock-less algorithm for those lists. The WeakReference processing benchmark also uses such lists internally to handle registration/deregistration of WeakReferences, so the impact of this part is minimal and the difference in processing overheads between the original and changed JDK code more obvious. (De)registration of Finalizer(s), OTOH, is part of the JDK infrastructure, so the improvement to the registration list(s) also shows in the results. The results of the WeakReference processing benchmark also indicate that reverting to a single finalization thread that just removes Finalizer(s) from the ReferenceQueue could lower the overhead even a bit further, but then it would not be possible to leverage the FJ pool to simply configure the parallelism of finalization.
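[Editorial note: as a rough illustration of the lock-less registration idea mentioned above - simplified here to insertion-only, Treiber-stack style, whereas the actual prototype uses doubly-linked lists so entries can also be unlinked without a lock - a CAS-based list might look like this. The class is invented for illustration.]

```java
import java.util.concurrent.atomic.AtomicReference;

// Simplified sketch of lock-less registration: new entries are pushed onto a
// singly-linked list with compare-and-set instead of a synchronized block.
class LockFreeRegistry<T> {
    static final class Node<T> {
        final T item;
        Node<T> next;
        Node(T item) { this.item = item; }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void register(T item) {
        Node<T> n = new Node<>(item);
        Node<T> h;
        do {
            h = head.get();
            n.next = h;
        } while (!head.compareAndSet(h, n)); // retry if another thread won the race
    }

    int size() {
        int count = 0;
        for (Node<T> n = head.get(); n != null; n = n.next) count++;
        return count;
    }
}
```

Deregistration (unlinking an arbitrary entry when its finalize() has run) is what forces the doubly-linked variant in the real prototype; a singly-linked stack can only pop from the head.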
> If parallel processing of Finalizer(s) is an undesirable feature, I could restore the single finalization thread, and the CPU overhead of finalization would still be reduced to about 40% of the current overhead with just the changes to data structures.
>
> So, for the curious, here's the improved prototype:
>
> http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/webrev.02/
>
> And here are the improved benchmarks (with some results inline):
>
> http://cr.openjdk.java.net/~plevart/misc/JEP132/ReferenceHandling/refproc/
>
> The benchmark results in ThroughputBench.java show the output of the test(s) when run with the Linux "time" command, which reports the elapsed real time and the consumed user and system CPU times. I think this is relevant for measuring CPU overhead.
>
> So my question is: is it or is it not desirable to have a configurable means to parallelize finalization processing? The reduction of CPU overhead in infrastructure code should always be desirable, right?
>
> On 05/29/2015 05:57 AM, Kirk Pepperdine wrote:
>> Hi Peter,
>>
>> It is a very interesting proposal, but to further David's comments, the life-cycle cost of reference objects is horrendous, and the actual process of finalizing an object is only a fraction of that total cost. Unfortunately your micro-benchmark only focuses on one aspect of that cost. In other words, it isn't very representative of a real concern. In the real world the finalizer must compete with mutator threads, and since F-J is an "all threads on deck" implementation, it doesn't play well with others. It creates a "tragedy of the commons": a situation where everyone behaves rationally with a common resource, but to the detriment of the whole group. In short, parallelizing (F-Jing) *everything* in an application is simply not a good idea. We do not live in an infinite compute environment,
>> which means we have to consider the impact of our actions on the entire group.
>
> @Kirk
>
> I changed the prototype to only use a single FJ thread by default (configurable with a system property). Lowering the CPU overhead of finalizer processing by 50% is also an improvement. I'm still keeping the finalization FJ pool for now because it is more scalable and has less overhead than a solution with multiple threads removing references from the same ReferenceQueue. This happens when the FJ pool is configured with parallelism > 1, or when user code calls Runtime.runFinalization(), which translates to ForkJoinPool.awaitQuiescence(), which lends the calling thread to help the pool execute the tasks.
>
>> This was one of the points of my recent article in Java Magazine, which I wrote to try to counter some of the rhetoric I was hearing at conferences about the universal benefits of being able to easily parallelize streams in Java 8. Yes, I agree it's a great feature, but it must be used with discretion. Case in point: after I finished writing the article, I started running into a couple of early adopters who had swallowed the parallel message whole, indiscriminately parallelizing all of their streams. As you can imagine, they were quite surprised by the results and quickly worked to de-parallelize *all* of the streams in the application.
>>
>> To add some ability to parallelize the handling of reference objects seems like a good idea if you are collecting large numbers of reference objects (>10,000 per GC cycle). However, if you are collecting large numbers of reference objects, you're most likely doing something else wrong. IME, finalization is extremely useful, but really only for a limited number of use cases, and none of them (to date) have resulted in the app burning through 1000s of final objects / sec.
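[Editorial note: the configuration Peter describes - a single-worker ForkJoinPool by default, parallelism adjustable via a system property, and awaitQuiescence() letting the caller help drain tasks, as Runtime.runFinalization() would - might be sketched like this. The property name "jdk.finalizer.parallelism" is invented for illustration.]

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;

// Sketch: a finalization pool with parallelism taken from a system property,
// defaulting to a single worker thread (matching current JDK behavior).
class FinalizerPoolSketch {
    public static void main(String[] args) {
        // NOTE: the property name below is hypothetical, not from the prototype.
        int parallelism = Integer.getInteger("jdk.finalizer.parallelism", 1);
        ForkJoinPool pool = new ForkJoinPool(parallelism);

        for (int i = 0; i < 100; i++) {
            pool.execute(() -> {
                // stand-in for running one Finalizer's finalize() method
            });
        }

        // What Runtime.runFinalization() would map to in this scheme: the
        // calling thread helps execute pending tasks until the pool is quiet.
        boolean quiet = pool.awaitQuiescence(10, TimeUnit.SECONDS);
        System.out.println("quiescent: " + quiet);
    }
}
```

With parallelism left at 1, no more threads are used than today; raising the property is the opt-in to parallel finalization.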
>>
>> It would be interesting to know why you picked on this particular issue.
>
> Well, JEP-132 was filed by Oracle, so I thought I'd try to tackle some of its goals. I think I have at least shown that the VM part of reference handling is mostly not the performance problem (if there is a problem at all), but that the Java side could be modernized a bit.
>
>> Kind regards,
>> Kirk
>
> On 05/29/2015 07:20 PM, Rezaei, Mohammad A. wrote:
>> For what it's worth, I fully agree with David and Kirk that finalization does not necessarily need this treatment.
>>
>> However, I was hoping this would have the effect of improving (non-finalizable) reference handling. We've seen serious issues in WeakReference handling and have had to write some twisted code to deal with this.
>
> @Moh
>
> Can you elaborate some more on what twists were necessary or what problems you had?
>
>> So I guess the question I have for Kirk and David is: do you feel a GC load of 10K WeakReferences per cycle is also "doing something else wrong"?
>
> If there is an elegant way to achieve your goal without using WeakReferences, then it might be better not to use them. But it is also true that WeakReferences frequently lend themselves to an elegant solution. The same goes for finalization, which is sometimes even more elegant.
>
>> Sorry if this is going off-topic.
>
> You're spot on topic, and thanks for your comment.
>
>> Thanks
>> Moh
>
> Regards,
> Peter