Jeremy Manson describes the process fairly well in an old blog post 
<http://jeremymanson.blogspot.com/2010/02/garbage-collection-softreferences.html>.

Because soft references are retained in proportion to the amount of free 
memory given to the JVM (HotSpot keeps them alive for 
-XX:SoftRefLRUPolicyMSPerMB milliseconds per free megabyte), the cost 
increases as more memory is added. This can be surprising 
<https://bugs.java.com/bugdatabase/view_bug.do?bug_id=6912889>, e.g. if 
soft references are abused to fill up the heap, then full collections will 
happen much more frequently. Trying to resolve that by increasing the 
instance size only makes the problem worse.
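
For illustration, a minimal sketch of the kind of cache being described (a 
hypothetical SoftCache, not taken from any of the linked code):

    import java.lang.ref.SoftReference;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Function;

    final class SoftCache<K, V> {
      private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();

      // Racy (two threads may load the same key), and cleared references
      // leave stale map entries behind unless pruned via a ReferenceQueue,
      // but adequate as a sketch. Values live until the collector decides
      // to clear them, so the map naturally grows to fill the heap.
      V get(K key, Function<K, V> loader) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
          value = loader.apply(key);
          map.put(key, new SoftReference<>(value));
        }
        return value;
      }
    }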

I believe that this is less of a problem on region-based collectors, like 
G1, than on purely generational ones. The regions are independent and can 
be evacuated much more aggressively, so the impact on pause times may be 
lessened. Of course, that aggressiveness also negatively impacts 
application cache hit rates if soft references are used in that manner.

> Does it even make sense to use weakValues?

Yes, weak values are collected as soon as the last strong reference to the 
value disappears, so the GC impact is much smaller. However, their use 
case is often quite different: they suit canonicalizing mappings more than 
caching.
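
A sketch of that usage with the Guava builder mentioned below (the Session 
type is a stand-in of my own):

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;

    final class SessionRegistry {
      static final class Session {}

      // An entry disappears as soon as nothing else strongly references
      // the Session, so lookups never resurrect a dead value.
      private final Cache<String, Session> sessions = CacheBuilder.newBuilder()
          .weakValues()
          .build();
    }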

> What is the recommended way to cache objects?

Prefer an explicit size bound on strongly referenced objects, expose that 
bound as a configuration setting, and report the cache statistics for 
monitoring.
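
For example (sketched with Caffeine; Guava's CacheBuilder offers the 
equivalent maximumSize and recordStats knobs, and the bound shown is an 
arbitrary placeholder):

    import com.github.benmanes.caffeine.cache.Cache;
    import com.github.benmanes.caffeine.cache.Caffeine;

    final class BoundedCacheExample {
      private final Cache<String, byte[]> cache = Caffeine.newBuilder()
          .maximumSize(10_000)  // explicit, tunable bound instead of softValues()
          .recordStats()        // enables hit, miss, and eviction counters
          .build();

      // Export to whatever monitoring system is in use.
      double hitRate() {
        return cache.stats().hitRate();
      }
    }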

As soft references are evicted using a global LRU when there is GC 
pressure, very poor victims may be chosen. For example, a 
performance-sensitive application cache might have its entries removed in 
favor of keeping noisy, low-value entries from a cache buried deep within 
an external dependency. Performance problems become difficult to predict 
and replicate, since the cache reacts to environmental conditions rather 
than to the workload.

An application-designed cache can also leverage algorithmic improvements. 
Modern eviction policies can significantly outperform LRU 
<https://github.com/ben-manes/caffeine/wiki/Efficiency> when frequency is 
a better signal than recency. In those cases it may also be more 
GC-hygienic, as low-value items are quickly evicted, whereas LRU can cause 
unnecessary old-gen promotion by aging items much more slowly.

Historically the arguments in favor of soft caches were application 
simplicity (less tuning) and better concurrency (no explicit locking, as 
the GC maintains it). Neither bore out except in demo code and 
microbenchmarks, and they tend to result in worse behavior overall. You 
should strive to remove soft references from your application and 
generally be very wary of their usage when encountered.

On Thursday, November 15, 2018 at 10:22:05 AM UTC-8, Siva Velusamy wrote:
>
> In the thread below, gil@ makes the following statement:
>
> > For most GC algorithms, or at least for the more costly parts of such 
> > algorithms, GC efficiency is roughly linear to EmptyHeap/LiveSet. ...
> > This should [hopefully] make it obvious why using SoftReferences is a 
> > generally terrible idea.
>
> I'm not following that conclusion. Is that because SoftReferenced objects 
> are still considered to be part of the LiveSet in the calculation above, 
> and that leads to increased GC cost?
>
> As a follow up, what is the recommended way to cache objects? Currently 
> many places in our codebase use Guava's CacheBuilder with softValues 
> <https://google.github.io/guava/releases/snapshot/api/docs/com/google/common/cache/CacheBuilder.html#softValues-->.
>  
> Does it even make sense to use weakValues, or is the suggestion to just use 
> strong values, but try to restrict the number of entries?
>
> ---------- Forwarded message ---------
> From: Gil Tene <[email protected]>
> Date: Sat, Nov 10, 2018 at 8:51 AM
> Subject: Re: Sorting a very large number of objects
> To: mechanical-sympathy <[email protected]>
>
> On Friday, November 9, 2018 at 7:08:23 AM UTC-8, Shevek wrote:
>>
>> Hi, 
>>
>> I'm trying to sort/merge a very large number of objects in Java, and 
>> failing more spectacularly than normal. The way I'm doing it is this: 
>>
>> * Read a bunch of objects into an array. 
>> * Sort the array, then merge neighbouring objects as appropriate. 
>> * Re-fill the array, re-sort, re-merge until compaction is "not very 
>> successful". 
>> * Dump the array to file, repeat for next array. 
>> * Then stream all files through a final merge/combine phase. 
>>
>> This is failing largely because I have no idea how large to make the 
>> array. Estimating the ongoing size using something like JAMM is too 
>> slow, and my hand-rolled memory estimator is too unreliable. 
>>
>> The thing that seems to be working best is messing around with the array 
>> size in order to keep some concept of runtime.maxMemory() - 
>> runtime.totalMemory() + runtime.freeMemory() within a useful bound. 
>>
>> But there must be a better solution. I can't quite think of a way around 
>> this with SoftReference, because I need to dump the data to disk when 
>> the reference gets broken, and that's defeating me right now. 
>>
>> Other alternatives would include keeping all my in-memory data 
>> structures in serialized form, and paying the ser/deser cost to compare, 
>> but that's expensive - my main overhead right now is gc. Serialization 
>> is protobuf, although that's changeable, since it's annoying the hell 
>> out of me (please don't say thrift - but protobuf appears to have no way 
>> to read from a stream into a reusable object - it has to allocate the 
>> world every single time). 
>>
>
> In general, whenever I see "my overhead is gc" and "unknown memory size" 
> together, I see it as a sign of someone pushing heap utilization high and 
> getting into the inefficient GC state. Simplistically, you should be able 
> to drop the GC cost to an arbitrary % of overall computation cost by 
> increasing the amount (or relative portion) of empty heap in your setup. 
> So GC should never be "a bottleneck" from a throughput point of view 
> unless you have constraints (such as a minimum required live set and a 
> maximum possible heap size) that force you towards a high utilization of 
> the heap (in terms of LiveSet/HeapSize). The answer to such a situation 
> is generally "get some more RAM for this problem" rather than "put in 
> tons of work to fit this in". 
>
> For most GC algorithms, or at least for the more costly parts of such 
> algorithms, GC efficiency is roughly linear to EmptyHeap/LiveSet. Stated 
> otherwise, GC cost grows with LiveSet/EmptyHeap or 
> LiveSet/(HeapSize-LiveSet). As you grow the amount you try to cram into a 
> heap of a given size, you increase the GC cost to the square of your 
> cramming efforts. And for every doubling of the empty heap [for a given 
> live set] you will generally halve the GC cost.
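
As a worked instance of that formula (with illustrative numbers of my 
own): a 10 GB heap holding an 8 GB live set has 2 GB of empty heap, for a 
relative cost of 8/2 = 4. Cramming in one more GB makes it 9/1 = 9, more 
than double the cost for a ~12% larger live set, while instead doubling 
the heap to 20 GB drops it to 8/12, roughly a 6x improvement.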
>
> This should [hopefully] make it obvious why using SoftReferences is a 
> generally terrible idea.
>  
>
>>
>> Issues: 
>> * This routine is not the sole tenant of the JVM. Other things use RAM. 
>>
>  
> You can try to establish what an "efficient enough" heap utilization level 
> is for your use case (a level that keeps overall GC work as a % of CPU 
> spend to e.g. below 10%), and keep your heap use to a related fraction of 
> whatever heap size you get to have on the system you land on.
>  
>
>> * This has to be deployed and work on systems whose memory config is 
>> unknown to me. 
>>
>> Can anybody please give me pointers? 
>>
>> S. 
>>
