Hi Philippe,

Over the weekend I coded up a simple change to the Guava size-based eviction algorithm to fix this. My proposed change involves no API changes and works as a drop-in replacement in ES. As you probably know, ES renames and compiles in the Guava libraries, so actually deploying a new build of Guava requires rebuilding ES.

The approach I took was to allow segments to grow larger than maxSegmentWeight as long as the total cache size remains below the overall maxWeight. Then, if eviction within one segment doesn't reduce the cache weight to below maxWeight, I find the largest segment and evict from there. I use tryLock() so that if another thread is already using that segment, I don't block; eviction will happen as new values are loaded there. Like I said, simple.

Perhaps we can work together to review my change, make improvements on it, and get it submitted?

Craig.

On Monday, September 22, 2014 7:30:44 AM UTC-7, Philippe Laflamme wrote:
> Hi,
>
> I've recently posted a question regarding mysterious field data cache eviction[1]: ES is evicting field data cache entries even though it's nowhere near the limit I've set.
>
> In an unrelated post, someone found the root cause of this problem[2]: the maximum cache size specified is actually split evenly between Guava's cache partitions. ES configures Guava to use 16 partitions[3], which means that any given field data inserted will actually have a maximum size of indices.fielddata.cache.size / 16.
>
> In my case, I configured the cache size to be 10GB, but saw eviction at very low cache usage (1.5GB in some cases). This is because at least one of the cache partitions hit its maximum size of 625MB.
>
> Obviously, the short-term solution is to increase the field data cache size, but this will require that we overcommit by quite a bit in order to have partitions with a sensible size for our most frequent queries.
>
> Until Guava provides a way to have a global maximum size instead of a per-partition size (as mentioned by Craig in his post), it would be nice to have a handle on the number of partitions created for this cache. If I set this to 2, for example, I'm still allowing 2 threads to write to this cache concurrently without having to overcommit my global field data cache size (at least not by much).
>
> Anyone have another idea about how to deal with this?
>
> Cheers,
> Philippe
>
> [1] https://groups.google.com/d/msg/elasticsearch/54XzFO44yJI/sd7Fm-WcrPcJ
> [2] https://groups.google.com/d/msg/elasticsearch/42qrpYRJvsU/jhl3UZZG5sQJ
> [3] https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/indices/fielddata/cache/IndicesFieldDataCache.java#L77
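
For anyone following along, here is a rough sketch of the eviction idea described at the top of this message: segments may outgrow their even share while the cache as a whole is under maxWeight, and when the total goes over, eviction is tried in the current segment first and then in the largest segment via tryLock(). This is a toy model written for illustration only. It is not Guava's code and not the actual patch; the class and method names (SegmentedCacheSketch, put, segmentWeight) are hypothetical, entries are reduced to bare weights, eviction order is plain FIFO, and the per-segment share is assumed to be maxWeight / segmentCount.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantLock;

/** Toy model of a segmented cache with a single global weight limit (hypothetical names). */
class SegmentedCacheSketch {
    /** One partition: a FIFO queue of entry weights guarded by its own lock. */
    static final class Segment {
        final ReentrantLock lock = new ReentrantLock();
        final Deque<Long> weights = new ArrayDeque<>();
        volatile long weight; // only written while holding lock
    }

    private final Segment[] segments;
    private final long maxWeight;        // global limit for the whole cache
    private final long maxSegmentWeight; // even share per segment (assumed: maxWeight / segmentCount)
    private final AtomicLong totalWeight = new AtomicLong();

    SegmentedCacheSketch(int segmentCount, long maxWeight) {
        this.segments = new Segment[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new Segment();
        }
        this.maxWeight = maxWeight;
        this.maxSegmentWeight = maxWeight / segmentCount;
    }

    long totalWeight() { return totalWeight.get(); }

    long segmentWeight(int index) { return segments[index].weight; }

    /** Adds an entry of the given weight to one segment, then enforces the global limit. */
    void put(int segmentIndex, long entryWeight) {
        Segment s = segments[segmentIndex];
        s.lock.lock();
        try {
            s.weights.addLast(entryWeight);
            s.weight += entryWeight;
            totalWeight.addAndGet(entryWeight);
            // Step 1: evict here only if the cache is over its global limit AND this
            // segment is above its even share; keep the entry that was just added.
            while (totalWeight.get() > maxWeight
                    && s.weight > maxSegmentWeight
                    && s.weights.size() > 1) {
                evictOldest(s);
            }
        } finally {
            s.lock.unlock();
        }
        // Step 2: still over the global limit, so evict from the largest segment.
        while (totalWeight.get() > maxWeight) {
            Segment largest = largestSegment();
            if (!largest.lock.tryLock()) {
                return; // busy: the thread using that segment evicts on its own next put
            }
            try {
                if (largest.weights.isEmpty()) {
                    return; // another thread emptied it in the meantime
                }
                evictOldest(largest);
            } finally {
                largest.lock.unlock();
            }
        }
    }

    /** Removes the oldest entry; the caller must hold s.lock. */
    private void evictOldest(Segment s) {
        long w = s.weights.removeFirst();
        s.weight -= w;
        totalWeight.addAndGet(-w);
    }

    /** Scans segment weights without locking; a slightly stale answer is acceptable here. */
    private Segment largestSegment() {
        Segment largest = segments[0];
        for (Segment s : segments) {
            if (s.weight > largest.weight) {
                largest = s;
            }
        }
        return largest;
    }
}
```

With 16 segments and a limit of 160, a single hot segment can hold 120 in this model before anything is evicted, whereas a strict per-segment cap would start evicting at 10. The second step runs after the current segment's lock is released, so no thread ever waits on one segment lock while holding another.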
