sgup432 opened a new issue, #15097:
URL: https://github.com/apache/lucene/issues/15097

   ### Description
   
   We recently discovered a bug in OpenSearch where LRUQueryCache was taking 
~80% of the heap, even though OpenSearch caps it at 10% of the heap. 
While analyzing a heap dump, we found a large number of big Boolean 
queries lingering in the cache, bloating it well beyond its configured maximum 
size. 
   
   Looking at the LRUQueryCache 
[code](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java#L428-L433),
 it seems that we calculate the query size via 
`accountableQuery.ramBytesUsed()` only if the query implements the 
`Accountable` interface and provides its own implementation. BooleanQuery 
doesn't provide such an implementation, so we default to 1 KB. This 
underestimates the query size and lets the cache grow far past its limit when 
it holds such large queries.
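To make the fallback described above concrete, here is a minimal sketch of that accounting rule. It does not use the Lucene classes: `SizedQuery`, `SelfReportingQuery`, `OpaqueQuery` and `accountedBytes` are hypothetical stand-ins, and the 1 KB default is simply the figure quoted in this issue.

```java
// Hypothetical stand-ins for illustration; these are NOT the Lucene classes.
final class QuerySizeFallbackSketch {
    // Assumed flat default of 1 KB, the figure quoted in this issue.
    static final long DEFAULT_QUERY_BYTES = 1024L;

    // Stand-in for an interface like Accountable.
    interface SizedQuery {
        long ramBytesUsed();
    }

    // A query that reports its own memory footprint.
    static final class SelfReportingQuery implements SizedQuery {
        private final long bytes;

        SelfReportingQuery(long bytes) {
            this.bytes = bytes;
        }

        @Override
        public long ramBytesUsed() {
            return bytes;
        }
    }

    // A query that holds a lot of memory but does not report it.
    static final class OpaqueQuery {
        final byte[] payload;

        OpaqueQuery(int payloadBytes) {
            this.payload = new byte[payloadBytes];
        }
    }

    // Use the self-reported size when available, else the flat default.
    static long accountedBytes(Object query) {
        if (query instanceof SizedQuery) {
            return ((SizedQuery) query).ramBytesUsed();
        }
        return DEFAULT_QUERY_BYTES;
    }

    public static void main(String[] args) {
        Object reported = new SelfReportingQuery(3_200_000L);
        Object opaque = new OpaqueQuery(3_200_000);
        System.out.println(accountedBytes(reported)); // self-reported size
        System.out.println(accountedBytes(opaque));   // flat default
    }
}
```

Both objects hold roughly the same amount of memory, but only the one that reports its size is charged for it; the other is charged the flat default regardless of how large it is.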
   
   Below is a snapshot from MAT showing LRUQueryCache holding ~23 GB of heap 
on an instance with a 32 GB heap, where the cache is configured to be 
~3.2 GB in size. The majority of that is held by ~6k BooleanQuery instances, 
each around ~3.2 MB in size.
   
   <img width="1042" height="137" alt="Image" 
src="https://github.com/user-attachments/assets/75d6403a-dfd6-4562-8510-ff454d899895"
 />
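
As a rough back-of-the-envelope check of the numbers above (treating "~6k" as 6000, using decimal units, and ignoring any per-entry overhead or cached values the cache also tracks), the gap between what these queries actually hold and what a 1 KB default would charge for them is approximately:

```math
\begin{aligned}
\text{actual} &\approx 6000 \times 3.2\ \text{MB} = 19.2\ \text{GB} \\
\text{accounted} &\approx 6000 \times 1\ \text{KB} = 6\ \text{MB} \\
\frac{\text{actual}}{\text{accounted}} &\approx 3200
\end{aligned}
```

Under these assumptions the queries would consume only about 0.2% of the ~3.2 GB budget on paper, which is consistent with the cache never evicting them.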
   
   
   
   
   
   ### Solution
   Below are the solutions I can think of:
   
   1. Make BooleanQuery implement the `Accountable` interface and compute a 
rough estimate of `ramBytesUsed()` from its clauses. This would still be better 
than using the 1 KB default.
   2. Or disable caching of BooleanQueries with a very high number of clauses. 
Though this still won't be accurate enough.
   3. Or find a different way to calculate the size of the cache's keys and 
values. I don't see an exact way to do this at this point.
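
A minimal sketch of what options 1 and 2 could look like, under stated assumptions. This is not a patch against Lucene: `CompositeQuery`, `LeafQuery`, `SizedQuery` and `shouldCache` are hypothetical stand-ins, and the byte constants are made-up placeholders rather than measured values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the Lucene classes.
final class ClauseSizeEstimateSketch {
    static final long DEFAULT_QUERY_BYTES = 1024L;     // assumed flat default
    static final long COMPOSITE_SHALLOW_BYTES = 48L;   // made-up placeholder
    static final long PER_CLAUSE_OVERHEAD_BYTES = 32L; // made-up placeholder
    static final long LEAF_SHALLOW_BYTES = 64L;        // made-up placeholder

    // Stand-in for an interface like Accountable.
    interface SizedQuery {
        long ramBytesUsed();
    }

    // Stand-in for a simple term-like query.
    static final class LeafQuery implements SizedQuery {
        private final String field;
        private final String term;

        LeafQuery(String field, String term) {
            this.field = field;
            this.term = term;
        }

        @Override
        public long ramBytesUsed() {
            // Crude estimate: fixed overhead plus an assumed 2 bytes per character.
            return LEAF_SHALLOW_BYTES + 2L * (field.length() + term.length());
        }
    }

    // Stand-in for a boolean-style query that sums its clauses (option 1).
    static final class CompositeQuery implements SizedQuery {
        private final List<Object> clauses = new ArrayList<>();

        CompositeQuery add(Object clause) {
            clauses.add(clause);
            return this;
        }

        int clauseCount() {
            return clauses.size();
        }

        @Override
        public long ramBytesUsed() {
            long total = COMPOSITE_SHALLOW_BYTES;
            for (Object clause : clauses) {
                total += PER_CLAUSE_OVERHEAD_BYTES;
                // Clauses that cannot report a size still get the flat default.
                total += (clause instanceof SizedQuery)
                        ? ((SizedQuery) clause).ramBytesUsed()
                        : DEFAULT_QUERY_BYTES;
            }
            return total;
        }
    }

    // Option 2: refuse to cache composites above a clause-count threshold.
    static boolean shouldCache(CompositeQuery query, int maxClauses) {
        return query.clauseCount() <= maxClauses;
    }

    public static void main(String[] args) {
        CompositeQuery query = new CompositeQuery();
        for (int i = 0; i < 10_000; i++) {
            query.add(new LeafQuery("field", "term" + i));
        }
        System.out.println("estimated bytes: " + query.ramBytesUsed());
        System.out.println("cacheable at 1024 clauses: " + shouldCache(query, 1024));
    }
}
```

The estimate grows with the number of clauses instead of staying flat, so a cache enforcing a byte budget would see large queries as large. An immutable query could compute this once at construction time rather than on every call; the sketch recomputes it only to stay short.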
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

