jgq2008303393 commented on a change in pull request #940: LUCENE-9002: Query caching leads to absurdly slow queries URL: https://github.com/apache/lucene-solr/pull/940#discussion_r334230846
########## File path: lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java ########## @@ -732,8 +741,39 @@ public ScorerSupplier scorerSupplier(LeafReaderContext context) throws IOExcepti if (docIdSet == null) { if (policy.shouldCache(in.getQuery())) { - docIdSet = cache(context); - putIfAbsent(in.getQuery(), docIdSet, cacheHelper); + final ScorerSupplier supplier = in.scorerSupplier(context); + if (supplier == null) { + putIfAbsent(in.getQuery(), DocIdSet.EMPTY, cacheHelper); + return null; + } + + final long cost = supplier.cost(); + return new ScorerSupplier() { + @Override + public Scorer get(long leadCost) throws IOException { + // skip cache operation which would slow query down too much + if ((cost > skipCacheCost || cost > leadCost * skipCacheFactor) + && in.getQuery() instanceof IndexOrDocValuesQuery) { Review comment: @jpountz Thanks for reminding. I found the following ES query has similar problem. ```java GET host_monitor/_search { "size": 100, "query": { "bool": { "filter": [ { // term query cost is 1.8w "term": { "timestamp": 1570291200 } }, { // terms query cost is 759.0w "terms": { "cpu_name": ["cpu0", "cpu1", "cpu2", "cpu3", "cpu4", "cpu5", "cpu6", "cpu7"] } } ] } } } ``` The test result of this query is as follows: Query | 1th | 2th | 3th | 4th | 5th | 6th -- | -- | -- | -- | -- | -- | -- Before Optimization | 28ms | 27ms | 29ms | 616ms | 16ms | 16ms After Optimization| 27ms | 30ms | 25ms | 27ms | 28ms | 27ms | queryPattern | latencyWithoutCaching | latencyWithCaching | leadCost | rangeQueryCost | skipCacheFactor | | ---------- | :-----------: | :-----------: | :-----------: | :-----------: | :-----------: | | cpu_name with 2 vlaues | 22ms | 165ms(+650%) | 18432 | 1911910 | 103 | | cpu_name with 4 vlaues | 24ms | 307ms(+1180%) | 18432 | 3778721| 205 | | cpu_name with 8 vlaues | 27ms | 616ms(+2270%) | 18432 | 7590457 | 411 | According to above test result, this PR also do a very good job at optimizing terms query execution. I will apply this PR to all query types in next patch. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org