Christoph Kaser created LUCENE-8542:
---------------------------------------

             Summary: Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
                 Key: LUCENE-8542
                 URL: https://issues.apache.org/jira/browse/LUCENE-8542
             Project: Lucene - Core
          Issue Type: Improvement
          Components: core/search
            Reporter: Christoph Kaser


I have an index consisting of 44 million documents spread across 60 segments. 
When I run a query against this index that requests a very large number of 
results (e.g. 5 million), the query uses more than 5 GB of heap if the 
IndexSearcher was configured to use an ExecutorService.

(I know this kind of query is fairly unusual and it would be better to use 
paging and searchAfter, but our architecture does not allow this at the moment.)

The reason for the large memory requirement is that the search [creates a TopScoreDocCollector for each segment|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L404],
each with numHits = 5 million. This is fine for the large segments, but many of 
these segments are small and contain only several thousand documents, so their 
collectors can never hold more than several thousand hits. On indices with many 
segments, queries with large values of numHits therefore waste a huge amount of 
memory.
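A back-of-the-envelope count of the hit slots involved, assuming (as described above) one collector per segment and assuming that each collector reserves room for numHits entries up front. Here d_i is a symbol introduced for this estimate: the number of documents in segment i. It counts queue entries, not bytes; the cost per entry depends on the JVM.

```latex
\begin{aligned}
\text{uncapped:}\quad & 60 \times 5\,000\,000 = 3 \times 10^{8} \ \text{slots} \\
\text{capped at segment size:}\quad & \sum_{i=1}^{60} \min(5\,000\,000,\ d_i) \;\le\; \sum_{i=1}^{60} d_i \approx 4.4 \times 10^{7} \ \text{slots}
\end{aligned}
```

For this index, capping numHits at the segment size would therefore reduce the number of reserved slots by a factor of at least roughly 3e8 / 4.4e7 ≈ 6.8, because a segment can never contribute more hits than it has documents.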

Therefore, I propose to change the CollectorManager interface as follows:
 * change the method newCollector to accept a LeafSlice parameter, which can be 
used to determine the total number of documents in the LeafSlice
 * to remain backwards compatible, this could perhaps be introduced as a new 
method with a default implementation that calls the old method; otherwise, the 
change probably has to wait for Lucene 8?
 * this can then be used to cap numHits of each TopScoreDocCollector at the size 
of its LeafSlice.
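A minimal, self-contained sketch of the proposal, written without Lucene on the classpath so that it compiles on its own. Every name in it (Slice, BoundedCollector, SliceAwareCollectorManager, CappedTopHitsManager, preallocatedSlots) is a hypothetical stand-in, not Lucene API: Slice plays the role of IndexSearcher.LeafSlice and only carries a document count, and BoundedCollector only records how many hit slots a top-N collector would reserve. The slice-aware method is a default method that delegates to the old no-argument one, so managers that do not override it behave exactly as before. The slice sizes in main are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

public class SliceAwareCollectorSketch {

    /** Hypothetical stand-in for IndexSearcher.LeafSlice: only its document count matters here. */
    static final class Slice {
        final int maxDoc;
        Slice(int maxDoc) { this.maxDoc = maxDoc; }
    }

    /** Hypothetical collector that only records how many hit slots it would reserve. */
    static final class BoundedCollector {
        final int capacity;
        BoundedCollector(int capacity) { this.capacity = capacity; }
    }

    /** Hypothetical CollectorManager variant with a backwards-compatible slice-aware method. */
    interface SliceAwareCollectorManager<C, T> {
        C newCollector();

        // Default: ignore the slice and fall back to the old behaviour.
        default C newCollector(Slice slice) {
            return newCollector();
        }

        T reduce(Collection<C> collectors);
    }

    /** A manager that opts in and caps numHits at the slice size. */
    static final class CappedTopHitsManager implements SliceAwareCollectorManager<BoundedCollector, Long> {
        private final int numHits;

        CappedTopHitsManager(int numHits) { this.numHits = numHits; }

        @Override
        public BoundedCollector newCollector() {
            return new BoundedCollector(numHits);
        }

        @Override
        public BoundedCollector newCollector(Slice slice) {
            // A slice cannot yield more hits than it has documents; keep at least one slot.
            return new BoundedCollector(Math.max(1, Math.min(numHits, slice.maxDoc)));
        }

        @Override
        public Long reduce(Collection<BoundedCollector> collectors) {
            long total = 0;
            for (BoundedCollector c : collectors) {
                total += c.capacity;
            }
            return total; // total reserved slots across all slices
        }
    }

    /** Creates one collector per slice, either the old way or the slice-aware way. */
    static long preallocatedSlots(CappedTopHitsManager manager, List<Slice> slices, boolean sliceAware) {
        List<BoundedCollector> collectors = new ArrayList<>();
        for (Slice s : slices) {
            collectors.add(sliceAware ? manager.newCollector(s) : manager.newCollector());
        }
        return manager.reduce(collectors);
    }

    public static void main(String[] args) {
        List<Slice> slices = List.of(new Slice(10_000_000), new Slice(3_000), new Slice(5_000));
        CappedTopHitsManager manager = new CappedTopHitsManager(5_000_000);
        System.out.println(preallocatedSlots(manager, slices, false)); // 15000000
        System.out.println(preallocatedSlots(manager, slices, true));  // 5008000
    }
}
```

With three slices and numHits = 5 million, the old path reserves 15,000,000 slots, while the slice-aware path reserves 5,008,000, because the two small slices only ask for as many slots as they have documents.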

If this makes sense to you, I can try to provide a patch.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
