[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791424#comment-16791424 ]
Christoph Kaser commented on LUCENE-8542:
-----------------------------------------

{quote}Right I get how it can help with small slices, but at the same time I'm seeing small slices as something that should be avoided in order to limit context switching so I don't think we should design for small slices?{quote}

Small slices are the default: the default implementation of IndexSearcher.slices() returns one slice per segment. Since the search runs in an Executor, this may not cause much context switching, depending on the thread pool parameters. But you are right, the default implementation of slices() may not be optimal.

> Provide the LeafSlice to CollectorManager.newCollector to save memory on small index slices
> -------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8542
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8542
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Christoph Kaser
>            Priority: Minor
>         Attachments: LUCENE-8542.patch
>
> I have an index consisting of 44 million documents spread across 60 segments. When I run a query against this index with a huge number of results requested (e.g. 5 million), the query uses more than 5 GB of heap if the IndexSearcher was configured to use an ExecutorService. (I know this kind of query is fairly unusual and it would be better to use paging and searchAfter, but our architecture does not allow this at the moment.)
> The reason for the huge memory requirement is that the search [will create a TopScoreDocCollector for each segment|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L404], each one with numHits = 5 million. This is fine for the large segments, but many of those segments are fairly small and only contain several thousand documents.
> This wastes a huge amount of memory for queries with large values of numHits on indices with many segments.
> Therefore, I propose to change the CollectorManager interface in the following way:
> * Change the method newCollector to accept a LeafSlice parameter that can be used to determine the total count of documents in the LeafSlice.
> * Maybe, in order to remain backwards compatible, it would be possible to introduce this as a new method with a default implementation that calls the old method - otherwise, it probably has to wait for Lucene 8?
> * This can then be used to cap numHits for each TopScoreDocCollector to the leaf-slice size.
> If this is something that would make sense for you, I can try to provide a patch.
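A rough sketch of the proposed shape, using simplified stand-in types (the Collector, LeafSlice, and TopDocsCollector classes below are minimal mock-ups for illustration, not the real Lucene classes, and the totalDocCount field is an assumed accessor): a default method keeps existing CollectorManager implementations compatible, while an implementation that overrides it can cap numHits to the slice size.

```java
// Simplified stand-ins for the real Lucene types (illustration only).
interface Collector {}

class LeafSlice {
    final int totalDocCount; // assumed: total docs across the slice's leaves
    LeafSlice(int totalDocCount) { this.totalDocCount = totalDocCount; }
}

class TopDocsCollector implements Collector {
    final int numHits; // size of the priority queue this collector allocates
    TopDocsCollector(int numHits) { this.numHits = numHits; }
}

interface CollectorManager<C extends Collector> {
    C newCollector(); // existing method

    // Proposed addition: a default implementation delegates to the old
    // method, so existing managers remain source-compatible.
    default C newCollector(LeafSlice slice) {
        return newCollector();
    }
}

// A manager that caps numHits per slice, as the issue proposes: a slice
// can never produce more hits than it contains documents, so a small
// segment no longer allocates a 5-million-entry queue.
class CappingManager implements CollectorManager<TopDocsCollector> {
    final int numHits;
    CappingManager(int numHits) { this.numHits = numHits; }

    public TopDocsCollector newCollector() {
        return new TopDocsCollector(numHits);
    }

    public TopDocsCollector newCollector(LeafSlice slice) {
        return new TopDocsCollector(Math.min(numHits, slice.totalDocCount));
    }
}
```

With numHits = 5 million, a slice holding only a few thousand documents would get a collector sized to the slice, while large slices (and callers of the old no-argument method) behave exactly as before.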