[ https://issues.apache.org/jira/browse/LUCENE-8542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16791424#comment-16791424 ]
Christoph Kaser commented on LUCENE-8542:
-----------------------------------------
{quote}Right I get how it can help with small slices, but at the same time I'm
seeing small slices as something that should be avoided in order to limit
context switching so I don't think we should design for small slices?
{quote}
Small slices are the default: the default implementation of
IndexSearcher.slices() returns one slice per segment. Because the search runs
in an Executor, this does not necessarily cause much context switching; that
depends on the thread pool parameters. But you are right that the default
implementation of slices() may not be optimal.
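
As a rough illustration of what a less fine-grained slicing strategy could look like, the sketch below packs consecutive segments into one slice until a document budget is reached. It is only a sketch under stated assumptions: the class SliceGroupingSketch, the method group, and the maxDocsPerSlice threshold are hypothetical names, and segments are modelled as plain document counts rather than Lucene objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is NOT Lucene code.
final class SliceGroupingSketch {

    /**
     * Packs per-segment document counts into slices in input order.
     * A slice is closed as soon as adding the next segment would push it
     * past maxDocsPerSlice (an assumed, caller-chosen budget). A segment
     * larger than the budget ends up in a slice of its own.
     */
    static List<List<Integer>> group(int[] segmentDocCounts, int maxDocsPerSlice) {
        List<List<Integer>> slices = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long docsInCurrent = 0;
        for (int docs : segmentDocCounts) {
            if (!current.isEmpty() && docsInCurrent + docs > maxDocsPerSlice) {
                slices.add(current);          // close the slice that is full
                current = new ArrayList<>();
                docsInCurrent = 0;
            }
            current.add(docs);
            docsInCurrent += docs;
        }
        if (!current.isEmpty()) {
            slices.add(current);              // flush the last open slice
        }
        return slices;
    }
}
```

With this kind of grouping, many small segments would share one slice (and one collector) instead of each getting their own.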
> Provide the LeafSlice to CollectorManager.newCollector to save memory on
> small index slices
> -------------------------------------------------------------------------------------------
>
> Key: LUCENE-8542
> URL: https://issues.apache.org/jira/browse/LUCENE-8542
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Christoph Kaser
> Priority: Minor
> Attachments: LUCENE-8542.patch
>
>
> I have an index consisting of 44 million documents spread across 60 segments.
> When I run a query against this index with a huge number of results requested
> (e.g. 5 million), this query uses more than 5 GB of heap if the IndexSearcher
> was configured to use an ExecutorService.
> (I know this kind of query is fairly unusual and it would be better to use
> paging and searchAfter, but our architecture does not allow this at the
> moment.)
> The reason for the huge memory requirement is that the search [will create a
> TopScoreDocCollector for each
> segment|https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L404],
> each one with numHits = 5 million. This is fine for the large segments, but
> many of those segments are fairly small and only contain several thousand
> documents. This wastes a huge amount of memory for queries with large values
> of numHits on indices with many segments.
> Therefore, I propose to change the CollectorManager interface in the
> following way:
> * change the method newCollector to accept a parameter LeafSlice that can be
> used to determine the total count of documents in the LeafSlice
> * Maybe, in order to remain backwards compatible, it would be possible to
> introduce this as a new method with a default implementation that calls the
> old method - otherwise, it probably has to wait for Lucene 8?
> * This can then be used to cap numHits for each TopScoreDocCollector to the
> number of documents in its LeafSlice.
> If this is something that would make sense for you, I can try to provide a
> patch.
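
The proposal above could be sketched roughly as follows. This is a minimal illustration, not the attached patch and not Lucene's API: Slice, TopHitsCollector, SliceAwareManager and CappedManager are hypothetical stand-ins, and a collector is reduced to the queue size it would allocate. The default method shows how the new overload might stay backwards compatible by delegating to the old one.

```java
// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
final class SliceCapSketch {

    /** Stand-in for a LeafSlice: only its total document count matters here. */
    static final class Slice {
        final int docCount;
        Slice(int docCount) { this.docCount = docCount; }
    }

    /** Stand-in for a top-hits collector; records the queue size it would allocate. */
    static final class TopHitsCollector {
        final int queueSize;
        TopHitsCollector(int queueSize) { this.queueSize = queueSize; }
    }

    /** Hypothetical manager interface mirroring the proposed change. */
    interface SliceAwareManager {
        /** The existing, slice-unaware factory method. */
        TopHitsCollector newCollector();

        /** New overload; by default it falls back to the old method. */
        default TopHitsCollector newCollector(Slice slice) {
            return newCollector();
        }
    }

    /** Manager that caps numHits to the number of documents in the slice. */
    static final class CappedManager implements SliceAwareManager {
        private final int numHits;
        CappedManager(int numHits) { this.numHits = numHits; }

        @Override
        public TopHitsCollector newCollector() {
            return new TopHitsCollector(numHits);
        }

        @Override
        public TopHitsCollector newCollector(Slice slice) {
            // Math.max(1, ...) keeps the queue size positive for an empty slice in this sketch.
            return new TopHitsCollector(Math.min(numHits, Math.max(1, slice.docCount)));
        }
    }

    /** Sums the queue sizes allocated when one collector is created per slice. */
    static long totalQueueEntries(SliceAwareManager manager, int[] sliceDocCounts) {
        long total = 0;
        for (int docCount : sliceDocCounts) {
            total += manager.newCollector(new Slice(docCount)).queueSize;
        }
        return total;
    }
}
```

A manager that only implements the old method keeps allocating numHits entries per slice, while CappedManager allocates at most as many entries as the slice has documents.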