[
https://issues.apache.org/jira/browse/LUCENE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16757614#comment-16757614
]
Atri Sharma commented on LUCENE-8675:
-------------------------------------
Thanks for the comments.
A multi-shard approach makes sense, but a search is still bottlenecked by the
largest segment it needs to scan. If there are many such large segments, this
becomes a problem.
While I agree that range queries might not directly benefit from parallel
scans, other queries (such as TermQueries) might benefit from a
segment-parallel scan. In typical Elasticsearch interactive queries, we see
latency spikes when a large segment is hit. Such cases can be optimized with
parallel scans, as in the sketch below.
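To make the idea concrete, here is a minimal sketch (assuming Lucene 8.x APIs;
the class name, method name, and slice count are illustrative, not from any
patch) that scores disjoint doc ID ranges of a single segment on separate
threads using BulkScorer's [min, max) window:
{code:java}
// Sketch only: parallel scan of one segment by disjoint doc ID ranges.
// Assumes Lucene 8.x; ParallelSegmentScan, SLICES, countHits are
// illustrative names, not part of any existing API or patch.
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.BulkScorer;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.SimpleCollector;
import org.apache.lucene.search.Weight;

public class ParallelSegmentScan {
  static final int SLICES = 4; // illustrative slice count

  /** Counts hits in one leaf by scoring disjoint [min, max) windows in parallel. */
  public static long countHits(IndexSearcher searcher, Query query,
                               LeafReaderContext leaf) throws Exception {
    Weight weight = searcher.createWeight(
        searcher.rewrite(query), ScoreMode.COMPLETE_NO_SCORES, 1f);
    int maxDoc = leaf.reader().maxDoc();
    int step = Math.max(1, (maxDoc + SLICES - 1) / SLICES);
    LongAdder hits = new LongAdder();
    ExecutorService pool = Executors.newFixedThreadPool(SLICES);
    List<Future<?>> futures = new ArrayList<>();
    for (int min = 0; min < maxDoc; min += step) {
      final int lo = min;
      final int hi = Math.min(maxDoc, min + step);
      futures.add(pool.submit(() -> {
        // Each slice builds its own BulkScorer: scorers are not thread-safe.
        BulkScorer scorer = weight.bulkScorer(leaf);
        if (scorer != null) {
          scorer.score(new SimpleCollector() {
            @Override public void collect(int doc) { hits.increment(); }
            @Override public ScoreMode scoreMode() { return ScoreMode.COMPLETE_NO_SCORES; }
          }, leaf.reader().getLiveDocs(), lo, hi);
        }
        return null;
      }));
    }
    for (Future<?> f : futures) f.get(); // propagate any slice failure
    pool.shutdown();
    return hits.sum();
  }
}
{code}
Each thread builds its own BulkScorer because scorers are not thread-safe; the
cost is that every slice re-advances its iterator to the start of its range,
which is part of why a cost-based decision (below) matters.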
We should have a way of deciding whether a scan should be parallelized, and
then let the execution operator get a set of nodes to execute; a rough sketch
of such a heuristic follows. That is probably outside the scope of this JIRA,
but I wanted to open this thread to get the conversation going.
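For the decision itself, an illustrative size-based heuristic (an assumption
on my part; the threshold value and names are made up, not an existing API)
might look like:
{code:java}
// Illustrative heuristic only, not from this issue's patch: parallelize a
// leaf scan only when the segment is large enough to amortize the cost of
// handing work to other threads. Threshold and names are assumptions.
import org.apache.lucene.index.LeafReaderContext;

public class ScanPlanner {
  static final int PARALLEL_THRESHOLD = 1 << 20; // ~1M docs; illustrative value

  /** Number of slices to scan this leaf with; 1 means a sequential scan. */
  static int slicesFor(LeafReaderContext leaf, int maxSlices) {
    int maxDoc = leaf.reader().maxDoc();
    if (maxDoc < PARALLEL_THRESHOLD) {
      return 1; // small segment: thread handoff costs more than it saves
    }
    // one slice per threshold-sized chunk of doc IDs, capped by the pool size
    return Math.min(maxSlices, maxDoc / PARALLEL_THRESHOLD + 1);
  }
}
{code}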
> Divide Segment Search Amongst Multiple Threads
> ----------------------------------------------
>
> Key: LUCENE-8675
> URL: https://issues.apache.org/jira/browse/LUCENE-8675
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Atri Sharma
> Priority: Major
>
> Segment search is a single-threaded operation today, which can be a
> bottleneck for large analytical queries that index a lot of data and run
> complex queries touching multiple segments (imagine a composite query with a
> range query and filters on top). This ticket is for discussing the idea of
> splitting the search of a single segment across multiple threads based on
> mutually exclusive document ID ranges.
> This will be a two-phase effort: the first phase targets queries that return
> all matching documents (collectors that do not terminate early). The second
> phase will introduce staged execution and will build on top of this patch.