[
https://issues.apache.org/jira/browse/LUCENE-8675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16759440#comment-16759440
]
Michael McCandless commented on LUCENE-8675:
--------------------------------------------
{quote}If some segments are getting large enough that intra-segment parallelism
becomes appealing, then maybe an easier and more efficient way to increase
parallelism is to instead reduce the maximum segment size so that inter-segment
parallelism has more potential for parallelizing query execution.
{quote}
Yeah that is a good workaround given how Lucene works today.
It's essentially the same as your original suggestion ("make more shards and
search them concurrently"), just at the segment instead of shard level.
But this still adds cost: each query pays a fixed cost for every segment it
visits. That cost should be lower than the per-shard fixed cost in the sharded
case, but it is not zero.
If instead Lucene had a way to divide large segments into multiple work units
(and I agree there are challenges with that! -- not just BKD and multi-term
queries, but e.g. how would early termination work?) then we could pay that
per-segment fixed cost once for such segments and then let multiple threads
share the variable-cost work of finding and ranking hits.
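To make "multiple work units" concrete, here is a rough sketch of cutting one
segment's doc ID space into mutually exclusive, contiguous ranges. The names
(DocIdRangePartitioner, DocRange, partition) are made up for illustration and
are not Lucene APIs; the sketch only does the range arithmetic and does not
address the BKD, multi-term or early termination questions above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: splits [0, maxDoc) into slices.
final class DocIdRangePartitioner {

    /** Half-open doc ID range [start, end). */
    static final class DocRange {
        final int start;
        final int end;

        DocRange(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /**
     * Splits [0, maxDoc) into at most numSlices contiguous ranges whose
     * sizes differ by at most one document. Ranges never overlap.
     */
    static List<DocRange> partition(int maxDoc, int numSlices) {
        if (maxDoc < 0 || numSlices <= 0) {
            throw new IllegalArgumentException("maxDoc must be >= 0 and numSlices > 0");
        }
        // Never create more slices than there are documents (but at least one).
        int slices = Math.min(numSlices, Math.max(maxDoc, 1));
        int base = maxDoc / slices;
        int remainder = maxDoc % slices;

        List<DocRange> ranges = new ArrayList<>(slices);
        int start = 0;
        for (int i = 0; i < slices; i++) {
            // The first `remainder` slices take one extra document each.
            int length = base + (i < remainder ? 1 : 0);
            ranges.add(new DocRange(start, start + length));
            start += length;
        }
        return ranges;
    }
}
```

Each range could then be handed to its own thread after the per-segment setup
has been done once; how the per-range results get merged is left open here.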
In our recently launched production index we see sizable jumps in the P99+
query latencies when large segment merges finish and replicate. We are using
"thread per segment" concurrency today, and we are hoping to improve on it by
pushing thread concurrency into individual large segments.
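A back-of-the-envelope model of why the largest segment dominates latency
under "thread per segment", and what slicing it might buy. Everything here is
an assumption for illustration: the class and method names are invented, cost
is in abstract units, there are enough threads for every work unit, the fixed
per-segment cost is paid once before the slices run, and slices scale
perfectly with no merge overhead.

```java
// Hypothetical cost model, not a measurement and not a Lucene API.
final class SegmentLatencyModel {

    /** One thread per segment: the query waits for the slowest segment. */
    static long threadPerSegmentLatency(long[] segmentDocs, long fixedCost, long costPerDoc) {
        long worst = 0;
        for (long docs : segmentDocs) {
            worst = Math.max(worst, fixedCost + costPerDoc * docs);
        }
        return worst;
    }

    /**
     * Segments larger than maxDocsPerSlice are divided into equal slices.
     * The fixed cost is paid once per segment, then slices run in parallel.
     */
    static long slicedLatency(long[] segmentDocs, long fixedCost, long costPerDoc,
                              long maxDocsPerSlice) {
        if (maxDocsPerSlice <= 0) {
            throw new IllegalArgumentException("maxDocsPerSlice must be > 0");
        }
        long worst = 0;
        for (long docs : segmentDocs) {
            // Ceiling division, with at least one slice for empty segments.
            long slices = Math.max(1, (docs + maxDocsPerSlice - 1) / maxDocsPerSlice);
            long docsPerSlice = (docs + slices - 1) / slices;
            worst = Math.max(worst, fixedCost + costPerDoc * docsPerSlice);
        }
        return worst;
    }
}
```

With one 1,000,000-doc segment and one 100,000-doc segment, a fixed cost of
1000 units and 1 unit per doc, the first method gives 1,001,000 units; capping
slices at 250,000 docs brings the second down to 251,000 under these
assumptions.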
> Divide Segment Search Amongst Multiple Threads
> ----------------------------------------------
>
> Key: LUCENE-8675
> URL: https://issues.apache.org/jira/browse/LUCENE-8675
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Atri Sharma
> Priority: Major
>
> Segment search is a single-threaded operation today, which can be a
> bottleneck for large analytical workloads that index a lot of data and run
> complex queries touching multiple segments (imagine a composite query with
> a range query and filters on top). This ticket is for discussing the idea of
> splitting the search of a single segment across multiple threads based on
> mutually exclusive document ID ranges.
> This will be a two phase effort, the first phase targeting queries returning
> all matching documents (collectors not terminating early). The second phase
> patch will introduce staged execution and will build on top of this patch.