[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16835491#comment-16835491
 ] 

Atri Sharma commented on LUCENE-8757:
-------------------------------------

[~simonw] The sort was added so that the slicing algorithm provides a 
consistency guarantee: two queries over the exact same distribution of 
segments should get the same number of slices, irrespective of the order in 
which the method traverses the segments. Consider a distribution of 8 
segments where 6 segments have 10,000 documents each and 2 segments have 
130,000 documents each. Suppose the segments are traversed in the following 
order (each value represents the maxDoc of the segment):

{10_000, 130_000, 10_000, 10_000, 10_000, 10_000, 10_000, 130_000}

The slicing algorithm will create a single slice containing all 8 segments, 
since the maxDocs limit is breached only when the last segment is added.

If the segments were sorted, the order would be:

{130_000, 130_000, 10_000, 10_000, 10_000, 10_000, 10_000, 10_000}

 

This would lead to two slices being created.
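
To make the order dependence concrete, here is a minimal sketch. It is not 
the code from the patch. It assumes a limit of 250,000 documents per slice 
(the limit is not stated above; any value from 190,000 up to, but not 
including, 260,000 gives the same outcome) and a greedy rule that closes a 
slice once its running document count exceeds the limit. The class and 
method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SliceOrderSketch {
    // Assumed threshold; the actual limit is not given in this discussion.
    static final int MAX_DOCS_PER_SLICE = 250_000;

    // Greedily groups segments (represented by their maxDoc) in the order given.
    // A slice is closed right after the segment that pushes it over the limit.
    static List<List<Integer>> slices(int[] maxDocs, int maxDocsPerSlice) {
        List<List<Integer>> result = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long docSum = 0;
        for (int maxDoc : maxDocs) {
            current.add(maxDoc);
            docSum += maxDoc;
            if (docSum > maxDocsPerSlice) {
                result.add(current);
                current = new ArrayList<>();
                docSum = 0;
            }
        }
        // Any leftover segments form the final slice.
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    // Returns a copy of the input sorted by maxDoc, largest first.
    static int[] sortedDescending(int[] maxDocs) {
        int[] copy = maxDocs.clone();
        Arrays.sort(copy);
        for (int i = 0, j = copy.length - 1; i < j; i++, j--) {
            int tmp = copy[i];
            copy[i] = copy[j];
            copy[j] = tmp;
        }
        return copy;
    }

    public static void main(String[] args) {
        int[] traversal = {10_000, 130_000, 10_000, 10_000, 10_000, 10_000, 10_000, 130_000};
        System.out.println("unsorted: " + slices(traversal, MAX_DOCS_PER_SLICE).size() + " slice(s)");
        System.out.println("sorted:   " + slices(sortedDescending(traversal), MAX_DOCS_PER_SLICE).size() + " slice(s)");
    }
}
```

Under these assumptions the unsorted traversal yields 1 slice and the sorted 
one yields 2, which is the inconsistency the sort is meant to remove.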

Thoughts?



bq. also want to suggest to beef up testing a bit

Thanks, I have added the test. I will post another iteration of the patch 
once the discussion above concludes.

 

> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>
>                 Key: LUCENE-8757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8757
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Priority: Major
>         Attachments: LUCENE-8757.patch, LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segments to threads allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance in case of skew in 
> segment sizes since small segments also get their dedicated thread. This can 
> lead to performance degradation due to context switching overheads.
>  
> A better algorithm that is cognizant of size skew would perform better in 
> realistic scenarios.



