[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16834587#comment-16834587
 ] 

Atri Sharma commented on LUCENE-8757:
-------------------------------------

[~simonw] The reasoning behind adding the second parameter was to ensure that 
we do not bias against the case where there are a large number of small 
segments. For example, if there are 100 segments and all of them are small, we 
should still allow parallel searches to get some performance gains, although 
this should be a rare case since merging will coalesce them.

 

However, I agree with you that this might contradict the whole idea of 
adding the 250K docs split point. If all segments in an index together do not 
add up to 250K docs, then the index is small enough to not need parallelism.
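
To make the split point concrete, here is a rough sketch of the grouping idea 
with only the doc-count threshold and no second parameter. It is not the 
attached patch: the class name SliceSketch, the method slices, and the use of 
plain doc counts in place of real segment objects are all assumptions made for 
illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; names and structure are not taken from the patch.
class SliceSketch {
    // The 250K docs split point discussed in this thread.
    static final int MAX_DOCS_PER_SLICE = 250_000;

    // Groups segments (represented only by their doc counts) into slices.
    // Each slice would be searched by one thread.
    static List<List<Integer>> slices(List<Integer> segmentDocCounts, int maxDocsPerSlice) {
        // Visit the largest segments first so big segments are handled up front.
        List<Integer> sorted = new ArrayList<>(segmentDocCounts);
        sorted.sort(Comparator.reverseOrder());

        List<List<Integer>> result = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        long docsInCurrent = 0;

        for (int docs : sorted) {
            if (docs > maxDocsPerSlice) {
                // A segment above the split point gets a slice of its own.
                result.add(List.of(docs));
                continue;
            }
            // Smaller segments share a slice until it crosses the split point.
            current.add(docs);
            docsInCurrent += docs;
            if (docsInCurrent > maxDocsPerSlice) {
                result.add(current);
                current = new ArrayList<>();
                docsInCurrent = 0;
            }
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }
}
```

Under this sketch, 100 segments of 1,000 docs each total 100K docs, stay under 
the split point, and land in a single slice, which matches the point above 
that such an index is small enough to not need parallelism.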

 

Attached is an updated patch.

 

[^LUCENE-8757.patch]

> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>
>                 Key: LUCENE-8757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8757
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Priority: Major
>         Attachments: LUCENE-8757.patch, LUCENE-8757.patch
>
>
> The current segment-to-thread allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance when segment sizes 
> are skewed, since small segments also get their own dedicated thread. This 
> can lead to performance degradation due to context switching overhead.
>  
> A better algorithm that is cognizant of size skew would have better 
> performance for realistic scenarios.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
