[ 
https://issues.apache.org/jira/browse/LUCENE-8757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16830173#comment-16830173
 ] 

Michael McCandless commented on LUCENE-8757:
--------------------------------------------

Thanks [~atris] – I agree it's important to have better defaults for how we 
coalesce segments into per-query-per-thread work units.  A few small comments:
 * Can you insert {{_}} in the big number constants (e.g. {{25000000}})?  Makes 
it easier to read, and open-source code is written for reading :)
 * I think something is wrong with {{docSum}} – you only set it, and never add 
to it?  I think the intention is to sum up docs in multiple adjacent (sorted by 
{{maxDoc}}) segments until that count exceeds {{25000000}}?
 * How did you pick {{25000000}} and {{100}} as good constants?  We are using 
much smaller values in our production infrastructure – {{250_000}} and {{5}}, 
admittedly after only a little experimentation. 
 * Can you add some tests?  You could make the slice method a package-private 
static method and then create test cases with "interesting" 
{{LeafReaderContext}} combinations?  In particular, a test case exposing the 
{{docSum}} bug would be great, then fix that bug, then see the test case pass.
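
A minimal sketch of the grouping described in the bullets above, not the code from the attached patch. It assumes the first constant is a per-slice document budget and the second a cap on segments per slice (the comment does not say what the second constant controls). The class name {{SliceSketch}}, the method {{slices}}, and its parameters are hypothetical, and plain {{int}} values stand in for each segment's {{maxDoc}} so the example runs without Lucene on the classpath. The point of interest is that {{docSum}} is accumulated with {{+=}} and reset only when a slice is closed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; the real patch works on LeafReaderContext.
final class SliceSketch {

  // Groups segment sizes (maxDoc values) into slices. A slice is closed once
  // its running document count exceeds maxDocsPerSlice or it holds
  // maxSegmentsPerSlice segments. Both limits are assumptions of this sketch.
  static List<List<Integer>> slices(int[] maxDocs, int maxDocsPerSlice, int maxSegmentsPerSlice) {
    int[] sorted = maxDocs.clone();
    Arrays.sort(sorted); // ascending; iterated backwards below, largest first

    List<List<Integer>> result = new ArrayList<>();
    List<Integer> current = new ArrayList<>();
    long docSum = 0;

    for (int i = sorted.length - 1; i >= 0; i--) {
      current.add(sorted[i]);
      docSum += sorted[i]; // accumulate, rather than overwrite, the running total
      if (docSum > maxDocsPerSlice || current.size() >= maxSegmentsPerSlice) {
        result.add(current);
        current = new ArrayList<>();
        docSum = 0; // start counting afresh for the next slice
      }
    }
    if (!current.isEmpty()) {
      result.add(current); // leftover small segments share one final slice
    }
    return result;
  }
}
```

With limits of {{250_000}} and {{5}}, three segments of {{200_000}} docs each end up in two slices under this sketch: the first two segments together cross the budget and close a slice, and the third is left in a slice of its own. A variant that only assigns to {{docSum}} would never see the sum cross the budget, which is the kind of case a test for the bug could pin down.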

> Better Segment To Thread Mapping Algorithm
> ------------------------------------------
>
>                 Key: LUCENE-8757
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8757
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Atri Sharma
>            Priority: Major
>         Attachments: LUCENE-8757.patch
>
>
> The current segment-to-thread allocation algorithm always allocates one 
> thread per segment. This is detrimental to performance when segment sizes 
> are skewed, since small segments also get their own dedicated thread. This can 
> lead to performance degradation due to context-switching overhead.
>  
> A better algorithm that is cognizant of size skew would perform better 
> in realistic scenarios.



