[ https://issues.apache.org/jira/browse/LUCENE-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13607711#comment-13607711 ]

Adrien Grand commented on LUCENE-4858:
--------------------------------------

{quote} What in the patch guarantees that any segment with more than 
maxBufferedDocs documents is sorted? Perhaps I've missed it, but I looked for 
code which ensures every such segment gets picked up by SortingMP and didn't 
find it.

I don't think that in general we should make assumptions based on a 
maxBufferedDocs setting, because by default IWC flushes based on RAM 
consumption, and it also seems slightly unrelated. I.e. if a segment is sorted 
but has deletions such that numDocs < maxBufferedDocs, we do a full collection, 
whereas we could early terminate as usual?{quote}

Indeed, I think that finding out which segments are sorted is the main issue. My 
idea was to require users who want early query termination to set 
maxBufferedDocs to a given limit (low values improve early query termination, 
while high values improve indexing speed). Then any segment that has more than 
maxBufferedDocs documents (deleted or not) is known to be sorted, because it 
results from a merge. These large segments are also the ones that matter for 
early query termination, since they take time to collect. Of course, this could 
miss some small segments that are sorted, but since they are small, they 
should matter less for early query termination.
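The heuristic above boils down to a size check on each segment. The following is a minimal, self-contained Java sketch, not code from the patch: `Segment` and `isAssumedSorted` are hypothetical names. It assumes that flushing is triggered by document count (so a flushed segment never holds more than maxBufferedDocs documents) and that every merge goes through the sorting merge policy. It compares maxDoc, which still counts deleted documents, rather than numDocs, which covers the "numDocs < maxBufferedDocs" case raised in the quote.

```java
import java.util.List;

public class SortedSegmentHeuristic {

    /** Hypothetical stand-in for per-segment statistics (not a Lucene class). */
    record Segment(String name, int maxDoc, int numDocs) {}

    /**
     * Assumes a flushed segment never holds more than maxBufferedDocs documents
     * (flush by doc count), so a bigger segment must come from a merge, and
     * assumes merges go through the sorting merge policy. maxDoc includes
     * deleted documents, so deletions cannot make a merged segment look flushed.
     */
    static boolean isAssumedSorted(Segment segment, int maxBufferedDocs) {
        return segment.maxDoc() > maxBufferedDocs;
    }

    public static void main(String[] args) {
        int maxBufferedDocs = 1_000;
        List<Segment> segments = List.of(
                new Segment("_0", 50_000, 48_000), // large merged segment
                new Segment("_1", 1_200, 900),     // merged, then many deletions
                new Segment("_2", 800, 800));      // freshly flushed segment
        // Expected: _0 sorted=true, _1 sorted=true, _2 sorted=false
        for (Segment s : segments) {
            System.out.println(s.name() + " sorted=" + isAssumedSorted(s, maxBufferedDocs));
        }
    }
}
```

Segment `_1` shows the deletion case: only 900 live documents, but maxDoc is 1,200, so it is still treated as sorted. A small segment like `_2` is collected in full even if it happens to be sorted, which is the trade-off accepted above.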

What options do we have here? I think you mentioned tagging sorted segments; 
do you have an idea where/how we could do that?

bq. And hopefully we can stuff the early termination logic down to 
IndexSearcher eventually. There are other scenarios for early termination, such 
as time limit, and therefore I think it's ok if we have an 
EarlyTerminationException which IndexSearcher responds to.

Indeed, I think this makes sense.
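The exception-based approach can be illustrated with a toy model: the collector throws to abandon the current segment, and the searcher's per-segment loop catches the exception and moves on to the next segment. This is a hedged sketch in plain Java, not Lucene's `Collector` or `IndexSearcher` API. `EarlyTerminationException` is the name proposed in the quoted comment, while `SortedTopKCollector`, `Segment` and `search` are hypothetical. Documents are represented only by their sort keys (ascending order), and whether each segment is sorted is assumed to be known up front.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class EarlyTerminationSketch {

    /** Hypothetical signal thrown by a collector to abandon the current segment. */
    static class EarlyTerminationException extends RuntimeException {
        EarlyTerminationException() {
            // No stack trace: the exception is used purely for control flow.
            super("segment terminated early", null, false, false);
        }
    }

    /** Toy segment: sort keys in doc ID order, plus whether they are sorted. */
    record Segment(int[] sortKeys, boolean sorted) {}

    /** Keeps the numHits smallest sort keys across all segments. */
    static class SortedTopKCollector {
        final int numHits;
        final List<Integer> keys = new ArrayList<>();
        boolean segmentSorted;
        int collectedInSegment;

        SortedTopKCollector(int numHits) {
            this.numHits = numHits;
        }

        void setNextSegment(boolean sorted) {
            segmentSorted = sorted;
            collectedInSegment = 0;
        }

        void collect(int sortKey) {
            if (segmentSorted && collectedInSegment == numHits) {
                // Every remaining doc in this sorted segment sorts after the
                // numHits docs already kept, so none can enter the top hits.
                throw new EarlyTerminationException();
            }
            keys.add(sortKey);
            collectedInSegment++;
        }

        List<Integer> topHits() {
            List<Integer> all = new ArrayList<>(keys);
            Collections.sort(all);
            return all.subList(0, Math.min(numHits, all.size()));
        }
    }

    /** Per-segment loop; returns how many docs were actually collected. */
    static int search(List<Segment> segments, SortedTopKCollector collector) {
        int collected = 0;
        for (Segment segment : segments) {
            collector.setNextSegment(segment.sorted());
            try {
                for (int key : segment.sortKeys()) {
                    collector.collect(key);
                    collected++;
                }
            } catch (EarlyTerminationException e) {
                // Only this segment is abandoned; continue with the next one.
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
                new Segment(new int[] {1, 4, 6, 8, 9, 12}, true), // sorted
                new Segment(new int[] {7, 2, 11, 3}, false));     // unsorted
        SortedTopKCollector collector = new SortedTopKCollector(3);
        int collected = search(segments, collector);
        System.out.println("collected=" + collected + " top=" + collector.topHits());
    }
}
```

With numHits = 3, the sorted segment stops after three documents, so only 7 of the 10 documents are collected and the top hits are still [1, 2, 3]. Keeping the termination signal out of the collect() return value is what would let the same searcher-side handling serve other early termination scenarios, such as a time limit.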
                
> Ability to terminate queries on a per-segment basis
> ---------------------------------------------------
>
>                 Key: LUCENE-4858
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4858
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.3
>
>
> Spin-off of LUCENE-4752, see 
> https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13606565&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13606565
>  and 
> https://issues.apache.org/jira/browse/LUCENE-4752?focusedCommentId=13607282&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13607282
> When an index is sorted per-segment, queries that sort according to the index 
> sort order could be early terminated.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira