[ 
https://issues.apache.org/jira/browse/LUCENE-4791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13584319#comment-13584319
 ] 

Yonik Seeley commented on LUCENE-4791:
--------------------------------------

I did some ad-hoc testing to verify:

randomly distributed dense terms that almost never match: 3.7% perf increase
randomly distributed dense terms that almost always match: 0% (no change)
random(10) on one field matching random(10) on another: 4.1% perf increase
terms grouped in blocks of 10 (i.e. 10 sequential docs have the same value): 67% perf increase

As you can see, the patch helps most when terms are not randomly distributed.
The larger the blocks of terms, the larger the performance increase after
applying the patch.  I was able to get it up to 10x, and in theory the gain
is unbounded.
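
To illustrate why block size matters, here is a toy model of a two-term
conjunction. It is only a sketch, not the Lucene code and not the attached
patch: the class SkipVsScan, its helpers, and the galloping search used to
stand in for skip lists are all assumptions made for the example. It counts
how many postings of the first list are touched when that list is scanned
linearly versus skipped to the target doc.

```java
import java.util.stream.IntStream;

/** Toy model of a two-term conjunction (hypothetical, not Lucene code). */
class SkipVsScan {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Sorted doc IDs for one term, plus a count of postings read. */
    static final class Postings {
        final int[] docs;
        int pos = -1;
        long touched = 0;

        Postings(int[] docs) { this.docs = docs; }

        int read(int i) { touched++; return docs[i]; }

        int doc() { return pos < docs.length ? docs[pos] : NO_MORE_DOCS; }

        int nextDoc() {
            pos++;
            if (pos < docs.length) touched++;
            return doc();
        }

        /** Linear scan: step one posting at a time until doc >= target. */
        int scanTo(int target) {
            int d = doc();
            while (d < target) d = nextDoc();
            return d;
        }

        /** Skip: galloping search, then binary search (stand-in for skip lists). */
        int advance(int target) {
            int lo = pos + 1, hi = lo, step = 1;
            while (hi < docs.length && read(hi) < target) {
                lo = hi + 1;   // everything up to hi is below target
                hi += step;
                step *= 2;
            }
            hi = Math.min(hi, docs.length);
            while (lo < hi) {  // lower bound within [lo, hi)
                int mid = (lo + hi) >>> 1;
                if (read(mid) < target) lo = mid + 1; else hi = mid;
            }
            pos = lo;
            return doc();
        }
    }

    /** Returns { number of matches, postings touched on the first list }. */
    static long[] intersect(int[] first, int[] second, boolean skipOnFirst) {
        Postings a = new Postings(first), b = new Postings(second);
        long matches = 0;
        int da = a.nextDoc(), db = b.nextDoc();
        while (da != NO_MORE_DOCS && db != NO_MORE_DOCS) {
            if (da == db) {
                matches++;
                da = a.nextDoc();
                db = b.nextDoc();
            } else if (da < db) {
                da = skipOnFirst ? a.advance(db) : a.scanTo(db);
            } else {
                db = b.advance(da);
            }
        }
        return new long[] { matches, a.touched };
    }

    /** Docs in alternating blocks: [0, block), [2*block, 3*block), ... */
    static int[] blocks(int numDocs, int block) {
        return IntStream.range(0, numDocs).filter(d -> (d / block) % 2 == 0).toArray();
    }

    /** Every step-th doc: 0, step, 2*step, ... */
    static int[] everyNth(int numDocs, int step) {
        return IntStream.range(0, numDocs).filter(d -> d % step == 0).toArray();
    }

    public static void main(String[] args) {
        int numDocs = 100_000;
        for (int block : new int[] { 10, 100, 1000 }) {
            int[] first = blocks(numDocs, block);
            int[] second = everyNth(numDocs, 2 * block);
            long[] scan = intersect(first, second, false);
            long[] skip = intersect(first, second, true);
            System.out.println("block=" + block + " matches=" + scan[0]
                + " touched(scan)=" + scan[1] + " touched(skip)=" + skip[1]);
        }
    }
}
```

In this model the scanning variant touches roughly one posting per doc in
each block, while the skipping variant touches a number that grows only
logarithmically with the block size, so the gap widens as blocks get larger.
The counts are a property of the toy model only and say nothing about the
timings measured above.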

                
> ConjunctionTermScorer scans instead of skips on first scorer
> ------------------------------------------------------------
>
>                 Key: LUCENE-4791
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4791
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>         Attachments: LUCENE-4791.patch
>
>
> As discovered by John Wang, it looks like a bug has been present since 
> ConjunctionTermScorer was first introduced in 7/2011 that causes scanning 
> instead of skipping on the lowest frequency term. 
> http://markmail.org/message/wuukqzbhe7zgkfmf
