[ 
https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738627#comment-13738627
 ] 

Tom Burton-West commented on LUCENE-5175:
-----------------------------------------

Thanks Robert,

In the article, they claim that the change doesn't have a performance impact.
However, I'm not familiar enough with Java performance to eyeball it, and it
looks to me like we added one or more floating-point operations, so it would
be good to benchmark. The scoring algorithm runs against every hit, and a
poorly chosen query might produce millions of hits. (If we switch to
page-level indexing, we could have hundreds of millions of hits.)
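
To make the "one or more floating point operations" concrete, here is a minimal
sketch of the term-frequency part of the score, with and without the lower
bound. It assumes the fix is the one the article calls BM25L, where a constant
delta is added to the length-normalized term frequency before saturation. The
class name, method names, and parameter values are hypothetical; this is not
the code in LUCENE-5175.patch and it does not use Lucene's Similarity API.

```java
// Standalone illustration only; all names here are hypothetical.
final class Bm25TfSketch {

    // Length-normalized term frequency: freq / (1 - b + b * docLen / avgDocLen).
    static double normalizedTf(double freq, double docLen, double avgDocLen, double b) {
        return freq / (1.0 - b + b * docLen / avgDocLen);
    }

    // TF component of plain BM25.
    static double bm25Tf(double freq, double docLen, double avgDocLen,
                         double k1, double b) {
        double c = normalizedTf(freq, docLen, avgDocLen, b);
        return (k1 + 1.0) * c / (k1 + c);
    }

    // TF component with the lower bound (assumed BM25L form): the only extra
    // arithmetic per hit is the addition of delta, plus a check for freq > 0.
    static double bm25LTf(double freq, double docLen, double avgDocLen,
                          double k1, double b, double delta) {
        if (freq <= 0.0) {
            return 0.0; // the shift applies only to documents containing the term
        }
        double c = normalizedTf(freq, docLen, avgDocLen, b) + delta;
        return (k1 + 1.0) * c / (k1 + c);
    }

    public static void main(String[] args) {
        // A long document (10x the average length) with a single occurrence.
        double k1 = 1.2, b = 0.75, delta = 0.5;
        System.out.println("BM25  tf part: " + bm25Tf(1, 10_000, 1_000, k1, b));
        System.out.println("BM25L tf part: " + bm25LTf(1, 10_000, 1_000, k1, b, delta));
    }
}
```

In this form the per-hit difference is one addition and one comparison, so any
measurable slowdown would have to come from somewhere else, such as how the
normalization values are cached. A benchmark is still the only way to be sure.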

I was actually considering making it a subclass instead of just modifying
BM25Similarity, so that it would be easy to benchmark and, if there turns out
to be a significant perf difference, users could choose which implementation
to use. I saw that computeWeight in BM25Similarity is final, and decided I
didn't know enough about why it is final to either refactor to create a base
class or change the method in order to subclass.
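
The subclass idea could look roughly like the sketch below: a base class keeps
the saturation formula in a final method and exposes one small hook that a
subclass overrides. Everything here is hypothetical and standalone. The class
and method names are invented for illustration, and nothing is implied about
how computeWeight or the rest of BM25Similarity would actually need to change.

```java
// Hypothetical design sketch; these are not Lucene classes.
class TfSaturation {
    final double k1;

    TfSaturation(double k1) {
        this.k1 = k1;
    }

    // Hook: subclasses may adjust the length-normalized term frequency.
    double adjust(double normalizedTf) {
        return normalizedTf;
    }

    // Shared saturation formula; final so that variants differ only in the hook.
    final double score(double normalizedTf) {
        double c = adjust(normalizedTf);
        return (k1 + 1.0) * c / (k1 + c);
    }
}

// Variant that lower-bounds the normalized term frequency by adding delta.
class LowerBoundedTfSaturation extends TfSaturation {
    final double delta;

    LowerBoundedTfSaturation(double k1, double delta) {
        super(k1);
        this.delta = delta;
    }

    @Override
    double adjust(double normalizedTf) {
        // Only shift documents that actually contain the term.
        return normalizedTf > 0.0 ? normalizedTf + delta : 0.0;
    }
}
```

With this shape, a benchmark can run the same queries against either class, and
users who see a slowdown can keep the plain variant.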

Is luceneutil the same as lucene benchmark? I've been wanting to learn how to
use lucene benchmark for some time.

Tom

                
> Add parameter to lower-bound TF normalization for BM25 (for long documents)
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-5175
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5175
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Tom Burton-West
>            Priority: Minor
>         Attachments: LUCENE-5175.patch
>
>
> In the article "When Documents Are Very Long, BM25 Fails!" a fix for the 
> problem is documented.  There was a TODO note in BM25Similarity to add this 
> fix. I will attach a patch that implements the fix shortly.
