[
https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738627#comment-13738627
]
Tom Burton-West commented on LUCENE-5175:
-----------------------------------------
Thanks Robert,
In the article, they claim that the change doesn't have a performance impact.
On the other hand, I'm not familiar enough with Java performance to be able to
eyeball it, and it looks to me like we added one or more floating point
operations, so it would be good to benchmark. The scoring algorithm gets run
against every hit, and we might have millions of hits for a poorly chosen
query. (If we switch to page-level indexing, we could have hundreds of
millions of hits.)
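
To make the per-hit cost concrete, here is a minimal sketch of the two term-frequency computations, assuming the article's lower-bounded variant adds a constant delta to the length-normalized term frequency. It is not the code from the attached patch and does not use Lucene's classes. The class name, method names, and the parameter values (k1 = 1.2, b = 0.75, delta = 0.5) are illustrative assumptions.

```java
// Illustrative sketch only: plain Java, not Lucene's BM25Similarity and not the patch.
public class TfNormSketch {

    /** Classic BM25 term-frequency component (IDF factor omitted). */
    public static double bm25Tf(double tf, double docLen, double avgDocLen,
                                double k1, double b) {
        double norm = 1.0 - b + b * (docLen / avgDocLen);
        return ((k1 + 1.0) * tf) / (k1 * norm + tf);
    }

    /**
     * Lower-bounded variant (assumed formulation): delta is added to the
     * length-normalized tf, so a matching term never scores near zero
     * just because the document is very long.
     */
    public static double lowerBoundedTf(double tf, double docLen, double avgDocLen,
                                        double k1, double b, double delta) {
        if (tf <= 0.0) {
            return 0.0; // the shift applies only to terms that actually occur
        }
        double norm = 1.0 - b + b * (docLen / avgDocLen);
        double shifted = tf / norm + delta; // the extra addition per hit
        return ((k1 + 1.0) * shifted) / (k1 + shifted);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75, delta = 0.5; // assumed values
        double avgDocLen = 1000.0, longDoc = 100000.0;
        System.out.println("BM25 tf part:          "
                + bm25Tf(1, longDoc, avgDocLen, k1, b));
        System.out.println("lower-bounded tf part: "
                + lowerBoundedTf(1, longDoc, avgDocLen, k1, b, delta));
    }
}
```

In this formulation the variant costs at least one extra addition and a branch per hit. Whether that is measurable across millions of hits is exactly what a benchmark would have to show.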
I was actually considering making it a subclass instead of just modifying
BM25Similarity, so that it would be easy to benchmark and, if it turns out that
there is a significant perf difference, users could choose which
implementation to use. However, I saw that computeWeight in BM25Similarity is
final, and I didn't know enough about why it is final to either refactor to
create a base class or change the method to allow subclassing.
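
One way the subclass idea could look, as a sketch under assumptions: the scoring entry point stays final and only a small tf hook is overridable, so both variants can be benchmarked side by side and users can pick one. All class and method names here are hypothetical stand-ins. This is not Lucene's BM25Similarity API and not a proposal for how computeWeight should change.

```java
// Hypothetical stand-in classes; NOT Lucene's Similarity API.
public class SimilarityChoiceSketch {

    /** Base variant: classic BM25 tf component behind an overridable hook. */
    public static class Bm25Base {
        protected final double k1;
        protected final double b;

        public Bm25Base(double k1, double b) {
            this.k1 = k1;
            this.b = b;
        }

        /** Final entry point: subclasses cannot change the overall flow. */
        public final double score(double idf, double tf, double docLen, double avgDocLen) {
            double norm = 1.0 - b + b * (docLen / avgDocLen);
            return idf * tfComponent(tf, norm);
        }

        /** The only piece a subclass may replace. */
        protected double tfComponent(double tf, double norm) {
            return ((k1 + 1.0) * tf) / (k1 * norm + tf);
        }
    }

    /** Variant with a lower bound on tf normalization (assumed formulation). */
    public static class Bm25LowerBounded extends Bm25Base {
        private final double delta;

        public Bm25LowerBounded(double k1, double b, double delta) {
            super(k1, b);
            this.delta = delta;
        }

        @Override
        protected double tfComponent(double tf, double norm) {
            if (tf <= 0.0) {
                return 0.0;
            }
            double shifted = tf / norm + delta;
            return ((k1 + 1.0) * shifted) / (k1 + shifted);
        }
    }

    public static void main(String[] args) {
        Bm25Base plain = new Bm25Base(1.2, 0.75);
        Bm25Base bounded = new Bm25LowerBounded(1.2, 0.75, 0.5); // delta is an assumed value
        System.out.println(plain.score(2.0, 1, 100000, 1000));
        System.out.println(bounded.score(2.0, 1, 100000, 1000));
    }
}
```

The trade-off is that the hook adds a virtual call on the hot path, which is itself something a benchmark would need to measure.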
Is luceneutil the same as Lucene benchmark? I've been wanting to learn how to
use Lucene benchmark for some time.
Tom
> Add parameter to lower-bound TF normalization for BM25 (for long documents)
> ---------------------------------------------------------------------------
>
> Key: LUCENE-5175
> URL: https://issues.apache.org/jira/browse/LUCENE-5175
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/search
> Reporter: Tom Burton-West
> Priority: Minor
> Attachments: LUCENE-5175.patch
>
>
> In the article "When Documents Are Very Long, BM25 Fails!" a fix for the
> problem is documented. There was a TODO note in BM25Similarity to add this
> fix. I will attach a patch that implements the fix shortly.