[jira] [Commented] (LUCENE-5175) Add parameter to lower-bound TF normalization for BM25 (for long documents)

Robert Muir (JIRA) Tue, 13 Aug 2013 15:03:39 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-5175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738894#comment-13738894
 ]


Robert Muir commented on LUCENE-5175:
-------------------------------------

Yes, unfortunately the crazy cache is currently what makes it as fast as 
DefaultSimilarity; otherwise it's 25% slower :(

From time to time I definitely upgrade my JVM and run luceneutil to see if 
these caches can be removed!
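
For readers unfamiliar with the cache being discussed, here is a minimal sketch of the general idea, not Lucene's actual source. It assumes the document length is stored as one encoded byte per document, so the length-normalization term can be precomputed for all 256 possible byte values and looked up at scoring time. The class name, the constants, and decodeLength in particular are hypothetical stand-ins.

```java
// Sketch only, not Lucene's source. All names here are hypothetical.
final class NormCacheSketch {
    static final float K1 = 1.2f;  // assumed BM25 saturation parameter
    static final float B = 0.75f;  // assumed BM25 length-normalization parameter

    /** Hypothetical decoder: maps an encoded norm byte (0..255) to an approximate length. */
    static float decodeLength(int encoded) {
        return (encoded + 1) * (encoded + 1);
    }

    /** Precomputes k1 * ((1 - b) + b * docLen / avgLen) for every possible norm byte. */
    static float[] buildCache(float avgLen) {
        float[] cache = new float[256];
        for (int i = 0; i < cache.length; i++) {
            cache[i] = K1 * ((1 - B) + B * decodeLength(i) / avgLen);
        }
        return cache;
    }

    /** TF component computed from scratch: one decode, one divide, several multiplies per hit. */
    static float tfDirect(float freq, int encodedNorm, float avgLen) {
        float norm = K1 * ((1 - B) + B * decodeLength(encodedNorm) / avgLen);
        return freq * (K1 + 1) / (freq + norm);
    }

    /** Same TF component, with the normalization term replaced by a table lookup. */
    static float tfCached(float freq, int encodedNorm, float[] cache) {
        return freq * (K1 + 1) / (freq + cache[encodedNorm & 0xFF]);
    }

    public static void main(String[] args) {
        float avgLen = 400f;
        float[] cache = buildCache(avgLen);
        for (int norm : new int[] {0, 19, 255}) {
            System.out.println(norm + ": direct=" + tfDirect(3f, norm, avgLen)
                + " cached=" + tfCached(3f, norm, cache));
        }
    }
}
```

The table has to be rebuilt whenever the average length changes, which is the kind of bookkeeping that makes such a cache unattractive if the JVM can make the direct computation fast enough on its own.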

As far as the norms go, all the "provided" implementations were just set up to 
be compatible with DefaultSimilarity, so you can test these things out without 
reindexing and still have general support for index-time boosts and things 
like that.

If you don't care about that, you can tweak the similarity to better meet your 
specific needs (and even choose the other direction, too: compress them to 
use < 1 byte/doc, see LUCENE-5077).


                
> Add parameter to lower-bound TF normalization for BM25 (for long documents)
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-5175
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5175
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Tom Burton-West
>            Priority: Minor
>         Attachments: LUCENE-5175.patch
>
>
> In the article "When Documents Are Very Long, BM25 Fails!" a fix for the 
> problem is documented.  There was a TODO note in BM25Similarity to add this 
> fix. I will attach a patch that implements the fix shortly.
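
To illustrate what "lower-bounding TF normalization" can look like, here is a minimal sketch. It assumes the fix is the shifted-TF variant usually called BM25L, where a constant (named delta here) is added to the length-normalized term frequency before saturation. The class, method, and parameter names are hypothetical, and the patch attached to the issue is not shown here and may differ.

```java
// Sketch only: assumes the BM25L form of the fix. All names are hypothetical.
final class Bm25LSketch {
    /** Plain BM25 TF component, written in terms of the length-normalized frequency c'. */
    static double tfBm25(double freq, double docLen, double avgLen, double k1, double b) {
        double cPrime = freq / ((1 - b) + b * docLen / avgLen);
        return (k1 + 1) * cPrime / (k1 + cPrime);
    }

    /** Lower-bounded variant: shifts c' by delta, so a matching term never scores near zero. */
    static double tfBm25L(double freq, double docLen, double avgLen,
                          double k1, double b, double delta) {
        if (freq <= 0) {
            return 0.0; // the shift applies only to terms that actually occur
        }
        double cPrime = freq / ((1 - b) + b * docLen / avgLen);
        return (k1 + 1) * (cPrime + delta) / (k1 + cPrime + delta);
    }

    public static void main(String[] args) {
        double k1 = 1.2, b = 0.75, delta = 0.5, avgLen = 100;
        for (double docLen : new double[] {100, 10_000, 1_000_000}) {
            System.out.println("len=" + docLen
                + " bm25=" + tfBm25(1, docLen, avgLen, k1, b)
                + " bm25l=" + tfBm25L(1, docLen, avgLen, k1, b, delta));
        }
    }
}
```

As the document length grows, the plain component tends toward zero, while the shifted one stays above (k1 + 1) * delta / (k1 + delta) for any term that occurs at least once.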
