[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005688#comment-13005688 ]

Robert Muir commented on LUCENE-2091:
-------------------------------------

{quote}
Your attachment (BM25SimilarityProvider) seems to rely on some other code 
(Stats.DocFieldStats) and AggregatesProvider, which I guess is part of your DFR 
patch. Can you provide a pointer to that?
{quote}

Yeah, this is from LUCENE-2392. Unfortunately, it won't work with the most recent 
patch there, but both patches are really just explorations to see how we can 
divide the work into subtasks.

As an update: the JIRA issues aren't well linked, but we have actually made 
pretty good progress on some major portions (IMO these are the most 
interesting):
* Collection term stats: LUCENE-2862
* per-field similarity: LUCENE-2236
* TermState, to avoid redundant I/O when fetching stats: LUCENE-2694
* norms cleanup: LUCENE-2771, LUCENE-2846
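The comment doesn't spell out which statistics LUCENE-2862 tracks. As a rough illustration of the kind of collection-level numbers a BM25-style model consumes (document count, document frequency, total term frequency, average field length), here is a hypothetical toy over an in-memory corpus. `CollectionStatsSketch` and its methods are assumptions for illustration only, not Lucene's API.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: collection-level term statistics over a toy in-memory
// corpus. Class and method names are illustrative, not Lucene's API.
final class CollectionStatsSketch {
    private final int docCount;       // number of documents (N)
    private final long sumFieldLength; // total tokens across all documents
    private final Map<String, Integer> docFreqs = new HashMap<>();    // docs containing the term
    private final Map<String, Long> totalTermFreqs = new HashMap<>(); // occurrences across all docs

    CollectionStatsSketch(List<List<String>> docs) {
        docCount = docs.size();
        long sum = 0;
        for (List<String> doc : docs) {
            sum += doc.size();
            // Every occurrence counts toward the total term frequency...
            for (String term : doc) {
                totalTermFreqs.merge(term, 1L, Long::sum);
            }
            // ...but each document counts at most once toward document frequency.
            for (String term : new HashSet<>(doc)) {
                docFreqs.merge(term, 1, Integer::sum);
            }
        }
        sumFieldLength = sum;
    }

    int docCount() { return docCount; }

    int docFreq(String term) { return docFreqs.getOrDefault(term, 0); }

    long totalTermFreq(String term) { return totalTermFreqs.getOrDefault(term, 0L); }

    /** Average document length: the avgdl that BM25 length normalization needs. */
    double avgFieldLength() {
        return docCount == 0 ? 0.0 : (double) sumFieldLength / docCount;
    }

    public static void main(String[] args) {
        CollectionStatsSketch stats = new CollectionStatsSketch(List.of(
                List.of("okapi", "bm25", "okapi"),
                List.of("bm25", "lucene")));
        // prints "1 2 2.5"
        System.out.println(stats.docFreq("okapi") + " " + stats.totalTermFreq("okapi")
                + " " + stats.avgFieldLength());
    }
}
```

In a real index these numbers would be stored or aggregated at index time rather than recomputed per query; the toy recomputes them only to keep the example self-contained.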

The next big step is to separate scoring from matching (see the latest patch on 
LUCENE-2392), so that the similarity has full responsibility for all 
calculations and we get full integration with all queries, etc.
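To make "separate scoring from matching" concrete, here is a minimal sketch under stated assumptions: matching only reports raw per-document facts (doc id, term frequency, field length), and a pluggable similarity performs every scoring calculation, so swapping the scoring model never touches the matcher. `Match`, `Similarity`, `matchTerm` and `scoreAll` are hypothetical names, not Lucene classes, and the "postings" are just an int array.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "matching" vs "scoring" as separate responsibilities.
// Match, Similarity, matchTerm and scoreAll are illustrative, not Lucene classes.
final class SeparationSketch {

    /** What matching reports for one document: raw facts, no score. */
    static final class Match {
        final int doc;
        final int freq;   // occurrences of the term in this document
        final int docLen; // length of the field in tokens

        Match(int doc, int freq, int docLen) {
            this.doc = doc;
            this.freq = freq;
            this.docLen = docLen;
        }
    }

    /** The similarity owns every scoring calculation. */
    interface Similarity {
        double score(Match m);
    }

    /** Matching only: turns toy postings rows {doc, freq, docLen} into matches. */
    static List<Match> matchTerm(int[][] postings) {
        List<Match> matches = new ArrayList<>();
        for (int[] p : postings) {
            matches.add(new Match(p[0], p[1], p[2]));
        }
        return matches;
    }

    /** Scoring only: delegates every score to the plugged-in similarity. */
    static double[] scoreAll(List<Match> matches, Similarity sim) {
        double[] scores = new double[matches.size()];
        for (int i = 0; i < scores.length; i++) {
            scores[i] = sim.score(matches.get(i));
        }
        return scores;
    }

    public static void main(String[] args) {
        List<Match> matches = matchTerm(new int[][] {{0, 2, 10}, {3, 1, 40}});
        // Same matches, two different scoring models; the matcher is untouched.
        Similarity rawTf = m -> m.freq;
        Similarity lengthNormalized = m -> m.freq / Math.sqrt(m.docLen);
        System.out.println(Arrays.toString(scoreAll(matches, rawTf)));
        System.out.println(Arrays.toString(scoreAll(matches, lengthNormalized)));
    }
}
```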

This isn't that complicated; however, to do this we first need to refactor 
Explanations so that a Similarity has the capability (and responsibility!) to 
fully explain its calculations. So I think this is the next issue to resolve 
before going any further.
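One way a similarity could "fully explain its calculations" is sketched below, assuming a simple tree-shaped explanation: `explain()` is built from the same helper methods as `score()`, so the explanation's top-level value always equals the score. `Explanation` and `ToySimilarity` are hypothetical toy classes (not Lucene's), and the `idf * sqrt(freq)` model is an arbitrary placeholder chosen only to have two explainable factors.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a similarity that explains its own calculation.
// Explanation and ToySimilarity are illustrative classes, not Lucene's.
final class ExplainSketch {

    /** A node in an explanation tree: a value, what it means, and its parts. */
    static final class Explanation {
        final double value;
        final String description;
        final List<Explanation> details = new ArrayList<>();

        Explanation(double value, String description) {
            this.value = value;
            this.description = description;
        }

        Explanation add(Explanation child) {
            details.add(child);
            return this;
        }

        /** Renders the tree one node per line, indented by depth. */
        String render(int depth) {
            StringBuilder sb = new StringBuilder();
            sb.append("  ".repeat(depth)).append(value).append(" = ").append(description).append('\n');
            for (Explanation d : details) {
                sb.append(d.render(depth + 1));
            }
            return sb.toString();
        }
    }

    /** Toy scoring model: score = idf * tf, with tf = sqrt(freq). */
    static final class ToySimilarity {
        double tf(int freq) {
            return Math.sqrt(freq);
        }

        double idf(long docCount, long docFreq) {
            return 1.0 + Math.log((double) docCount / (docFreq + 1));
        }

        double score(int freq, long docCount, long docFreq) {
            return idf(docCount, docFreq) * tf(freq);
        }

        /** Built from the same helpers as score(), so the two cannot drift apart. */
        Explanation explain(int freq, long docCount, long docFreq) {
            double idf = idf(docCount, docFreq);
            double tf = tf(freq);
            return new Explanation(idf * tf, "score, product of:")
                    .add(new Explanation(idf, "idf(docCount=" + docCount + ", docFreq=" + docFreq + ")"))
                    .add(new Explanation(tf, "tf(freq=" + freq + ")"));
        }
    }

    public static void main(String[] args) {
        ToySimilarity sim = new ToySimilarity();
        System.out.print(sim.explain(4, 1000, 9).render(0));
    }
}
```

The design point is that only the similarity knows its formula, so only the similarity can build an accurate explanation; a query that combined precomputed weights on its own would have to duplicate that knowledge.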


> Add BM25 Scoring to Lucene
> --------------------------
>
>                 Key: LUCENE-2091
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2091
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Yuval Feinstein
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: BM25SimilarityProvider.java, LUCENE-2091.patch, 
> persianlucene.jpg
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of 
> Okapi BM25 scoring in the Lucene framework,
> as an alternative to the standard Lucene scoring (which is a mix of 
> Boolean matching and TF-IDF).
> I have refactored this a bit, added unit tests and improved its runtime 
> performance somewhat.
> I would like to contribute the code to Lucene under contrib. 
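For reference, the textbook Okapi BM25 contribution of a query term t to a document d is `idf(t) * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))`, where `freq` is the term's frequency in d, `dl` is d's length and `avgdl` the average document length. The standalone sketch below implements that formula under stated assumptions: it is not the nlp.uned.es code or any Lucene class, `Bm25Sketch` is a hypothetical name, the IDF is the `log(1 + (N - n + 0.5) / (n + 0.5))` variant (chosen because it stays positive), and `k1 = 1.2`, `b = 0.75` are commonly cited defaults assumed here.

```java
// Hypothetical sketch of the Okapi BM25 per-term score; not Lucene's API and not
// the nlp.uned.es code. Per query term t in document d:
//   score = idf(t) * freq * (k1 + 1) / (freq + k1 * (1 - b + b * dl / avgdl))
final class Bm25Sketch {
    private final double k1; // term-frequency saturation
    private final double b;  // strength of document-length normalization, in [0, 1]

    Bm25Sketch(double k1, double b) {
        this.k1 = k1;
        this.b = b;
    }

    /** IDF variant log(1 + (N - n + 0.5) / (n + 0.5)); positive whenever n <= N. */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Contribution of one query term occurring freq times in a document of length docLen. */
    double score(double freq, double docLen, double avgDocLen, long docCount, long docFreq) {
        double lengthNorm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf(docCount, docFreq) * freq * (k1 + 1.0) / (freq + lengthNorm);
    }

    public static void main(String[] args) {
        Bm25Sketch bm25 = new Bm25Sketch(1.2, 0.75); // assumed, commonly cited defaults
        // Term occurs 3 times in a 100-token doc; avgdl 120; 10 of 1,000 docs contain it.
        System.out.println(bm25.score(3, 100, 120, 1000, 10));
    }
}
```

A document's score for a multi-term query is the sum of these per-term contributions. Because `freq * (k1 + 1) / (freq + lengthNorm)` is bounded by `k1 + 1`, repeated occurrences of a term give diminishing returns, which is the main behavioral difference from raw TF weighting.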

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
