[ https://issues.apache.org/jira/browse/LUCENE-2091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005688#comment-13005688 ]
Robert Muir commented on LUCENE-2091:
-------------------------------------

{quote}
your attachment (BM25SimilarityProvider) seems to rely on some other code (Stats.DocFieldStats) & AggregatesProvider .. which I guess is part of your DFR patch.. can you provide a pointer to that.
{quote}

Yeah this is from LUCENE-2392. Unfortunately it won't work with the most recent patch there, but both patches are just really exploration to see how we can divide into subtasks.

For an update, the JIRA issues aren't well linked but we have actually made pretty good progress on some major portions (imo these are the most interesting):
* Collection term stats: LUCENE-2862
* per-field similarity: LUCENE-2236
* termstate, to avoid redundant i/o for stats: LUCENE-2694
* norms cleanup: LUCENE-2771, LUCENE-2846

The next big step is to separate scoring from matching (see the latest patch on LUCENE-2392) so that similarity has full responsibility for all calculations, and so we get full integration with all queries, etc.

This isn't that complicated: however, in order to do this, we need to first refactor Explanations, so that a Similarity has the capability (and responsibility!) to fully explain its calculations. So I think this is the next issue to resolve before going any further.

> Add BM25 Scoring to Lucene
> --------------------------
>
>                 Key: LUCENE-2091
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2091
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/*
>            Reporter: Yuval Feinstein
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: BM25SimilarityProvider.java, LUCENE-2091.patch,
> persianlucene.jpg
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> http://nlp.uned.es/~jperezi/Lucene-BM25/ describes an implementation of
> Okapi-BM25 scoring in the Lucene framework,
> as an alternative to the standard Lucene scoring (which is a version of mixed
> boolean/TFIDF).
> I have refactored this a bit, added unit tests and improved the runtime
> somewhat.
> I would like to contribute the code to Lucene under contrib.
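
For readers who want a concrete picture of what "a Similarity that can fully explain its calculations" might look like for Okapi BM25, here is a minimal standalone sketch. It is not the code from BM25SimilarityProvider.java or any patch on this issue, and it uses no Lucene classes. The class name Bm25Sketch, the method names, and the particular idf variant (a non-negative form with 1 added inside the logarithm) are all assumptions chosen for illustration; k1 and b are the usual BM25 tuning parameters.

```java
import java.util.Locale;

// Hypothetical illustration only: not a Lucene API and not the attached patch.
final class Bm25Sketch {
    private Bm25Sketch() {}

    // Inverse document frequency from collection-level stats.
    // Assumed variant: ln(1 + (N - n + 0.5) / (n + 0.5)), which never goes negative.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Saturating term-frequency component with document-length normalization.
    static double tfNorm(double tf, double docLen, double avgDocLen, double k1, double b) {
        if (tf <= 0) {
            return 0.0; // a term that does not occur contributes nothing
        }
        double lengthNorm = 1.0 - b + b * (docLen / avgDocLen);
        return tf * (k1 + 1.0) / (tf + k1 * lengthNorm);
    }

    // Score of one term in one document: idf times the normalized tf.
    static double score(double tf, double docLen, double avgDocLen,
                        long docCount, long docFreq, double k1, double b) {
        return idf(docCount, docFreq) * tfNorm(tf, docLen, avgDocLen, k1, b);
    }

    // The same calculation, reported piece by piece, so the scoring code
    // itself is responsible for explaining how the number was produced.
    static String explain(double tf, double docLen, double avgDocLen,
                          long docCount, long docFreq, double k1, double b) {
        double idf = idf(docCount, docFreq);
        double tfn = tfNorm(tf, docLen, avgDocLen, k1, b);
        return String.format(Locale.ROOT,
                "score(%.4f) = idf(%.4f) * tfNorm(%.4f) [k1=%.2f, b=%.2f]",
                idf * tfn, idf, tfn, k1, b);
    }
}
```

Calling Bm25Sketch.explain(2, 120, 100, 1000, 10, 1.2, 0.75) returns a one-line breakdown of the same number that Bm25Sketch.score would return for those inputs. The collection-level inputs (docCount, docFreq, avgDocLen) are the kind of statistics the subtasks listed above are meant to make available.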