On 29/03/2012 11:14, Andrzej Bialecki wrote:

The problem in our implementation is that we use a within-document term
frequency (the number of occurrences of t in the current document) and
not a collection-wide term frequency... so, it looks to me that the fix
would be to first fully traverse the doc enumeration and calculate the
total number of term occurrences in all documents (e.g. in
RIDFTermPruningPolicy.initPositionsTerm(..) ), and use this value in the
formula in place of termPositions.freq().


This is the fix that I implemented. It's now committed to branch_3x and will be included in release 3.6.
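
For anyone following along, the idea can be sketched roughly as below. This is plain Java and not the committed Lucene code: the class RidfSketch, its method names, and the int[] standing in for the doc enumeration are all made up for illustration. The formula is the textbook residual IDF (observed IDF minus the IDF expected under a Poisson model), which I am assuming here; the exact expression in RIDFTermPruningPolicy may differ.

```java
public class RidfSketch {

    /**
     * First pass: fully traverse the per-document frequencies of a term
     * and sum them to get the collection-wide term frequency.
     * The int[] is a stand-in for the real doc enumeration (assumption).
     */
    public static long collectionTermFreq(int[] perDocFreqs) {
        long total = 0;
        for (int freq : perDocFreqs) {
            total += freq;
        }
        return total;
    }

    private static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /**
     * Textbook residual IDF (assumed form, not taken from the Lucene source):
     *   idf         = -log2(df / N)
     *   expectedIdf = -log2(1 - exp(-cf / N))
     *   ridf        = idf - expectedIdf
     * where cf is the collection-wide term frequency, NOT the
     * within-document frequency of the current document.
     */
    public static double ridf(int docFreq, long totalTermFreq, int numDocs) {
        double idf = -log2((double) docFreq / numDocs);
        double expectedIdf = -log2(1.0 - Math.exp(-(double) totalTermFreq / numDocs));
        return idf - expectedIdf;
    }

    public static void main(String[] args) {
        int[] perDocFreqs = {3, 1, 2};   // term occurs in 3 docs
        int numDocs = 100;               // hypothetical collection size
        long cf = collectionTermFreq(perDocFreqs);
        System.out.println("cf=" + cf + " ridf=" + ridf(perDocFreqs.length, cf, numDocs));
    }
}
```

The point is only that the sum over all documents has to be known before the formula is evaluated for any single document, which is why the enumeration is traversed in full first.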

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org