On 29/03/2012 11:14, Andrzej Bialecki wrote:
> The problem in our implementation is that we use a within-document term frequency (the number of occurrences of t in the current document) and not a collection-wide term frequency. So it looks to me that the fix would be to first fully traverse the doc enumeration and calculate the total number of term occurrences in all documents (e.g. in RIDFTermPruningPolicy.initPositionsTerm(..)), and use this value in the formula in place of termPositions.freq().
This is the fix that I implemented. It is now committed to branch_3x and will be included in release 3.6.
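
For readers following along, here is a minimal sketch of the idea, not the committed code. It sums the per-document frequencies over a full pass of the doc enumeration to get a collection-wide count. The DocEnum interface and the ArrayDocEnum class are hypothetical stand-ins, modelled only loosely on a next()/freq() style enumeration, so that the example compiles without Lucene on the classpath.

```java
// Sketch only: DocEnum and ArrayDocEnum are hypothetical stand-ins,
// not Lucene classes and not the code committed to branch_3x.
class CollectionTermFreq {

    /** Hypothetical per-term enumeration over the documents containing the term. */
    interface DocEnum {
        boolean next(); // advance to the next document; false when exhausted
        int freq();     // occurrences of the term in the current document
    }

    /** Toy enumeration backed by an array of within-document frequencies. */
    static final class ArrayDocEnum implements DocEnum {
        private final int[] freqs;
        private int pos = -1;

        ArrayDocEnum(int[] freqs) {
            this.freqs = freqs;
        }

        @Override
        public boolean next() {
            return ++pos < freqs.length;
        }

        @Override
        public int freq() {
            return freqs[pos];
        }
    }

    /**
     * Fully traverses the enumeration and returns the total number of
     * occurrences of the term across all documents. A long is used so
     * that the sum cannot overflow an int on large collections.
     */
    static long totalTermFreq(DocEnum docs) {
        long total = 0;
        while (docs.next()) {
            total += docs.freq();
        }
        return total;
    }
}
```

With within-document frequencies of 3, 1 and 4, totalTermFreq(new CollectionTermFreq.ArrayDocEnum(new int[] {3, 1, 4})) returns 8, and that total is what would go into the formula in place of the per-document value. Note that the enumeration is exhausted after this pass, so a real policy would presumably have to re-open or re-seek it before the pruning pass itself.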
-- 
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com