Re: delete entries from posting list Lucene 4.0

Andrzej Bialecki Mon, 02 Apr 2012 10:03:42 -0700

On 29/03/2012 11:14, Andrzej Bialecki wrote:

The problem in our implementation is that we use a within-document term
frequency (the number of occurrences of t in the current document) and
not a collection-wide term frequency... so, it looks to me that the fix
would be to first fully traverse the doc enumeration and calculate the
total number of term occurrences in all documents (e.g. in
RIDFTermPruningPolicy.initPositionsTerm(..) ), and use this value in the
formula in place of termPositions.freq().

This is the fix that I implemented; it is now committed to branch_3x and will be included in release 3.6.
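
For readers who want to see the idea in isolation, here is a rough sketch of the two-step approach described above: first walk the whole posting list and sum the per-document frequencies, then feed that total into the score. This is not the committed code. The Posting class and both method names are made up for illustration and are not Lucene classes, and the ridf() method uses the textbook residual IDF definition (observed IDF minus the IDF expected under a Poisson model), which may differ in detail from what RIDFTermPruningPolicy computes.

```java
import java.util.Arrays;
import java.util.List;

public class CollectionTermFrequency {

    // Hypothetical stand-in for one posting-list entry (not a Lucene class):
    // a document id and the within-document frequency of the term.
    static final class Posting {
        final int doc;
        final int freq;

        Posting(int doc, int freq) {
            this.doc = doc;
            this.freq = freq;
        }
    }

    // First pass: traverse the full doc enumeration and add up the
    // occurrences of the term in every document.
    static long totalTermFreq(Iterable<Posting> postings) {
        long total = 0;
        for (Posting p : postings) {
            total += p.freq;
        }
        return total;
    }

    // Textbook residual IDF (an assumption about the formula, see above):
    // observed IDF minus the IDF predicted by a Poisson model with
    // rate totalTermFreq / numDocs. Natural logarithms are used here.
    static double ridf(int docFreq, long totalTermFreq, int numDocs) {
        double idf = -Math.log((double) docFreq / numDocs);
        double expectedIdf = -Math.log(1.0 - Math.exp(-(double) totalTermFreq / numDocs));
        return idf - expectedIdf;
    }

    public static void main(String[] args) {
        // Term occurs in 3 of 10 documents, with 3, 1 and 4 occurrences.
        List<Posting> postings = Arrays.asList(
                new Posting(0, 3), new Posting(4, 1), new Posting(7, 4));

        long cf = totalTermFreq(postings); // collection-wide frequency
        System.out.println("collection-wide tf = " + cf);
        System.out.println("ridf = " + ridf(postings.size(), cf, 10));
    }
}
```

The point of the sketch is only that the value passed to the formula is the sum over all documents, computed once per term, rather than the freq() of whichever document the enumeration happens to be positioned on.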


--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

