Modifying idf()?

Pablo Mendes Fri, 30 Jul 2010 05:00:02 -0700

Hi all,
I'd like to do a very simple change to the idf computation, but I can't seem
to wrap my head around it.


The javadocs for "Changing Similarity" have very useful hints for implementing
new tf() and lengthNorm() behavior, but they are less clear when it comes to
idf():
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/package-summary.html#changingSimilarity

I'd like to use something other than the global numDocs.
I'd like to have a modified idf() that gives me the inverse document
frequency within a *subset* of the index (e.g. for a specific type of
document). I have the type stored in a field, so for a given term I'd need to
count how many documents of that type contain it. Since idf() takes numDocs
as a parameter [1], could I just change the class that calls idf() and pass in
the number I need? Which class calls idf()? TermQuery? Should I make the
changes there, or in TermScorer?
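
To make the numbers concrete, here is a minimal sketch of the computation in
plain Java, with no Lucene dependency. It assumes the formula documented for
DefaultSimilarity.idf(int docFreq, int numDocs) [1], that is,
1 + ln(numDocs / (docFreq + 1)). The class SubsetIdf, the in-memory Doc type,
and the choice to return 0 for an empty subset are hypothetical and only for
illustration; they are not part of Lucene.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SubsetIdf {

    /** Hypothetical in-memory stand-in for an indexed document: a type plus its terms. */
    static final class Doc {
        final String type;
        final Set<String> terms;

        Doc(String type, String... terms) {
            this.type = type;
            this.terms = new HashSet<String>(Arrays.asList(terms));
        }
    }

    /** Same formula as documented for DefaultSimilarity.idf(int, int). */
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /** idf of a term computed only over the documents of the given type. */
    static float subsetIdf(List<Doc> docs, String type, String term) {
        int numDocsInSubset = 0;   // documents of this type
        int docFreqInSubset = 0;   // documents of this type that contain the term
        for (Doc d : docs) {
            if (!d.type.equals(type)) {
                continue;
            }
            numDocsInSubset++;
            if (d.terms.contains(term)) {
                docFreqInSubset++;
            }
        }
        // Assumption: an empty subset gets idf 0 instead of log(0).
        if (numDocsInSubset == 0) {
            return 0f;
        }
        return idf(docFreqInSubset, numDocsInSubset);
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("person", "lucene", "search"),
            new Doc("person", "index"),
            new Doc("person", "query"),
            new Doc("place", "lucene"),
            new Doc("place", "lucene", "index"));

        // "lucene" occurs in 3 of 5 documents globally,
        // in 1 of 3 "person" documents and in 2 of 2 "place" documents.
        System.out.println("global: " + idf(3, docs.size()));
        System.out.println("person: " + subsetIdf(docs, "person", "lucene"));
        System.out.println("place:  " + subsetIdf(docs, "place", "lucene"));
    }
}
```

The same term gets a different weight in each subset, which is the behavior
I'm after. What I don't know is where in Lucene the two counts would have to
be substituted.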

Can anybody shed some light on this issue?

Thanks in advance,
Pablo

[1]
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/DefaultSimilarity.html#idf%28int,%20int%29
