Hi all, I'd like to make a very simple change to the idf computation, but I can't seem to wrap my head around it.
The javadocs for "Changing Similarity" have very useful hints for new tf() and lengthNorm() behavior, but they are a bit blurrier for idf():
http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/package-summary.html#changingSimilarity

I'd like to use something beyond the global numDocs: a modified idf() that gives me the inverse document frequency within a *subset* of the index (e.g. for a specific type of document). I have the type stored in a field, so I'd need to count how many documents of that type contain a given term.

Since idf() takes numDocs as a parameter [1], could I just change the class that calls idf() and pass in the number I need? Which class calls idf()? TermQuery? Should I make the changes there, or in TermScorer?

Can anybody shed some light on this?

Thanks in advance,
Pablo

[1] http://lucene.apache.org/java/3_0_2/api/all/org/apache/lucene/search/DefaultSimilarity.html#idf%28int,%20int%29
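
P.S. To make the question concrete, here is a rough sketch of the number I am after. It uses plain Java collections, not the Lucene API: the Doc class and subsetIdf() are made up for illustration, and only the formula in idf() is meant to mirror DefaultSimilarity.idf(int, int). In a real index both counts would of course have to come from the index itself, which is the part I don't know how to wire in.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SubsetIdfSketch {

    /** Hypothetical stand-in for an indexed document: a type plus its terms. */
    static final class Doc {
        final String type;
        final Set<String> terms;

        Doc(String type, String... terms) {
            this.type = type;
            this.terms = new HashSet<String>(Arrays.asList(terms));
        }
    }

    /** Same formula as DefaultSimilarity.idf(int docFreq, int numDocs). */
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    /**
     * idf restricted to documents of one type: both the document count and
     * the document frequency are taken over the subset, not the whole index.
     */
    static float subsetIdf(List<Doc> docs, String type, String term) {
        int subsetNumDocs = 0;  // documents of the requested type
        int subsetDocFreq = 0;  // ...of those, how many contain the term
        for (Doc d : docs) {
            if (!d.type.equals(type)) {
                continue;
            }
            subsetNumDocs++;
            if (d.terms.contains(term)) {
                subsetDocFreq++;
            }
        }
        return idf(subsetDocFreq, subsetNumDocs);
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
            new Doc("article", "lucene", "search"),
            new Doc("article", "search"),
            new Doc("article", "index"),
            new Doc("comment", "lucene"),
            new Doc("comment", "lucene", "idf"));

        // Global view: 3 of the 5 documents contain "lucene".
        System.out.println("global  idf(lucene) = " + idf(3, docs.size()));
        // Per-type view: the same term is rarer among articles than comments.
        System.out.println("article idf(lucene) = " + subsetIdf(docs, "article", "lucene"));
        System.out.println("comment idf(lucene) = " + subsetIdf(docs, "comment", "lucene"));
    }
}
```

The point of the sketch is only that the same term gets a different weight per type once numDocs and docFreq are both counted inside the subset.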