: But if now the index goes through a massive update, where almost all the
: docs containing TC are deleted, and TC is not in any newly added doc,
: practically TC becomes rare too, and hence D2 should probably be scored
: higher than D1. But IDF(TC) might not (yet) reflect the massive docs
: deletion, and the scores are wrongly biased so D1 is still scored higher
: than D2.

yeah ... i was only thinking about the numDocs change (which would be the
same for idf(TC) and idf(TR)) and forgot that docFreq is ignorant of
deletes as well.

: I didn't follow the code for that, just thinking IDFs and scoring aloud, so
: I hope I am not missing something, but in any case this is just for the
: sake of discussion, because in reality you don't expect index statistics to
: change that dramatically, ahead of merges.

that's really the key issue to remember ... you might notice this when
deleting/re-adding 90% of the docs in an index consisting of only 10 docs,
because you'll likely still only have one segment -- but if you do the
same thing in an index of 100,000 docs you're going to get some segment
merges which will help keep things balanced.

-Hoss
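
A rough back-of-the-envelope sketch of the stale-docFreq effect discussed above, in Python. It assumes the classic idf formula from Lucene's DefaultSimilarity, 1 + ln(numDocs / (docFreq + 1)), and assumes that both the doc count and docFreq keep counting deleted docs until a merge drops them. The index sizes and term counts are made up purely for illustration.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical starting index: 10 docs, TC in 9 of them, TR in 1.
# Then 8 of the TC docs are deleted and 8 new docs (with neither term)
# are added, all before any segment merge happens.

# Before a merge (assumption): deleted docs are still counted everywhere.
stale_num_docs = 10 + 8      # the 8 deleted docs are still included
stale_df_tc = 9              # docFreq(TC) ignores the 8 deletes
stale_df_tr = 1

# After a merge: deleted docs are gone, statistics reflect live docs only.
merged_num_docs = 10         # 2 surviving old docs + 8 new docs
merged_df_tc = 1             # only one live doc still contains TC
merged_df_tr = 1

stale_tc = classic_idf(stale_df_tc, stale_num_docs)
stale_tr = classic_idf(stale_df_tr, stale_num_docs)
merged_tc = classic_idf(merged_df_tc, merged_num_docs)
merged_tr = classic_idf(merged_df_tr, merged_num_docs)

print("before merge: idf(TC)=%.3f idf(TR)=%.3f" % (stale_tc, stale_tr))
print("after merge:  idf(TC)=%.3f idf(TR)=%.3f" % (merged_tc, merged_tr))
```

Under these assumptions TC still looks like a common term before the merge (its idf is well below TR's), even though only one live doc contains it. Once the deletes are merged away the two idfs come out equal. With only 10 docs and a single segment nothing forces that merge, which is why the skew is easiest to see on tiny indexes.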