Go back and put it in after you have all the documents for that commit
point. Or on reader load, calculate it.
- Mark
http://www.lucidimagination.com (mobile)
On Nov 20, 2009, at 7:56 PM, Jake Mannix <jake.man...@gmail.com> wrote:
On Fri, Nov 20, 2009 at 4:51 PM, Mark Miller <markrmil...@gmail.com> wrote:
Okay - my fault - I'm not really talking in terms of Lucene. Though even
there I consider it possible. You'd just have to like, rewrite it :) And
it would likely be pretty slow.
Rewrite it how? When you index the very first document, the docFreq of all
terms is 1, out of numDocs = 1 docs in the corpus. Everybody's idf is the same.
No matter how you normalize this, it'll be wrong once you've indexed a million
documents. This isn't a matter of Lucene architecture, it's a matter of idf
being a value that is only exactly available at query time (you can approximate
it partway through indexing, but you don't know it at all when you start).
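
To make the point concrete, here is a rough Python sketch (not Lucene code) that
recomputes idf after each document is added. The formula
1 + ln(numDocs / (docFreq + 1)) is an assumption picked for illustration, and
the function names and the toy corpus are made up:

```python
import math
from collections import Counter

def idf(doc_freq, num_docs):
    # Assumed smoothed formula for illustration: 1 + ln(numDocs / (docFreq + 1)).
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def idf_snapshots(docs):
    """Yield (num_docs, {term: idf}) as it would look after each added document."""
    doc_freq = Counter()
    for num_docs, doc in enumerate(docs, start=1):
        # A term counts once per document, however often it occurs in it.
        doc_freq.update(set(doc.split()))
        yield num_docs, {t: idf(df, num_docs) for t, df in doc_freq.items()}

# Hypothetical toy corpus.
docs = ["the quick fox", "the lazy dog", "the quick dog", "the end"]
snapshots = list(idf_snapshots(docs))
first = snapshots[0][1]   # idf values known after indexing only the first doc
last = snapshots[-1][1]   # idf values once the whole corpus is indexed

for term in ("the", "quick", "fox"):
    print(term, round(first[term], 3), round(last[term], 3))
```

After the first document every term gets the same idf, so nothing baked into
the index at that point can separate "the" from "fox". Only the final snapshot
does, and that is the one you have at query time.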
-jake