On Tue, Jul 17, 2012 at 12:44 PM, Roman Chyla <roman.ch...@gmail.com> wrote:
> Hi,
>
> Tests show that TermEnum.docFreq() returns sum of all docs, including
> the deleted ones. Which seems to (indirectly) contradict the javadoc

That's right; fixing it to reflect deleted documents would be
prohibitively costly.

Hmm, which version/javadocs are you looking at? IndexReader.docFreq at
least calls out this limitation.

> This frequency count is used to compute uninverted index
> (DocTermOrds.uninvert()). The code goes like:
>
> final int df = te.docFreq();
> if (df <= maxTermDocFreq) {
>
>
> So, if I happen to have many deleted documents, and maxTermDocFreq is
> low, then the term will be excluded (even if the freq of the livedocs
> is OK). Most likely, the cache will be incomplete.
>
> Can it be considered a feature? Or is it a bug?

Maybe we could pro-rate the returned docFreq by the percentage of
deleted documents? It wouldn't be perfectly correct, but on average it
should have the right effect (keeping RAM consumption down).

Can you open a Jira issue? Thanks.

Mike McCandless

http://blog.mikemccandless.com
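
A rough sketch of the pro-rating idea, in case it helps the Jira discussion. This is not what DocTermOrds does today; the class and method names are made up for illustration, and it assumes deletions are spread evenly across terms. The maxDoc and numDocs arguments are meant to be the values from IndexReader.maxDoc() and IndexReader.numDocs(); they are passed as plain ints so the example runs without Lucene on the classpath.

```java
// Hypothetical helper, not part of Lucene: scales a term's docFreq by the
// fraction of documents in the index that are still live.
final class ProRatedDocFreq {
    private ProRatedDocFreq() {}

    // docFreq: raw count from docFreq(), which includes deleted documents
    // maxDoc:  total documents including deleted ones
    // numDocs: live documents only
    static int estimateLiveDocFreq(int docFreq, int maxDoc, int numDocs) {
        if (maxDoc <= 0 || numDocs >= maxDoc) {
            return docFreq; // empty index or no deletions: nothing to scale
        }
        // Use long arithmetic so docFreq * numDocs cannot overflow an int.
        return (int) (((long) docFreq * numDocs) / maxDoc);
    }

    // Same comparison as in the quoted snippet, but on the estimate.
    static boolean withinLimit(int docFreq, int maxDoc, int numDocs,
                               int maxTermDocFreq) {
        return estimateLiveDocFreq(docFreq, maxDoc, numDocs) <= maxTermDocFreq;
    }

    public static void main(String[] args) {
        int maxDoc = 1000;        // 600 of these are deleted
        int numDocs = 400;
        int df = 500;             // raw docFreq, deleted docs included
        int maxTermDocFreq = 250;

        // Raw check: the term is rejected because 500 > 250.
        System.out.println(df <= maxTermDocFreq);
        // Pro-rated check: 500 * 400 / 1000 = 200, so the term is kept.
        System.out.println(withinLimit(df, maxDoc, numDocs, maxTermDocFreq));
    }
}
```

The estimate is only an average: a term whose documents were deleted more (or less) heavily than the index as a whole will still be over- or under-counted.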