The BitSet tracks which documents have at least one value in this field, so its cardinality is the number of such documents. Some docs might not have any value in this field. state.segmentInfo.getDocCount() is the number of docs in the whole segment, but we are flushing a single field here. We pass down the cardinality because, since 4.0, the index keeps doc count statistics per field, so we can't use the segment's doc count.
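
To make the difference concrete, here is a minimal sketch of the same counting idea. It is not the Lucene code: java.util.BitSet stands in for Lucene's FixedBitSet, and the class name, the fieldDocCount helper and the postings arrays are made up for illustration.

```java
import java.util.BitSet;

public class FieldDocCountSketch {

    /**
     * Counts the distinct documents that appear in at least one postings
     * list of a single field. segmentDocCount only sizes the bit set; it is
     * not the result. (Hypothetical helper, not part of Lucene.)
     */
    static int fieldDocCount(int segmentDocCount, int[][] postingsPerTerm) {
        BitSet visitedDocs = new BitSet(segmentDocCount);
        for (int[] postings : postingsPerTerm) {
            for (int docID : postings) {
                // A doc that contains several terms is still counted once.
                visitedDocs.set(docID);
            }
        }
        return visitedDocs.cardinality();
    }

    public static void main(String[] args) {
        int segmentDocCount = 5;      // docs 0..4 exist in the segment
        int[][] postings = {
            {0, 2},                   // docs containing the first term
            {2, 3}                    // docs containing the second term
        };
        int docCount = fieldDocCount(segmentDocCount, postings);
        System.out.println("segment docs = " + segmentDocCount
                + ", docs with this field = " + docCount);
    }
}
```

In this made-up segment, docs 1 and 4 have no value in the field, so the per-field count is smaller than the segment's doc count.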

Hope that helps,

simon

On Wed, Mar 20, 2013 at 1:12 PM, Ravikumar Govindarajan
<ravikumar.govindara...@gmail.com> wrote:
> This is an internal code I came across in lucene today and unable to
> decipher it.
>
> FreqProxTermsWriterPerField.java
>
> void flush(String fieldName, FieldsConsumer consumer, final SegmentWriteState state)
> {
>   .............
>   FixedBitSet visitedDocs = new FixedBitSet(state.segmentInfo.getDocCount());
>   for (int i = 0; i < numTerms; i++)
>   {
>     .............
>     visitedDocs.set(docID);
>     .........
>     termsConsumer.finishTerm(text, new TermStats(docFreq, writeTermFreq ? totTF : -1));
>     //We plan to pass the state.segmentInfo.getDocCount() in TermStats, above.
>     //Is it wrong to do this here?
>   }
>   //Once all terms are over
>   termsConsumer.finish(writeTermFreq ? sumTotalTermFreq : -1, sumDocFreq, visitedDocs.cardinality());
>   //Why are we doing cardinality() instead of getDocCount() here?
>   //Can there be un-visited docs during a flush?
> }
>
> Can someone help me understand this?
>
> --
> Ravi