[ https://issues.apache.org/jira/browse/LUCENE-1612?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12702692#action_12702692 ]
John Wang commented on LUCENE-1612:
-----------------------------------

Excellent point, Michael! How do you suggest we move forward with this?

> expose lastDocId in the posting from the TermEnum API
> -----------------------------------------------------
>
>                 Key: LUCENE-1612
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1612
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: 2.4
>            Reporter: John Wang
>         Attachments: lucene-1612-patch.txt
>
>
> The TermEnum API currently has docFreq(), which gives the number of docs
> in the posting.
> It would be good to also have the max docid in the posting. That information
> is useful when constructing a custom DocIdSet, e.g. to determine the sparseness
> of the doc list and decide whether or not to use a BitSet.
> I have written a patch to do this. The problem with it is that TermInfosWriter
> encodes values as VInt/VLong, so there is very little flexibility to add
> lastDocId while keeping the index backward compatible. (If a plain int were
> used for, say, docFreq, a bit could be used to flag the presence of a new piece
> of information.)
>       output.writeVInt(ti.docFreq);                       // write doc freq
>       output.writeVLong(ti.freqPointer - lastTi.freqPointer); // write pointers
>       output.writeVLong(ti.proxPointer - lastTi.proxPointer);
> Anyway, the patch is attached, with TestSegmentTermEnum modified to test this.
> TestBackwardsCompatibility fails for the reasons described above.
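
To make the use case in the issue description concrete, here is a minimal sketch of how a caller might combine docFreq with the proposed lastDocId to decide between a BitSet and a sorted int array. Everything in it is an assumption for illustration: the class PostingDensity and its methods are hypothetical and not part of Lucene, lastDocId is the value the issue proposes to expose (it does not exist on TermEnum today), and the cost model (one bit per doc id in the range for a bit set, 32 bits per entry for an int array) is a simplification.

```java
/**
 * Hypothetical helper, not part of Lucene. It picks a doc-id set
 * representation from a term's docFreq and the proposed lastDocId.
 */
final class PostingDensity {

    private PostingDensity() {}

    /** Fraction of the doc-id range [0, lastDocId] that the posting occupies. */
    static double density(int docFreq, int lastDocId) {
        checkArgs(docFreq, lastDocId);
        return (double) docFreq / ((long) lastDocId + 1L);
    }

    /**
     * Returns true when a bit set spanning [0, lastDocId] needs no more bits
     * than a sorted int array holding docFreq entries (assumed cost model).
     */
    static boolean preferBitSet(int docFreq, int lastDocId) {
        checkArgs(docFreq, lastDocId);
        long bitSetBits = (long) lastDocId + 1L; // one bit per possible doc id
        long intArrayBits = 32L * docFreq;       // one 32-bit int per posting entry
        return bitSetBits <= intArrayBits;
    }

    private static void checkArgs(int docFreq, int lastDocId) {
        if (docFreq < 1) {
            throw new IllegalArgumentException("docFreq must be >= 1");
        }
        // Doc ids are distinct and non-negative, so the largest id is at least docFreq - 1.
        if ((long) lastDocId + 1L < docFreq) {
            throw new IllegalArgumentException("lastDocId too small for docFreq");
        }
    }

    public static void main(String[] args) {
        // Dense posting: 500 docs spread over ids 0..999.
        System.out.println(preferBitSet(500, 999));
        // Sparse posting: 10 docs spread over ids 0..999999.
        System.out.println(preferBitSet(10, 999_999));
    }
}
```

Without lastDocId, a caller only has docFreq and the segment's total document count, so the density estimate has to assume the posting spans the whole segment.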