However, I'm a bit stuck at the moment.
The "document code" is written by writePostings in DocumentWriter:
    int f = posting.freq;
    if (f == 1)                       // optimize freq=1
      freq.writeVInt(1);              // set low bit of doc num.
    else {
      freq.writeVInt(0);              // the document number
      freq.writeVInt(f);              // frequency in doc
    }
So that integer with the low bit filed off is *always* going to be zero. Which means that the returned set of documents is always going to have the IDs set to zero, which is precisely what's happening in my Perl port. But I'd rather like it to have the right document ID, which is 1.
DocumentWriter optimizes a particular case, where the document number is always zero. The general case is in SegmentMerger.java:
    int docCode = (doc - lastDoc) << 1;     // use low bit to flag freq=1
    lastDoc = doc;

    int freq = postings.freq();
    if (freq == 1) {
      freqOutput.writeVInt(docCode | 1);    // write doc & freq=1
    } else {
      freqOutput.writeVInt(docCode);        // write doc
      freqOutput.writeVInt(freq);           // write frequency in doc
    }
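To make the round trip concrete, here is a rough sketch of how a reader could undo that encoding. This is not Lucene code: the class name DocCodeSketch is made up, a plain list of ints stands in for the VInt stream, and I am assuming the delta starts from lastDoc = 0 at the beginning of each term's postings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of Lucene.
class DocCodeSketch {

    // Encode (doc, freq) pairs the way the merger snippet does:
    // the doc delta is shifted left one bit, and the low bit flags freq == 1.
    static List<Integer> encode(int[] docs, int[] freqs) {
        List<Integer> out = new ArrayList<>();
        int lastDoc = 0;                      // assumption: deltas start from 0
        for (int i = 0; i < docs.length; i++) {
            int docCode = (docs[i] - lastDoc) << 1;
            lastDoc = docs[i];
            if (freqs[i] == 1) {
                out.add(docCode | 1);         // doc delta and freq=1 in one value
            } else {
                out.add(docCode);             // doc delta, low bit clear
                out.add(freqs[i]);            // explicit frequency follows
            }
        }
        return out;
    }

    // Decode the stream back into {doc, freq} pairs.
    static List<int[]> decode(List<Integer> codes) {
        List<int[]> out = new ArrayList<>();
        int doc = 0;
        int i = 0;
        while (i < codes.size()) {
            int docCode = codes.get(i++);
            doc += docCode >>> 1;             // drop the flag bit, add the delta
            int freq = (docCode & 1) != 0 ? 1 : codes.get(i++);
            out.add(new int[] {doc, freq});
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> codes = encode(new int[] {1, 4, 5}, new int[] {1, 3, 1});
        System.out.println(codes);
        for (int[] p : decode(codes)) {
            System.out.println("doc=" + p[0] + " freq=" + p[1]);
        }
    }
}
```

The point for a port is that the value read from the stream is a delta, so the decoder has to keep a running sum rather than use the shifted value directly as the document ID.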
I hope this makes more sense.
Doug