Confused by writePostings/SegmentTermDocs.next()

Simon Cozens Tue, 02 Dec 2003 09:05:36 -0800

Hi all,
    At my company, we're working on a Perl version of Lucene, which we plan to
release under the same terms as Lucene. (When we have it working, tested and
documented.)
    However, I'm a bit stuck at the moment.


    I've got index writing and reading working, and am doing a search on a
Term using TermQuery and TermScorer. As I understand it, the TermScorer works
by reading the TermDocs postings. It gets the document ID by this code in
SegmentTermDocs.next():

      int docCode = freqStream.readVInt();
      doc += docCode >>> 1;                       // shift off low bit

The "document code" is written by writePostings in DocumentWriter:

        int f = posting.freq;
        if (f == 1)                               // optimize freq=1
          freq.writeVInt(1);                      // set low bit of doc num.
        else {
          freq.writeVInt(0);                      // the document number
          freq.writeVInt(f);                      // frequency in doc
        }

So the first integer written is only ever 0 or 1, and with the low bit
shifted off it is *always* going to be zero. Which means that every
returned document is going to have its ID set to zero, which is precisely
what's happening in my Perl port. But I'd rather like it to have the right
document ID, which is 1.
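
To make the arithmetic concrete, here is a small stand-alone Java sketch of
how I read those two snippets. The class and method names are made up for
illustration and are not part of Lucene. It only models the first writeVInt
call in writePostings and the shift in next(), and it assumes doc starts out
at zero:

```java
// Hypothetical sketch, not Lucene code: models only the two excerpts above.
class DocCodeSketch {
    // Value passed to the first writeVInt call in the writePostings excerpt.
    static int writtenDocCode(int freq) {
        return (freq == 1) ? 1 : 0;
    }

    // Mirrors "doc += docCode >>> 1" from SegmentTermDocs.next().
    static int nextDoc(int doc, int docCode) {
        return doc + (docCode >>> 1);
    }

    public static void main(String[] args) {
        for (int freq = 1; freq <= 3; freq++) {
            int docCode = writtenDocCode(freq);
            System.out.println("freq=" + freq
                    + " docCode=" + docCode
                    + " doc=" + nextDoc(0, docCode));
        }
    }
}
```

Whatever the frequency, the shifted value is zero, so doc never moves off
its starting value.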

Obviously Lucene works, and my understanding is broken somewhere, but I
can't see where. Can someone please shed some light on what's going on here?

Thanks,
Simon
-- 
<gnat> TorgoX: you're rapidly learning, I see, that XML is a fucking
piece of festering shit which has no more justification for walking
God's clean earth than a dung beetle with diarrhoea.
