Hi all!
I'm reimplementing a very Lucene-like search library as a learning
experience and I've run into a snag. Before I go deep code diving, I
thought I'd ask here in case someone has the time to answer.
The term dictionary file includes the term count in a header. But when
I'm merging segments, I can't know the combined number of UNIQUE terms
across the segments being merged until I've read them all, so I can't
write the header before I start merging.
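
To make the problem concrete, here is a minimal sketch of the kind of
merge I mean. It is not Lucene's code: the class and method names are
made up for illustration, and I'm assuming each segment's term list is
already sorted. The unique count only falls out once the loop has
finished, which is after all the merged entries would have been written.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: k-way merge of sorted per-segment term lists.
public class TermMergeSketch {

    // One cursor per segment: the current term plus the rest of that segment's terms.
    private static final class Cursor implements Comparable<Cursor> {
        String term;
        final Iterator<String> rest;

        Cursor(Iterator<String> it) {
            this.rest = it;
            this.term = it.next();
        }

        // Moves to the next term; returns false when the segment is exhausted.
        boolean advance() {
            if (!rest.hasNext()) return false;
            term = rest.next();
            return true;
        }

        @Override
        public int compareTo(Cursor other) {
            return term.compareTo(other.term);
        }
    }

    // Merges the sorted term lists and returns the number of unique terms.
    public static long countUniqueTerms(List<List<String>> segments) {
        PriorityQueue<Cursor> queue = new PriorityQueue<>();
        for (List<String> seg : segments) {
            if (!seg.isEmpty()) queue.add(new Cursor(seg.iterator()));
        }
        long unique = 0;
        String previous = null;
        while (!queue.isEmpty()) {
            Cursor c = queue.poll();
            if (!c.term.equals(previous)) {
                unique++;           // a real merger would write the merged term entry here
                previous = c.term;
            }
            if (c.advance()) queue.add(c);
        }
        return unique;              // only known now, after everything has been "written"
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("apple", "banana", "cherry"),
            Arrays.asList("banana", "date"));
        System.out.println(countUniqueTerms(segments)); // prints 4
    }
}
```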
The ways I can see to do this are (a) to scan the term lists of the
segments first and build the collected term list in memory before
merging, (b) leave space in the file for the term count and go back and
overwrite it later, or (c) something much more clever that Lucene does
but I haven't figured out yet.
(b) is undesirable for me, because I'd like the option of using
compressed streams in the backend, which must be written serially.
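
For reference, this is roughly what I understand (b) to involve. It is
only a sketch with made-up names and a made-up file layout (an 8-byte
count followed by the terms), not any real index format. The point is
the seek() back to offset 0, which needs a random-access file and is
exactly what a serial compressed stream can't do.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Iterator;

// Hypothetical sketch of option (b): reserve space for the count, patch it afterwards.
public class HeaderPatchSketch {

    // Writes a placeholder header, streams the merged terms, then overwrites the header.
    public static long writeTerms(File file, Iterator<String> mergedTerms) throws IOException {
        try (RandomAccessFile out = new RandomAccessFile(file, "rw")) {
            out.setLength(0);       // start from an empty file
            out.writeLong(0L);      // placeholder for the term count
            long count = 0;
            while (mergedTerms.hasNext()) {
                out.writeUTF(mergedTerms.next());
                count++;
            }
            out.seek(0);            // go back to the header: requires random access
            out.writeLong(count);   // overwrite the placeholder with the real count
            return count;
        }
    }

    // Reads the term count back from the header.
    public static long readCount(File file) throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(file, "r")) {
            return in.readLong();
        }
    }
}
```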
Anyway, if someone more familiar with the code could point me in the
right direction, I'd appreciate it very much.
Thanks!
Matt