Writing out the term count when merging

Matt Chaput Mon, 19 Mar 2007 15:14:01 -0800

Hi all!

I'm reimplementing a very Lucene-like search library as a learning experience and I've run into a snag. Before I go deep code diving, I thought I'd ask here in case someone has the time to answer.

The term dictionary file includes the term count in a header. But when I'm merging segments, I can't know the combined number of UNIQUE terms in the merging segments before I've read them, so I can't write the header before I start merging the segments.

The ways I can see to do this are (a) to scan the term lists of the segments first and build the collected term list in memory before merging, (b) to leave space in the file for the term count and go back and overwrite it later, or (c) something much more clever that Lucene does but I haven't figured out yet.

(b) is undesirable for me, because I'd like the option of using compressed streams in the backend, which must be written serially.
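
For concreteness, here is roughly what a two-pass variant of (a) might look like in Python. It counts the unique terms in a first streaming pass instead of building the whole list in memory, then writes the header and the merged terms strictly serially. This is only a sketch: the function names and the file layout (a 4-byte big-endian count followed by length-prefixed UTF-8 terms) are made up for illustration and are not Lucene's format, and it assumes each segment's term list is sorted and can be read twice.

```python
import heapq
import io
import struct
from itertools import groupby


def count_unique_terms(segments):
    """Pass 1: count distinct terms across sorted, re-readable term lists."""
    merged = heapq.merge(*segments)          # streaming k-way merge
    return sum(1 for _ in groupby(merged))   # one group per distinct term


def merge_term_dict(segments, out):
    """Pass 2: write the header, then the merged unique terms, without seeking."""
    count = count_unique_terms(segments)
    out.write(struct.pack(">I", count))      # hypothetical header: term count
    for term, _ in groupby(heapq.merge(*segments)):
        data = term.encode("utf-8")
        out.write(struct.pack(">H", len(data)))  # hypothetical length prefix
        out.write(data)
    return count


# Two small "segments", each with a sorted term list.
segments = [["apple", "cat"], ["bat", "cat", "dog"]]
buf = io.BytesIO()
written = merge_term_dict(segments, buf)
```

The cost is that every segment's term list is read twice, but the output stream is only ever appended to, so it would also work on top of a compressed stream.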

Anyway, if someone more familiar with the code could point me in the right direction, I'd appreciate it very much.


Thanks!

Matt



