Hi all!

I'm reimplementing a very Lucene-like search library as a learning experience and I've run into a snag. Before I go deep code diving, I thought I'd ask here in case someone has the time to answer.

The term dictionary file includes the term count in its header. But when I'm merging segments, I can't know the total number of UNIQUE terms across those segments until I've read them all, so I can't write the header before I start the merge.

The ways I can see to do this are (a) scan the term lists of the segments first and build the combined term list in memory before merging, (b) leave space in the file for the term count and go back and overwrite it later, or (c) something much more clever that Lucene does but that I haven't figured out yet.
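
To make (b) concrete, here is a rough Python sketch of what I have in mind. The file layout (an 8-byte count followed by length-prefixed UTF-8 terms) and the function name are made up for illustration and are not Lucene's actual format. It assumes each segment hands me its terms as an already-sorted iterable of strings and that the output is seekable.

```python
import heapq
import io
import struct

def write_terms_with_backpatch(out, segments):
    """Option (b): reserve header space, merge, then seek back and fill in the count.

    `segments` is a list of sorted iterables of term strings (an assumption
    for this sketch). `out` must be a seekable binary stream.
    """
    header_pos = out.tell()
    out.write(struct.pack(">q", 0))  # placeholder for the unique-term count

    count = 0
    previous = None
    # heapq.merge lazily yields terms from all segments in sorted order.
    for term in heapq.merge(*segments):
        if term != previous:  # skip duplicates that appear in several segments
            data = term.encode("utf-8")
            out.write(struct.pack(">H", len(data)))  # hypothetical length prefix
            out.write(data)
            count += 1
            previous = term

    end_pos = out.tell()
    out.seek(header_pos)
    out.write(struct.pack(">q", count))  # overwrite the placeholder
    out.seek(end_pos)
    return count
```

The two `seek` calls at the end are exactly the part that a serial-only stream can't do.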

(b) is undesirable for me, because I'd like the option of using compressed streams in the backend, which must be written serially.
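
For comparison, here is a sketch of a variation on (a) that keeps the writes strictly serial: instead of holding the combined term list in memory, it merges the term lists once just to count the unique terms, then merges them a second time to write. Again, the layout and all names are hypothetical, and it assumes I can cheaply reopen each segment's sorted term list (`open_segments` is a callable that returns fresh iterators each time).

```python
import gzip
import heapq
import io
import struct

def merged_unique(segments):
    """Yield each distinct term once, in sorted order, from sorted inputs."""
    previous = None
    for term in heapq.merge(*segments):
        if term != previous:
            yield term
            previous = term

def write_terms_two_pass(raw_out, open_segments):
    """Count unique terms in a first pass, then write header and terms serially.

    `open_segments` is a hypothetical callable returning a fresh list of
    sorted term iterators on every call.
    """
    # Pass 1: count only; nothing is kept in memory beyond the merge heap.
    count = sum(1 for _ in merged_unique(open_segments()))

    # Pass 2: the count is known, so everything is written front to back.
    with gzip.GzipFile(fileobj=raw_out, mode="wb") as out:
        out.write(struct.pack(">q", count))
        for term in merged_unique(open_segments()):
            data = term.encode("utf-8")
            out.write(struct.pack(">H", len(data)))
            out.write(data)
    return count
```

The cost is reading every segment's term list twice, which is the price of never seeking in the output.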

Anyway, if someone more familiar with the code could point me in the right direction, I'd appreciate it very much.

Thanks!

Matt


