On 9/6/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote:
> That's one way of thinking about it. There's only one "thing" though: a big bucket of serialized index entries. At the end of a session, those are sorted, pulled apart, and used to write the tis, tii, frq, and prx files.
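To check that I'm reading this right, here is a minimal Python sketch of the "one bucket, sort once at the end" idea. It is only an illustration, not KinoSearch's or Lucene's actual code: the names `add_doc` and `finish_session`, the whitespace tokenizer, and the tuple layout `(term, doc_id, position)` are all my own assumptions, and the result stays in memory instead of being written to files.

```python
from itertools import groupby
from operator import itemgetter


def add_doc(bucket, doc_id, text):
    """Append one (term, doc_id, position) entry per token to the bucket."""
    for position, term in enumerate(text.lower().split()):
        bucket.append((term, doc_id, position))


def finish_session(bucket):
    """Sort the whole bucket once, then pull it apart into per-term postings."""
    bucket.sort()  # orders by term, then doc_id, then position
    postings = {}
    for term, entries in groupby(bucket, key=itemgetter(0)):
        docs = {}
        for _, doc_id, position in entries:
            docs.setdefault(doc_id, []).append(position)
        postings[term] = docs  # doc_id -> list of positions
    return postings


# Hypothetical two-document session.
bucket = []
add_doc(bucket, 0, "a big bucket")
add_doc(bucket, 1, "a bucket of entries")
postings = finish_session(bucket)
```

In this picture the sorted term keys roughly play the role of the term dictionary (tis/tii), the per-term doc lists and their lengths the frq data, and the position lists the prx data.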
Interesting. When do you add "merge-worthy" segments? I'd guess at the end of a session, when it's easy to decide which segments are "merge-worthy". If so, however, a newer doc could get a smaller docid than an older doc, right? It's a nice property of Lucene that an older doc always has a smaller docid. I think some applications use this to decide newer/older versions of a document.
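To make the docid concern concrete, here is a toy Python sketch. It assumes, purely for illustration, that a merge concatenates segments and renumbers docs by their position in the result; `merge_segments` is a hypothetical helper, not Lucene's or KinoSearch's merge code.

```python
def merge_segments(segments):
    """Concatenate segments in the given order; a doc's new id is its index."""
    merged = []
    for segment in segments:
        merged.extend(segment)
    return merged


old_segment = ["doc-A", "doc-B"]  # added in an earlier session
new_segment = ["doc-C"]           # added in the current session

in_order = merge_segments([old_segment, new_segment])
reordered = merge_segments([new_segment, old_segment])
```

With `in_order`, doc-A keeps a smaller docid than doc-C. With `reordered`, the newer doc-C gets docid 0, which is the case I am asking about.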
> In theory, you could apply this technique only to a limited number of docs and create segments, say, 10 docs at a time rather than 1 at a time. But then you still have to do something with each 10 doc segment, and you don't get the benefits of less disk shuffling and lower RAM usage. Better to just create 1 segment per session.
This means no new documents are visible to IndexReader until a session is over. In some sense, "1 segment/commit per session" lets an application decide when a "merge" happens.