On 9/6/06, Marvin Humphrey <[EMAIL PROTECTED]> wrote:

> That's one way of thinking about it.  There's only one "thing"
> though: a big bucket of serialized index entries.  At the end of a
> session, those are sorted, pulled apart, and used to write the tis,
> tii, frq, and prx files.

Interesting.
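
To check my understanding, here is a rough sketch of the "one big
bucket" idea as I read it. This is not your actual code: the function
name and the in-memory structures are made up, and plain tuples stand
in for serialized entries and for the tis/tii, frq, and prx files.

```python
from itertools import groupby


def invert_session(docs):
    """Toy sort-based inversion: docs is a list of token lists.

    Doc ids are assigned in arrival order. Returns stand-ins for the
    term dictionary (tis/tii), frequencies (frq), and positions (prx).
    """
    # One flat bucket of (term, doc_id, position) entries for the session.
    bucket = []
    for doc_id, tokens in enumerate(docs):
        for pos, term in enumerate(tokens):
            bucket.append((term, doc_id, pos))

    # End of session: sort once, then pull the entries apart by term.
    bucket.sort()

    term_dict = []   # sorted list of terms
    freqs = {}       # term -> [(doc_id, freq), ...]
    positions = {}   # term -> {doc_id: [pos, ...]}
    for term, entries in groupby(bucket, key=lambda e: e[0]):
        per_doc = {}
        for _, doc_id, pos in entries:
            per_doc.setdefault(doc_id, []).append(pos)
        term_dict.append(term)
        freqs[term] = [(d, len(p)) for d, p in per_doc.items()]
        positions[term] = per_doc
    return term_dict, freqs, positions
```

Because the bucket is sorted by (term, doc_id, position), each term's
postings come out in docid order without any per-document segment
being written along the way.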

When do you add "merge-worthy" segments? I'd guess at the end of a
session, when it's easy to decide which segments are "merge-worthy".
If so, however, a newer doc could get a smaller docid than an older
doc, right? It's a nice property of Lucene that an older doc always
has a smaller docid. I think some applications rely on this to tell
which version of a document is newer.
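
Here is a toy model of the ordering concern. It assumes docids are
handed out by concatenating segments in whatever order they are
written, which may not match what either implementation really does.
Each doc is represented only by a hypothetical arrival sequence number.

```python
def assign_doc_ids(segments):
    """Concatenate segments in the given order and number the docs.

    Each segment is a list of arrival sequence numbers (smaller = older).
    Returns a dict mapping sequence number -> assigned docid.
    """
    merged = [seq for seg in segments for seq in seg]
    return {seq: doc_id for doc_id, seq in enumerate(merged)}


def preserves_age_order(doc_ids):
    """True if every older doc has a smaller docid than every newer one."""
    seqs = sorted(doc_ids)
    return all(doc_ids[a] < doc_ids[b] for a, b in zip(seqs, seqs[1:]))


old_a, old_b = [0, 1], [2, 3]   # existing segments
session = [4, 5]                # docs added in the current session

# Segments concatenated in age order: the property holds.
in_order = assign_doc_ids([old_a, old_b, session])

# Session docs written first, a merge-worthy older segment appended
# afterwards: docs 2 and 3 end up with larger docids than 4 and 5.
reordered = assign_doc_ids([old_a, session, old_b])
```

In this model `preserves_age_order(in_order)` is True and
`preserves_age_order(reordered)` is False, which is the case I am
asking about.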

> In theory, you could apply this technique only to a limited number of
> docs and create segments, say, 10 docs at a time rather than 1 at a
> time.  But then you still have to do something with each 10 doc
> segment, and you don't get the benefits of less disk shuffling and
> lower RAM usage.  Better to just create 1 segment per session.

This means no new documents are visible to IndexReader until a session
is over. In some sense, "1 segment/commit per session" lets an
application decide when a "merge" happens.
