Marvin Humphrey wrote:

Lucene also has a blip, but it's different: Lucene will still accept
added/deleted documents during it. What one cannot do is reopen a new
realtime reader (LUCENE-1516) during the blip.

The consolidator process has to block while carrying forward deletes, because
otherwise new deletions may get dropped.

If seg_2 is getting merged away and a new writer adds deletions against seg_2 that the consolidator never sees, then once the consolidator finishes, those deletes will vanish without a trace and the "deleted" document will suddenly
reappear in the newly consolidated segment.
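
To make the failure mode concrete, here is a minimal sketch of carrying deletes forward. It is not taken from either codebase: the function names, the dict-of-sets representation of deletions, and the `doc_map` structure are all assumptions made for illustration.

```python
def consolidate(segments, deletes_seen):
    """Merge segments, skipping docs already deleted when the merge started.

    segments:     dict of segment name -> list of doc ids (hypothetical layout)
    deletes_seen: dict of segment name -> set of doc ids deleted at snapshot time
    Returns (merged_docs, doc_map); doc_map maps (segment, doc) -> new position.
    """
    merged, doc_map = [], {}
    for name in sorted(segments):
        for doc in segments[name]:
            if doc in deletes_seen.get(name, set()):
                continue  # already deleted, so it is not copied
            doc_map[(name, doc)] = len(merged)
            merged.append(doc)
    return merged, doc_map


def carry_forward(doc_map, deletes_now, deletes_seen):
    """Translate deletes that arrived after the snapshot into new positions.

    This is the step that has to run while no writer can add more deletes;
    skipping it is what makes a "deleted" document reappear.
    """
    carried = set()
    for name, docs in deletes_now.items():
        for doc in docs - deletes_seen.get(name, set()):
            if (name, doc) in doc_map:
                carried.add(doc_map[(name, doc)])
    return carried


segments = {"seg_1": ["a", "b"], "seg_2": ["c", "d"]}
seen = {"seg_2": {"c"}}                # deletes known when the merge began
merged, doc_map = consolidate(segments, seen)
later = {"seg_2": {"c", "d"}}          # a writer deleted "d" during the merge
late_deletes = carry_forward(doc_map, later, seen)
```

Without the `carry_forward` step, "d" would still be live in the merged segment even though a writer deleted it from seg_2.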

Right. I guess it's because Lucene buffers up deletes that it can continue to accept adds & deletes even during the blip. But it cannot write a new
segment (materialize the adds & deletes) during the blip.

Hang on: does your writer process hold onto the write lock the whole
time it's open? Or does it only grab it when it needs to commit a change?

The consolidator grabs consolidate.lock as soon as it launches. While it's
working in the background (so to speak), write processes continually grab and
release write.lock. At the very end of the consolidation process, the
consolidator grabs write.lock so that it can carry forward recent deletions --
but hopefully that doesn't take very long.

OK.  Does this mean you can run multiple writers against the same index,
to gain concurrency? (Though... that's tricky, with deletes; oh maybe because you store new deletes for an old segment along with the new segment that's
OK?  Hmm, it still seems like you'd have a staleness problem).

Unfortunately, we have an annoying IPC issue to deal with. (Lucene wouldn't
have this problem.) When it's time for the consolidator to grab write.lock, it
will try to obtain it once per second for X seconds, sleeping in between. But
if index mods are flying fast and furious, write processes may continually
cut in front and the consolidator may have difficulty obtaining the
write.lock.
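
The retry loop described above might look roughly like the following sketch. It assumes the lock is a file created with exclusive-create semantics, which the thread does not actually confirm; `try_lock` and `obtain_with_retries` are hypothetical names, and `timeout` stands in for the unspecified "X seconds".

```python
import os
import time


def try_lock(path):
    """Attempt to create the lock file atomically; True means we own it."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # somebody else holds the lock
    os.close(fd)
    return True


def obtain_with_retries(path, timeout, interval=1.0):
    """Poll for the lock every `interval` seconds until `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if try_lock(path):
            return True
        if time.monotonic() >= deadline:
            return False  # gave up; the caller decides what to do next
        time.sleep(interval)
```

Nothing here gives the waiting process priority: a writer that shows up during a sleep can take the lock first, which is exactly the starvation problem being described.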

We'd like to be able to signal the waiting consolidator process when a write process finishes up so that it can try for write.lock right away, but AFAIK there's no portable way to communicate that from one process to another.
Probably the only workaround is to add yet another lock file, e.g.
consolidator_is_waiting.lock, that blocks further write processes. Yuck. We may also want to have the consolidator try more often than once per second.
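
A rough sketch of the extra-lock-file workaround, under the same assumption that locks are exclusively created files. Only the file names write.lock and consolidator_is_waiting.lock come from the message above; the function names and the directory layout are hypothetical.

```python
import os


def _create_exclusive(path):
    """Atomically create a lock file; False if it already exists."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True


def writer_try_lock(index_dir):
    """Writers back off while the consolidator has announced it is waiting."""
    if os.path.exists(os.path.join(index_dir, "consolidator_is_waiting.lock")):
        return False
    return _create_exclusive(os.path.join(index_dir, "write.lock"))


def consolidator_try_lock(index_dir):
    """Announce the wait, then try for write.lock; clear the flag on success."""
    waiting = os.path.join(index_dir, "consolidator_is_waiting.lock")
    _create_exclusive(waiting)  # harmless if the flag is already present
    if _create_exclusive(os.path.join(index_dir, "write.lock")):
        os.remove(waiting)
        return True
    return False
```

A writer that checked for the flag just before it appeared can still slip in once, but every writer after that is held back until the consolidator gets its turn. The consolidator still has to poll, so this bounds the starvation rather than removing the wait.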


Ugh, lock starvation.  Really the OS should provide a FIFO lock queue of
some sort.

Mike
