Re: Realtime Search

Michael McCandless Fri, 09 Jan 2009 06:00:26 -0800

Marvin Humphrey <mar...@rectangular.com> wrote:

> The goal is to improve worst-case write performance.
> ...
> In between the time when the background merge writer starts up and the time 
> it finishes consolidating segment data, we assume that the primary writer 
> will have modified the index.
>
>   * New docs have been added in new segments.
>   * Tombstones have been added which suppress documents in segments which
>     didn't even exist when the background merge writer started up.
>   * Tombstones have been added which suppress documents in segments which
>     existed when the background merge writer started up, but were not merged.
>   * Tombstones have been added which suppress documents in segments which
>     have just been merged.
>
> Only the last category of deletions matters.
>
> At this point, the background merge writer acquires an exclusive write lock on 
> the index. It examines recently added tombstones, translates the document 
> numbers and writes a tombstone file against itself. Then it writes the 
> snapshot file to commit its changes and releases the write lock.


OK, now I understand KS's two-writer model.  Lucene has already solved
this with the ConcurrentMergeScheduler -- all segment merges are done
in background threads (by default).

We also have to compute the deletions against the new segment to
include deletions that happened to the merged segments after the merge
kicked off.
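
To make that step concrete, here is a minimal sketch of carrying late
deletions over to the merged segment.  It is not Lucene's or KS's actual
code: the class MergeDeletes and both method names are hypothetical, and
it assumes deletions are tracked as one java.util.BitSet per source
segment and that the merge renumbers surviving docs contiguously.

```java
import java.util.BitSet;

/** Hypothetical illustration only; not taken from Lucene or KinoSearch. */
public class MergeDeletes {

    /**
     * Maps each old doc number of one source segment to its doc number in
     * the merged segment. Docs already deleted when the merge started are
     * dropped by the merge and map to -1. newBase is the first doc number
     * this source segment's survivors occupy in the merged segment.
     */
    public static int[] buildDocMap(int maxDoc, BitSet deletedAtStart, int newBase) {
        int[] docMap = new int[maxDoc];
        int next = newBase;
        for (int oldDoc = 0; oldDoc < maxDoc; oldDoc++) {
            docMap[oldDoc] = deletedAtStart.get(oldDoc) ? -1 : next++;
        }
        return docMap;
    }

    /**
     * Translates deletions that arrived while the merge was running into
     * the merged segment's numbering and records them in mergedDeletes.
     */
    public static void carryOverDeletes(int[] docMap, BitSet deletedAtStart,
                                        BitSet deletedNow, BitSet mergedDeletes) {
        for (int oldDoc = deletedNow.nextSetBit(0); oldDoc >= 0;
                 oldDoc = deletedNow.nextSetBit(oldDoc + 1)) {
            // Docs deleted before the merge started are not in the merged
            // segment at all, so only the newer deletions need translating.
            if (!deletedAtStart.get(oldDoc)) {
                mergedDeletes.set(docMap[oldDoc]);
            }
        }
    }
}
```

In this sketch the translation would run once per merged source segment
just before the merged segment is committed, so that no further deletions
can slip in between the translation and the commit.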

Still, it's not a panacea since often the IO system has horrible
degradation in performance while a merge is running.  If only we could
mark all IO (reads & writes) associated with merging as low priority
and have the OS actually do the right thing...

> It's true that we are decoupling the process of making logical changes to the 
> index from the process of internal consolidation. I probably wouldn't 
> describe that as being done from the reader's standpoint, though.

Right, we have a different problem in Lucene (because we must warm a
reader before using it): after a large merge, warming the new
IndexReader that includes that segment can be costly (though that cost
is going down with LUCENE-1483, and eventually column-stride fields).

But we can solve this by allowing a reopened reader to use the old
segments, until the new segment is warmed.
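
A rough sketch of that idea, under stated assumptions: searches keep
using the current reader while a candidate reader is warmed, and the
swap happens only afterwards.  WarmedReaderHolder and its methods are
hypothetical names, not Lucene API, and the sketch uses a generic type R
in place of IndexReader and leaves out the reference counting needed to
close the old reader safely.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical illustration only; R stands in for a reader type. */
public class WarmedReaderHolder<R> {

    private final AtomicReference<R> current;

    public WarmedReaderHolder(R initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Searches call this; it never blocks on warming. */
    public R acquire() {
        return current.get();
    }

    /**
     * Warms the candidate (possibly slowly) while acquire() keeps
     * returning the old reader, then publishes the candidate.
     * Returns the previous reader so the caller can release it once
     * in-flight searches have finished with it.
     */
    public R swapAfterWarming(R candidate, Consumer<R> warmer) {
        warmer.accept(candidate);
        return current.getAndSet(candidate);
    }
}
```

The warmer here would be whatever primes the new segment, for example
running a few representative queries against the candidate reader.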

Mike