Thanks for the perspective. So what's the recommendation for when to commit? My use case is adding a stream of docs (approx. 50-200 per minute). Conceptually, there are no transactions, just new docs being added with a small percentage of updates.
What approaches are typically used: Commit periodically? Sync commits with merges? Some other heuristic?

Any techniques/theories appreciated, even if they don't fit my scenario.

Cheers,

On 6 May 2013 03:16, Christopher Currens <[email protected]> wrote:

> NOTE: This is mostly from memory, but I think it's correct.
>
> Lucene's IndexWriter follows transactional writes, so the Segments_N file
> isn't updated until Commit is called. In fact, updating the segments files
> is the last thing that is done in a commit, since Commit() can throw an OOM
> exception. If things were kept in sync during each write, it would no
> longer be transactional, and you could end up with bad state in the index
> (i.e. segments files pointing to segments that aren't complete, or didn't
> merge properly).
>
> Technically, there are multiple segments that are written to disk, but not
> referenced in the segments file, as you've alluded to, so without the
> careful tracking by the index writer, things could get corrupted pretty
> quickly if it tried to sync each time, considering the default segment
> merge policy has merging done in a background thread...it gets dicey when
> an exception is thrown on the background thread, and state can't always be
> restored in the index. NRT search isn't really affected by this, because
> it's using a reader that's returned from the writer. It has access to all
> of the segments that are on disk or haven't been committed yet.
>
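
For illustration only, here is a toy Python model of the behaviour described in the quoted message: segments can exist without being referenced until a commit publishes them as its last step, while a reader obtained from the writer still sees them. This is a rough sketch, not Lucene's API; ToyIndexWriter and all of its methods are hypothetical names, and the "segments file" is reduced to a simple counter.

```python
class ToyIndexWriter:
    """Toy in-memory model of commit-time visibility (hypothetical, not Lucene)."""

    def __init__(self):
        self._segments = []        # every flushed segment, committed or not
        self._committed_count = 0  # stands in for the Segments_N file
        self._buffer = []          # docs not yet flushed into a segment

    def add_document(self, doc):
        self._buffer.append(doc)

    def flush(self):
        """Write buffered docs as a new segment without publishing it."""
        if self._buffer:
            self._segments.append(tuple(self._buffer))
            self._buffer = []

    def commit(self):
        """Flush first, then publish the segment list as the very last step."""
        self.flush()
        self._committed_count = len(self._segments)

    def committed_docs(self):
        """What a reader opened from the committed state would see."""
        committed = self._segments[:self._committed_count]
        return [doc for segment in committed for doc in segment]

    def nrt_docs(self):
        """What a reader obtained from the writer would see."""
        self.flush()
        return [doc for segment in self._segments for doc in segment]


writer = ToyIndexWriter()
writer.add_document("a")
writer.add_document("b")
writer.commit()
writer.add_document("c")         # added after the commit, not yet published
print(writer.committed_docs())   # ['a', 'b']
print(writer.nrt_docs())         # ['a', 'b', 'c']
```

In this model a crash before commit() would simply leave the counter pointing at the last good set of segments, which is the property the quoted message attributes to updating the segments file last.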
