Re: what happens when new documents written to old index file?

Ben West Mon, 09 Apr 2012 08:43:06 -0700

Nicholas,

Lucene almost never rewrites the entire index. It writes your change as a new
segment next to the existing ones, so, unless you trigger a merge, writes are
roughly constant-time.

Once enough segments of similar size accumulate, Lucene merges them into a
larger one.

Mike McCandless has a very cool visualization of this:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

If you are interested in the theory, you might want to search for
"Log-Structured Merge Trees."
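
To make the append-then-merge idea concrete, here is a rough toy model in
Python. It is only a sketch and not Lucene's actual code or merge policy: the
ToyIndex class, the merge_factor knob, and the docs_rewritten counter are all
names I made up for illustration, and the rule "merge when the newest
merge_factor segments are the same size" is a simplifying assumption.

```python
class ToyIndex:
    """Toy model of an append-only, segment-based index (NOT Lucene's real code)."""

    def __init__(self, merge_factor=3):
        # Hypothetical knob: how many equal-sized segments trigger a merge.
        self.merge_factor = merge_factor
        # Each segment is an immutable tuple of documents, oldest segment first.
        self.segments = []
        # Running count of documents copied by merges (the "extra" write cost).
        self.docs_rewritten = 0

    def add_document(self, doc):
        # A new document becomes its own small segment.
        # Existing segments are never modified in place.
        self.segments.append((doc,))
        self._maybe_merge()

    def _maybe_merge(self):
        # Assumed rule: while the newest merge_factor segments all have the
        # same size, replace them with one segment holding all their documents.
        while len(self.segments) >= self.merge_factor:
            tail = self.segments[-self.merge_factor:]
            if len({len(seg) for seg in tail}) != 1:
                break
            merged = tuple(doc for seg in tail for doc in seg)
            self.docs_rewritten += len(merged)
            self.segments[-self.merge_factor:] = [merged]


index = ToyIndex(merge_factor=3)
for i in range(10):
    index.add_document(f"doc-{i}")

# Print the size of every segment, oldest first, and the total merge cost.
print([len(seg) for seg in index.segments])
print(index.docs_rewritten)
```

In this toy, most adds only create a one-document segment and touch nothing
else. The occasional add that triggers a merge copies more, which is why the
cost is "roughly constant" on average rather than on every single write.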

Hope this helps,
-Ben

PS: LSM trees were created as a response to B+ trees not being able to handle
a high volume of updates. So if that is your comparison point, Lucene probably
does quite well :-)




----- Original Message -----
From: Nicholas Petersen <npeterse...@gmail.com>
To: lucene-net-user@lucene.apache.org
Cc: 
Sent: Sunday, April 8, 2012 11:04 PM
Subject: what happens when new documents written to old index file?

Hello all,

It's been a while since I've actively gotten my hands involved in Lucene,
but I follow it on these email message boards all the time. Anyway, I'm
getting rusty on my concepts of some points, and here is one particular
question I've been wondering:

When you have already indexed let's say 10,000 documents, and
saved/committed it to file, let's say later you open it up and want to add
one simple document to the index. What I want to know is: how intensive (IO
wise, working on the index file structures) and how performant is the
process of adding one simple document to the already written indexes
(files)? Compared, for instance, to how well a B+ Tree can handle adding
another item to it (ideally very well, with very little of the B+Tree file
bytes having to be touched or reshuffled)?

Worst case scenario is: the whole inverted index structures have to be
re-written.

Was it that a new add like this essentially writes another set of indexes
(with something like an appended iterated number on the files "*_2.*")? Or
is it a lot more capable than all this?

Thanks
