Solr just copies them into the same directory - Lucene files are write once, so its not much different than what happens locally.
Nigel wrote: > Right now we logically re-open an index by making an updated copy of the > index in a new directory (using rsync etc.), opening the new copy, and > closing the old one. We don't use IndexReader.reopen() because the updated > index is in a different directory (as opposed to being updated in-place). > > (Reading about some of the 2.9 changes motivated me to look into actually > using reopen(). And Michael Busch and Mark Miller both pointed out that I > was incorrect in saying that pre-2.9 reopen() wasn't more efficient than > just opening a new index -- I've read through that code now so I have at > least a basic understanding of what's happening there. Anyway, it seems > like reopen() is a Good Thing, so I'd like to use it. (-:) > > So, my real question was whether there is a "recommended" way to update an > index in-place with files copied from a separate indexing server. > > For example, do you simply rsync in the new cfs files, overwrite the > segments.gen and segments_XX files, and call reopen()? Or create an updated > copy in a new directory, then rename new directory to the old name once > you're sure you've copied everything successfully, then call reopen()? What > does Solr do? > > Thanks, > Chris > > On Mon, Oct 5, 2009 at 8:39 PM, Jason Rutherglen <jason.rutherg...@gmail.com > >> wrote: >> > > >> I'm not sure I understand the question. You're trying to reopen >> the segments that you're replicated and you're wondering what's >> changed in Lucene? >> >> On Mon, Oct 5, 2009 at 5:30 PM, Nigel <nigelspl...@gmail.com> wrote: >> >>> Anyone have any ideas here? I imagine a lot of other people will have a >>> similar question when trying to take advantage of the reopen improvements >>> >> in >> >>> 2.9. >>> >>> Thanks, >>> Chris >>> >>> On Thu, Oct 1, 2009 at 5:15 PM, Nigel <nigelspl...@gmail.com> wrote: >>> >>> >>>> I have a question about the reopen functionality in Lucene 2.9. As I >>>> understand it, since FieldCaches are now per-segment, it can avoid >>>> >> reloading >> >>>> everything when the index is reopened, and instead just load the new >>>> segments. >>>> >>>> For background, like many people we have a distributed architecture >>>> >> where >> >>>> indexes are created on one server and copied to multiple other servers. >>>> >> The >> >>>> way that copying works now is something like the following: >>>> >>>> 1. Let's say the current index is in /indexes/a and is open >>>> 2. An empty directory for the updated index is created, let's say >>>> /indexes/b >>>> 3. Hard links for the files in /indexes/a are created in /indexes/b >>>> 4. We rsync the current index on the server with /indexes/b, thus >>>> copying over new cfs files and deleting hard links to files no longer >>>> >> in use >> >>>> 5. A new IndexReader is opened for /indexes/b and warmed up >>>> 6. The application starts using the new reader instead of the old one >>>> 7. The old IndexReader is closed and /indexes/a is deleted >>>> >>>> I'm simplifying a few steps, but I think this is familiar to many >>>> >> people, >> >>>> and it's my impression that Solr implements something similar. >>>> >>>> The point is, the updated index lives in a new directory in this scheme, >>>> and so we don't actually reopen the existing IndexReader; we open a new >>>> >> one >> >>>> with a different FSDirectory. >>>> >>>> Before Lucene 2.9, I don't think this made any difference, as (I think) >>>> >> the >> >>>> only advantage to calling reopen vs. just creating another IndexReader >>>> >> was >> >>>> having reopen figure out whether the index had actually changed. (And >>>> >> whave >> >>>> a different way to figure that out, so it was a non-issue.) >>>> >>>> With Lucene 2.9, there's now a big difference, namely the per-segment >>>> caching mentioned above. So the question is how to make use of reopen >>>> >> with >> >>>> our distribution scheme. Is there an informal best practice for >>>> >> handling >> >>>> this case? For example, should step #5 above rename /indexes/b to >>>> /indexes/a so the index can be reopened in the same physical location? >>>> >> Or >> >>>> should rsync operate on the existing directory in-place, updating the >>>> segments* files last and relying on the fact that deleted files will not >>>> really be deleted (on Linux, at least) as long as the app is still >>>> >> holding >> >>>> them open? >>>> >>>> I guess the answer may depend on how exactly reopen knows which files >>>> >> are >> >>>> the "same" (e.g. does it look at filenames, or file descriptors, etc.). >>>> >>>> Thanks, >>>> Chris >>>> >>>> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> >> > > -- - Mark http://www.lucidimagination.com --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org