> Zoie keeps track of an "index version" on disk alongside the Lucene index
> which it uses to decide where it must reindex from to "catch up" if there
> have been incoming indexing events while the server was out of commission.
This calls for a little more clarity... It sounds like a transaction log. Oh right, Zoie assumes an external transaction log; however, it doesn't provide one out of the box?

On Tue, Jan 12, 2010 at 8:43 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> On Tue, Jan 12, 2010 at 8:15 PM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com> wrote:
>>
>> John, you should have a look at Zoie. I just finished adding LinkedIn's
>> case study about Zoie to Lucene in Action 2, so this is fresh in my mind.
>>
>> :)
>
> Yep, Zoie (http://zoie.googlecode.com) will handle the server restart
> part, in that while yes, you lose what is in RAM, Zoie keeps track of an
> "index version" on disk alongside the Lucene index which it uses to decide
> where it must reindex from to "catch up" if there have been incoming
> indexing events while the server was out of commission.
>
> Zoie does not support multiple servers using the same index, because each
> Zoie instance has IndexWriter instances, and you'll get locking problems
> trying to do that. You could have one Zoie instance effectively as the
> "master/writer/realtime reader", and a bunch of raw Lucene "slaves" which
> could read off of that index, but as you say, could not get access to the
> RAMDirectory information until it was flushed to disk.
>
> Why do you need a "cluster" of servers hitting the same index? Are they
> different applications (with different search logic, so they need to be
> different instances), or is it just to try and utilize your hardware
> efficiently? If it's for performance reasons, you might find you get better
> use of your CPU cores by just sharding your one index into smaller ones,
> each having their own Zoie instance, and putting a "broker" on top of them
> searching across all and mergesorting the results.
> Often even this isn't necessary, because Zoie will be opening the
> disk-backed IndexReader in read-only mode, and thus all the synchronized
> blocks are gone, and one single Zoie instance will easily saturate your CPU
> cores through simple multi-threading in your appserver.
>
> If you really needed to do many different kinds of writes (from different
> applications) and also have applications not involved in the writing also
> seeing (in real-time) these writes, then you could still do it with Zoie,
> but it would take some interesting architectural juggling (write your own
> StreamDataProvider class which takes input from a variety of sources and
> merges them together to feed to one Zoie instance, then a broker on top of
> Zoie which serves out IndexReaders to different applications living on top
> which can wrap them up in their own business logic as they saw fit... as
> long as it was ok to have all the applications in the same JVM, of course).
>
> -jake
>
>>
>> Otis
>> --
>> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>>
>> ----- Original Message ----
>> > From: jchang <jchangkihat...@gmail.com>
>> > To: java-dev@lucene.apache.org
>> > Sent: Tue, January 12, 2010 6:10:56 PM
>> > Subject: Lucene 2.9.0 Near Real Time Indexing and Service
>> > Crashes/restarts
>> >
>> > Lucene 2.9.0 has near real time indexing, writing to a RAMDir which gets
>> > flushed to disk when you do a search.
>> >
>> > Does anybody know how this works out with service restarts (both orderly
>> > shutdown and a crash)? If the service goes down while indexed items are in
>> > RAMDir but not on disk, are they lost? Or is there some kind of log
>> > recovery?
>> >
>> > Also, does anybody know the impact of this with clustered Lucene servers?
>> > If you have numerous servers running off one index, I assume there is no
>> > way for the other services to pick up the newly indexed items until they
>> > are flushed to disk, correct?
>> > I'd be happy if that is not so, but I suspect it is so.
>> >
>> > Thanks,
>> > John
>> > --
>> > View this message in context:
>> > http://old.nabble.com/Lucene-2.9.0-Near-Real-Time-Indexing-and-Service-Crashes-restarts-tp27136539p27136539.html
>> > Sent from the Lucene - Java Developer mailing list archive at
>> > Nabble.com.
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-dev-h...@lucene.apache.org
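The "index version" catch-up discussed in this thread can be illustrated with a minimal Java sketch. So can the question of whether it amounts to a transaction log. The sketch makes two assumptions. First, every indexing event carries a monotonically increasing version number. Second, the events can be re-read from some external, replayable source, modelled here as an in-memory list. The names `IndexVersionCatchUp`, `SketchIndex`, `Event` and `catchUp` are hypothetical and are not Zoie's actual API. The index persists only the highest version it has committed, and after a restart it replays everything newer than that.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Zoie's API: an index that persists the highest
// event version it has committed and, after a restart, replays newer events
// from an external, replayable source.
public class IndexVersionCatchUp {

    // One indexing event; versions are assumed to increase monotonically.
    static final class Event {
        final long version;
        final String docId;

        Event(long version, String docId) {
            this.version = version;
            this.docId = docId;
        }
    }

    // Stand-in for a disk-backed index plus its persisted "index version".
    static final class SketchIndex {
        final List<String> committedDocs = new ArrayList<>();
        long committedVersion = 0;                        // survives restarts
        final List<Event> ramBuffer = new ArrayList<>();  // lost on a crash

        void add(Event e) {
            ramBuffer.add(e);
        }

        // Make buffered events durable and advance the persisted version.
        void commit() {
            for (Event e : ramBuffer) {
                committedDocs.add(e.docId);
                committedVersion = Math.max(committedVersion, e.version);
            }
            ramBuffer.clear();
        }

        // Simulate a crash: anything not yet committed disappears.
        void crash() {
            ramBuffer.clear();
        }
    }

    // Re-index every event newer than the committed version, then commit.
    // Returns the number of events replayed.
    static int catchUp(SketchIndex index, List<Event> externalLog) {
        int replayed = 0;
        for (Event e : externalLog) {
            if (e.version > index.committedVersion) {
                index.add(e);
                replayed++;
            }
        }
        index.commit();
        return replayed;
    }

    public static void main(String[] args) {
        List<Event> log = List.of(new Event(1, "a"), new Event(2, "b"),
                new Event(3, "c"), new Event(4, "d"));
        SketchIndex index = new SketchIndex();
        index.add(log.get(0));
        index.add(log.get(1));
        index.commit();          // version 2 is now durable
        index.add(log.get(2));   // event 3 only reaches RAM...
        index.crash();           // ...and is lost
        int replayed = catchUp(index, log);
        System.out.println("replayed=" + replayed + " version="
                + index.committedVersion + " docs=" + index.committedDocs);
    }
}
```

In this sketch the index remembers only a single number. The lost RAM-buffered events are therefore recoverable only because something outside the index can replay them. That is exactly the role a transaction log, or any replayable data source, would have to play.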
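Jake suggests sharding one index and putting a "broker" on top that merge-sorts the per-shard results. That can be sketched as a k-way merge in Java. The sketch assumes each shard returns its hits already sorted by descending score. It also assumes scores are comparable across shards, which is not automatically true for Lucene scores, since term statistics differ per shard. `ShardBroker`, `Hit` and `mergeTopK` are hypothetical names, not part of Zoie or Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Zoie's or Lucene's API: a broker that merges the
// per-shard top hits (each list sorted by descending score) into one global
// top-k list.
public class ShardBroker {

    static final class Hit {
        final int shard;
        final String docId;
        final float score;

        Hit(int shard, String docId, float score) {
            this.shard = shard;
            this.docId = docId;
            this.score = score;
        }

        @Override
        public String toString() {
            return docId + "(shard " + shard + ", " + score + ")";
        }
    }

    // Position within one shard's result list; one cursor per shard is kept in the heap.
    private static final class Cursor {
        final List<Hit> hits;
        int pos = 0;

        Cursor(List<Hit> hits) {
            this.hits = hits;
        }

        Hit current() {
            return hits.get(pos);
        }
    }

    // k-way merge: repeatedly take the best remaining head among all shards.
    static List<Hit> mergeTopK(List<List<Hit>> perShard, int k) {
        // Max-heap ordered by the score of each cursor's current hit.
        PriorityQueue<Cursor> heap = new PriorityQueue<>(
                (a, b) -> Float.compare(b.current().score, a.current().score));
        for (List<Hit> hits : perShard) {
            if (!hits.isEmpty()) {
                heap.add(new Cursor(hits));
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < k && !heap.isEmpty()) {
            Cursor best = heap.poll();
            merged.add(best.current());
            best.pos++;
            if (best.pos < best.hits.size()) {
                heap.add(best);   // re-insert with its next hit as the new head
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> shard0 = List.of(new Hit(0, "a", 0.9f), new Hit(0, "b", 0.4f));
        List<Hit> shard1 = List.of(new Hit(1, "c", 0.7f), new Hit(1, "d", 0.6f));
        System.out.println(mergeTopK(List.of(shard0, shard1), 3));
    }
}
```

Each shard's list is already sorted, so the broker only needs a heap holding one cursor per shard. Collecting the top k hits from s shards then costs roughly O(k log s) rather than a full re-sort of every shard's results.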