On Tue, Jan 12, 2010 at 8:15 PM, Otis Gospodnetic <
otis_gospodne...@yahoo.com> wrote:

> John, you should have a look at Zoie.  I just finished adding LinkedIn's
> case study about Zoie to Lucene in Action 2, so this is fresh in my mind. :)

Yep, Zoie ( http://zoie.googlecode.com ) will handle the server restart
part.  You do lose what is in RAM, but Zoie keeps track of an "index
version" on disk alongside the Lucene index, which it uses to decide
where it must reindex from to "catch up" if there have been incoming
indexing events while the server was out of commission.
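
A minimal sketch of that catch-up idea in plain Java, not Zoie's actual
code: it assumes every indexing event carries a monotonically increasing
version and that the event source can be re-read after a restart.  The
names CatchUpSketch, Event and eventsToReplay are made up for
illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not Zoie's API.
final class CatchUpSketch {

    /** An indexing event tagged with a monotonically increasing version. */
    static final class Event {
        final long version;
        final String doc;

        Event(long version, String doc) {
            this.version = version;
            this.doc = doc;
        }
    }

    /**
     * Returns the events that are newer than the version persisted next to
     * the on-disk index, i.e. the ones that must be reindexed after a restart.
     */
    static List<Event> eventsToReplay(List<Event> log, long persistedVersion) {
        List<Event> pending = new ArrayList<>();
        for (Event e : log) {
            if (e.version > persistedVersion) {
                pending.add(e);
            }
        }
        return pending;
    }
}
```

On startup you would read the persisted version, call eventsToReplay
against your event source, and feed the result back through the indexer.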

Zoie does not support multiple servers using the same index, because each
Zoie instance has its own IndexWriter instances, and you'll get locking
problems trying to do that.  You could have one Zoie instance acting as the
"master/writer/realtime reader", and a bunch of raw Lucene "slaves" which
read off of that index, but as you say, they could not get access to the
RAMDirectory information until it was flushed to disk.

Why do you need a "cluster" of servers hitting the same index?  Are they
different applications (with different search logic, so they need to be
different instances), or is it just to try and utilize your hardware
efficiently?  If it's for performance reasons, you might get better
use of your CPU cores by sharding your one index into smaller ones,
each with its own Zoie instance, and putting a "broker" on top that
searches across all of them and merge-sorts the results.  Often even this
isn't necessary: Zoie opens the disk-backed IndexReader in read-only
mode, so all the synchronized blocks are gone, and a single Zoie instance
will easily saturate your CPU cores through the ordinary multi-threading
of your appserver.
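
For the broker, the merge step could look roughly like the sketch below.
It is plain Java with no Lucene or Zoie types, and it assumes each shard
hands back its hits already sorted by descending score.  BrokerMergeSketch,
Hit and mergeTopK are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical broker-side merge; not part of Lucene or Zoie.
final class BrokerMergeSketch {

    /** A single search hit as returned by one shard. */
    static final class Hit {
        final String docId;
        final float score;

        Hit(String docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    /**
     * Merges per-shard hit lists (each sorted by descending score) into the
     * global top k, using a heap that holds one cursor per shard.
     */
    static List<Hit> mergeTopK(List<List<Hit>> shardHits, int k) {
        // Each cursor is {shardIndex, positionInShard}; highest score first.
        PriorityQueue<int[]> heads = new PriorityQueue<>((a, b) -> Float.compare(
                shardHits.get(b[0]).get(b[1]).score,
                shardHits.get(a[0]).get(a[1]).score));
        for (int s = 0; s < shardHits.size(); s++) {
            if (!shardHits.get(s).isEmpty()) {
                heads.add(new int[] {s, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < k && !heads.isEmpty()) {
            int[] cur = heads.poll();
            List<Hit> shard = shardHits.get(cur[0]);
            merged.add(shard.get(cur[1]));
            // Advance this shard's cursor if it has more hits.
            if (cur[1] + 1 < shard.size()) {
                heads.add(new int[] {cur[0], cur[1] + 1});
            }
        }
        return merged;
    }
}
```

Scores are only comparable across shards if every shard scores the same
way, which is something to check before relying on a merge like this.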

If you really need to do many different kinds of writes (from different
applications) and also have applications not involved in the writing
see these writes in real time, then you could still do it with Zoie,
but it would take some interesting architectural juggling: write your own
StreamDataProvider class which takes input from a variety of sources and
merges them together to feed one Zoie instance, then put a broker on top of
Zoie which serves out IndexReaders to the different applications living on
top, which can wrap them in their own business logic as they see fit (as
long as it is ok to have all the applications in the same JVM, of course).
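
The fan-in part of that could be sketched as follows.  This is plain Java
and deliberately does not extend Zoie's StreamDataProvider, whose exact
signature I am not reproducing here; FanInFeedSketch, submit and next are
hypothetical names.  The idea is just that many writers submit events and
one consumer drains them in a single, versioned order.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical fan-in feed; a real provider would adapt this to Zoie's API.
final class FanInFeedSketch {

    /** One indexing event, stamped with a global version and its origin. */
    static final class Event {
        final long version;
        final String source;
        final String payload;

        Event(long version, String source, String payload) {
            this.version = version;
            this.source = source;
            this.payload = payload;
        }
    }

    private final BlockingQueue<Event> queue = new LinkedBlockingQueue<>();
    private long nextVersion = 1;

    /**
     * Called by any writer application.  Synchronized so that the order of
     * versions always matches the order of events in the queue.
     */
    synchronized void submit(String source, String payload) {
        queue.add(new Event(nextVersion++, source, payload));
    }

    /** Called by the single indexing consumer; returns null if nothing is pending. */
    Event next() {
        return queue.poll();
    }
}
```

Because every event gets one global version, the same number can double as
the "index version" used for catching up after a restart.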

  -jake


>
>  Otis
> --
> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>
>
>
> ----- Original Message ----
> > From: jchang <jchangkihat...@gmail.com>
> > To: java-dev@lucene.apache.org
> > Sent: Tue, January 12, 2010 6:10:56 PM
> > Subject: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts
> >
> >
> > Lucene 2.9.0 has near real time indexing, writing to a RAMDir which gets
> > flushed to disk when you do a search.
> >
> > Does anybody know how this works out with service restarts (both orderly
> > shutdown and a crash)?  If the service goes down while indexed items are in
> > RAMDir but not on disk, are they lost?  Or is there some kind of log
> > recovery?
> >
> > Also, does anybody know the impact of this with clustered Lucene servers?
> > If you have numerous servers running off one index, I assume there is no way
> > for the other services to pick up the newly indexed items until they are
> > flushed to disk, correct?  I'd be happy if that is not so, but I suspect it
> > is so.
> >
> > Thanks,
> > John
> > --
> > View this message in context:
> >
> http://old.nabble.com/Lucene-2.9.0-Near-Real-Time-Indexing-and-Service-Crashes-restarts-tp27136539p27136539.html
> > Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-dev-h...@lucene.apache.org
>
