Re: Lucene 2.9.0 Near Real Time Indexing and Service Crashes/restarts

Jason Rutherglen Tue, 12 Jan 2010 20:56:11 -0800

> Zoie keeps track of an "index version" on disk alongside the Lucene index 
> which it uses to decide where it must reindex from to "catch up" if there
> have been incoming indexing events while the server was out of commission.


This begs for a little more clarity... It sounds like a transaction log.  Oh
right, Zoie assumes there is an external transaction log, but it doesn't
provide one out of the box?
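
To make the "index version" idea concrete, here is a minimal sketch of
version-based catch-up, assuming an external, ordered event log exists
somewhere.  Every name in it (CatchUpSketch, Event, eventsToReplay, catchUp)
is hypothetical and is not part of the Zoie or Lucene APIs; a plain list
stands in for the index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these names come from Zoie or Lucene.
final class CatchUpSketch {

    /** One indexing event in an external, ordered log (assumed to exist). */
    static final class Event {
        final long version;
        final String payload;

        Event(long version, String payload) {
            this.version = version;
            this.payload = payload;
        }
    }

    /** Returns the events strictly newer than the version stored with the index. */
    static List<Event> eventsToReplay(List<Event> log, long persistedVersion) {
        List<Event> pending = new ArrayList<>();
        for (Event e : log) {
            if (e.version > persistedVersion) {
                pending.add(e);
            }
        }
        return pending;
    }

    /** Replays the pending events and returns the new version to persist. */
    static long catchUp(List<Event> log, long persistedVersion, List<String> index) {
        long version = persistedVersion;
        for (Event e : eventsToReplay(log, persistedVersion)) {
            index.add(e.payload); // stand-in for adding a document to the index
            version = Math.max(version, e.version);
        }
        return version;
    }

    public static void main(String[] args) {
        List<Event> log = new ArrayList<>();
        log.add(new Event(1, "doc-a"));
        log.add(new Event(2, "doc-b"));
        log.add(new Event(3, "doc-c"));

        // Only version 1 reached disk before the restart.
        List<String> index = new ArrayList<>();
        index.add("doc-a");

        long version = catchUp(log, 1, index);
        System.out.println("version=" + version + " index=" + index);
    }
}
```

The sketch only works if whatever feeds the indexer can re-deliver events
after a given version, which is the external transaction log I'm asking
about.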

On Tue, Jan 12, 2010 at 8:43 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> On Tue, Jan 12, 2010 at 8:15 PM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com> wrote:
>>
>> John, you should have a look at Zoie.  I just finished adding LinkedIn's
>> case study about Zoie to Lucene in Action 2, so this is fresh in my mind.
>>
>> :)
>
> Yep, Zoie ( http://zoie.googlecode.com ) will handle the server restart
> part, in that while yes, you lose what is in RAM, Zoie keeps track of an
> "index version" on disk alongside the Lucene index which it uses to decide
> where it must reindex from to "catch up" if there have been incoming
> indexing events while the server was out of commission.
>
> Zoie does not support multiple servers using the same index, because each
> zoie instance has IndexWriter instances, and you'll get locking problems
> trying to do that.  You could have one Zoie instance effectively as the
> "master/writer/realtime reader", and a bunch of raw Lucene "slaves" which
> could read off of that index, but as you say, could not get access to the
> RAMDirectory information until it was flushed to disk.
>
> Why do you need a "cluster" of servers hitting the same index?  Are they
> different applications (with different search logic, so they need to be
> different instances), or is it just to try and utilize your hardware
> efficiently?  If it's for performance reasons, you might find you get better
> use of your CPU cores by just sharding your one index into smaller ones,
> each having their own Zoie instance, and putting a "broker" on top of them
> searching across all and mergesorting the results.  Often even this isn't
> necessary, because Zoie will be opening the disk-backed IndexReader in
> readonly mode, and thus all the synchronized blocks are gone, and one single
> Zoie instance will easily saturate your CPU cores through simple
> multi-threading in your appserver.
>
> If you really needed to do many different kinds of writes (from different
> applications) and also have applications not involved in the writing also
> seeing (in real-time) these writes, then you could still do it with Zoie,
> but it would take some interesting architectural juggling (write your own
> StreamDataProvider class which takes input from a variety of sources and
> merges them together to feed to one Zoie instance, then a broker on top of
> zoie which serves out IndexReaders to different applications living on top
> which can wrap them up in their own business logic as they see fit... as
> long as it was ok to have all the applications in the same JVM, of course).
>   -jake
>
>>
>>  Otis
>> --
>> Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch
>>
>>
>>
>> ----- Original Message ----
>> > From: jchang <jchangkihat...@gmail.com>
>> > To: java-dev@lucene.apache.org
>> > Sent: Tue, January 12, 2010 6:10:56 PM
>> > Subject: Lucene 2.9.0 Near Real Time Indexing and Service
>> > Crashes/restarts
>> >
>> >
>> > Lucene 2.9.0 has near real time indexing, writing to a RAMDir which gets
>> > flushed to disk when you do a search.
>> >
>> > Does anybody know how this works out with service restarts (both orderly
>> > shutdown and a crash)?  If the service goes down while indexed items are
>> > in RAMDir but not on disk, are they lost?  Or is there some kind of log
>> > recovery?
>> >
>> > Also, does anybody know the impact of this with clustered Lucene
>> > servers?  If you have numerous servers running off one index, I assume
>> > there is no way for the other services to pick up the newly indexed items
>> > until they are flushed to disk, correct?  I'd be happy if that is not so,
>> > but I suspect it is so.
>> >
>> > Thanks,
>> > John
>> > --
>> > View this message in context:
>> >
>> > http://old.nabble.com/Lucene-2.9.0-Near-Real-Time-Indexing-and-Service-Crashes-restarts-tp27136539p27136539.html
>> > Sent from the Lucene - Java Developer mailing list archive at
>> > Nabble.com.
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
>> > For additional commands, e-mail: java-dev-h...@lucene.apache.org