Our setup is quite similar to yours, but in all honesty, you will need
to do some form of batching on your updates, simply because you don't
want to keep the IndexWriter open all the time.
As for clustering, we went through three iterations and ended up keeping
x indexes parallelized on x servers, all of this with failover and
index-independent synchronization with your persistent store. There was a
little discussion about this a few weeks back, and I mentioned that your
biggest pain will be maintaining the integrity of parallel indexes that
are updated/deleted autonomously (atomic updates and deletes), but there
are ways of running iterative checks to make sure that your indices
stay clean.
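For example, one pass of such a check could look something like this
(sketch only; it assumes a unique "id" field per document and that you can
pull the set of primary keys out of your persistent store):

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;

/** One iteration of a consistency check between an index copy and the store. */
public class IndexAudit {

    /**
     * Compares the ids in the persistent store against one index copy and
     * returns the ids that are missing or duplicated, so the next batch
     * can reindex or clean them up.
     */
    public static Set<String> findDirtyIds(String indexPath, Set<String> storeIds)
            throws IOException {
        Set<String> dirty = new HashSet<String>();
        IndexReader reader = IndexReader.open(indexPath);
        try {
            for (String id : storeIds) {
                TermDocs docs = reader.termDocs(new Term("id", id));
                try {
                    int hits = 0;
                    while (docs.next()) hits++;
                    if (hits != 1) dirty.add(id); // missing from, or duplicated in, this copy
                } finally {
                    docs.close();
                }
            }
        } finally {
            reader.close();
        }
        return dirty;
    }
}

Run that against each parallel index off-peak and feed the dirty ids back
into your update batches.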
Nader Henein
Stephane Bailliez wrote:
I have been browsing the archives concerning this particular topic.
I'm in the same boat and the customer has clustering requirements.
To give some background:
I have a constant flow of incoming messages flying over the network
that need to be archived in the db, indexed and dispatched to thousands
of clients (rich client console).
The backend architecture needs to be clustered, meaning that:
- the message broker needs to be clustered
- the database needs to be replicated and support failover
- the search engine index needs to be replicated
This is for a 24x7 operation.
My main problem is that there is a constant flow of writes just about
everywhere, meaning that the Lucene index keeps changing and that I
have a very small window available to replicate the data across the
network.
(As of now, I have 2 messages / minute and should go over 50 in the
medium-term).
Concerning the index, being able to replicate is cool, but if one node
goes down, it must be able to resynchronize when you bring it back up
in the cluster... that's a hell of a problem.
As it is acceptable to have downtime on the search engine, I was
thinking it was much easier to:
1) rely on a shared index via NFS for each node.
2) dedicate a box to the search engine and access it via rpc from each
node
Considering the messages I have seen in the archives, 1) seems to be a
no-go.
Option 2) is generally not recommended, but I think it could fit my needs
quite well. IMHO it should also work quite well for bringing the box back
into operation if it goes down. Synchronizing the index for me is just a
matter of going through the database to reindex the archived content;
this will take some time, but as I said, running in degraded mode is
acceptable.
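Roughly, the resync I have in mind is nothing more than something like this
(the table/column names are made up for the example, and it assumes a
Lucene 2.x-style API with the JDBC driver already loaded):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

/** Rebuilds the dedicated search box's index from the archived messages in the db. */
public class Rebuilder {

    public static void rebuild(String indexPath, String jdbcUrl) throws Exception {
        // 'true' recreates the index from scratch
        IndexWriter writer = new IndexWriter(indexPath, new StandardAnalyzer(), true);
        Connection con = DriverManager.getConnection(jdbcUrl);
        try {
            Statement st = con.createStatement();
            ResultSet rs = st.executeQuery("SELECT id, body FROM archived_messages");
            while (rs.next()) {
                Document doc = new Document();
                doc.add(new Field("id", rs.getString("id"),
                        Field.Store.YES, Field.Index.UN_TOKENIZED));
                doc.add(new Field("body", rs.getString("body"),
                        Field.Store.NO, Field.Index.TOKENIZED));
                writer.addDocument(doc);
            }
            rs.close();
            st.close();
            writer.optimize(); // one big merge before the box goes back into rotation
        } finally {
            writer.close();
            con.close();
        }
    }
}

The box runs this on startup while the rest of the system keeps working in
degraded mode, then rejoins once the index is rebuilt.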
Has anyone any suggestions/recommendations/experience/thoughts concerning
the problems mentioned above?
Cheers,
Stephane
--
Nader S. Henein
Senior Applications Architect
Bayt.com