Re: Index Replication / Clustering

Paul Smith Sun, 26 Jun 2005 15:21:39 -0700

Why not try JMS messaging? The application sends a message over a JMS queue telling the indexing server that Document X needs to be updated. This gives you the flexibility to have the indexing system down without losing the message that the document needs to be indexed, and it also lets the indexing server be 'busy' without slowing down the application that is performing the updates.
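
As a rough illustration of that decoupling, here is a minimal in-memory sketch in Java. It is not JMS code: a java.util.concurrent.LinkedBlockingQueue stands in for the broker queue, and the names IndexUpdateQueueSketch, requestReindex, pending and drainPending are made up for the example. A real broker with persistent delivery would also keep the messages across restarts, which this stand-in does not.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// In-memory stand-in for a JMS queue (assumption: not the real JMS API).
class IndexUpdateQueueSketch {
    // Unbounded queue, so the producer never blocks on a slow or absent consumer.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Called by the application: records that a document needs re-indexing
    // and returns immediately, whether or not the indexer is running.
    void requestReindex(String documentId) {
        queue.add(documentId);
    }

    // Number of updates still waiting for the indexer.
    int pending() {
        return queue.size();
    }

    // Called by the indexing server when it is up and has time:
    // removes and returns everything queued so far, in arrival order.
    List<String> drainPending() {
        List<String> batch = new ArrayList<>();
        queue.drainTo(batch);
        return batch;
    }

    public static void main(String[] args) {
        IndexUpdateQueueSketch updates = new IndexUpdateQueueSketch();

        // The indexer is "down": the application keeps sending anyway.
        updates.requestReindex("doc-1");
        updates.requestReindex("doc-2");
        System.out.println("pending while indexer is down: " + updates.pending());

        // The indexer comes back and catches up on what it missed.
        for (String id : updates.drainPending()) {
            System.out.println("re-indexing " + id);
        }
    }
}
```

The point of the sketch is only that sending and indexing happen at different times: the sender's call returns as soon as the message is queued.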

If you use ActiveMQ for JMS, you can take advantage of its Composite Destination feature and have a virtual Queue/Topic that is actually several Queues/Topics. This is what we use to keep a mirror index server completely in sync. The application sends an update message to a queue named "queue://index1, queue://index2", which becomes 2 separate queues for the 2 servers, allowing them to process the same message whenever they can get around to it.
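
To make the fan-out concrete, here is a small in-memory imitation of that behaviour in Java. It does not use the ActiveMQ client: the class CompositeDestinationSketch and its send/receive methods are hypothetical, and the name parsing (split on commas, strip a "queue://" prefix) is only an assumption chosen to match the destination string quoted above.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory imitation of a composite destination (not the ActiveMQ API).
class CompositeDestinationSketch {
    // One independent FIFO queue per physical destination name.
    private final Map<String, Deque<String>> queues = new LinkedHashMap<>();

    // Delivers one copy of the message to every queue named in the
    // comma-separated composite name, e.g. "queue://index1, queue://index2".
    void send(String compositeName, String message) {
        for (String part : compositeName.split(",")) {
            String name = part.trim();
            if (name.startsWith("queue://")) {
                name = name.substring("queue://".length());
            }
            if (name.isEmpty()) {
                continue; // ignore stray commas
            }
            queues.computeIfAbsent(name, k -> new ArrayDeque<>()).addLast(message);
        }
    }

    // Each mirror reads only its own queue; returns null when it is empty.
    String receive(String queueName) {
        Deque<String> q = queues.get(queueName);
        return (q == null) ? null : q.pollFirst();
    }

    public static void main(String[] args) {
        CompositeDestinationSketch broker = new CompositeDestinationSketch();
        broker.send("queue://index1, queue://index2", "update doc-42");

        // Both mirrors see the same update, each at its own pace.
        System.out.println("index1 got: " + broker.receive("index1"));
        System.out.println("index2 got: " + broker.receive("index2"));
    }
}
```

Because each server drains its own queue, one mirror can be down or slow without holding back the other, and it catches up from its queue when it returns.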

We then place Apache in front of these 2 mirrored Index/Search nodes so the application can use web services to query the search node without being aware that there are 2 of them behind the scenes. Apache does the load-balancing and fail-over as the index/search nodes come up and go down, without the main application knowing anything about it.
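
For readers who want to see the idea rather than the Apache setup, the following Java sketch shows the kind of decision the front end makes: round-robin over the nodes, skipping any that are down. It is purely conceptual and is not how Apache is configured or implemented; SearchNodeBalancerSketch, pick, and the health-check predicate are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Conceptual round-robin selection with fail-over (not Apache's implementation).
class SearchNodeBalancerSketch {
    private final List<String> nodes;
    private int next = 0; // index of the node to try first on the next request

    SearchNodeBalancerSketch(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    // Returns the next node that the health check reports as up,
    // starting from the round-robin position, or null if none is up.
    synchronized String pick(Predicate<String> isUp) {
        for (int i = 0; i < nodes.size(); i++) {
            int index = (next + i) % nodes.size();
            String candidate = nodes.get(index);
            if (isUp.test(candidate)) {
                next = (index + 1) % nodes.size();
                return candidate;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        SearchNodeBalancerSketch balancer =
                new SearchNodeBalancerSketch(Arrays.asList("search1", "search2"));

        // Both nodes up: requests alternate between them.
        System.out.println(balancer.pick(node -> true));
        System.out.println(balancer.pick(node -> true));

        // search1 goes down: every request is routed to search2.
        System.out.println(balancer.pick(node -> !node.equals("search1")));
    }
}
```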


Paul Smith

On 26/06/2005, at 2:35 AM, Stephane Bailliez wrote:

I have been browsing the archives concerning this particular topic.

I'm in the same boat and the customer has clustering requirements.

To give some background:
I have a constant flow of incoming messages flying over the network that need to be archived in db, indexed and dispatched to thousands of clients (rich client console).
The backend architecture needs to be clustered, meaning that:
- the message broker needs to be clustered
- the database needs to be replicated and support failover
- the search engine index needs to be replicated

This is for a 24x7 operation.
My main problem is that there is a constant flow of writes just about everywhere, meaning that the Lucene index keeps changing and that I have a very small window available to replicate the data across the network. (As of now, I have 2 messages / minute and should go over 50 in the medium term.)
Concerning the index, being able to replicate is cool, but if one node goes down, it must be able to resynchronize when you bring it up on the cluster... that's a hell of a problem.
As it is acceptable to have downtime on the search engine, I was thinking it was much easier to:
1) rely on a shared index via NFS for each node.
2) dedicate a box to the search engine and access it via rpc from each node
Considering the messages I have seen in the archives, 1) seems to be a no-go.
Option 2) is generally not recommended, but I think it could fit my needs quite well. IMHO it should be easy enough to bring the box back into operation if it goes down. Synchronizing the index for me is just a matter of going through the database to reindex the archived content. This will take some time, but as I said, running in degraded mode is acceptable.
Does anyone have any suggestions, recommendations, experience or thoughts concerning the problems mentioned above?
Cheers,

Stephane





---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



