Considerations you may want to think about when sanity-checking your clustered indices:

1) Number of documents available in the index vs. number of documents in the persistent store.

2) Are all the documents up to date? (This involves comparing the existence and the last-updated date of Lucene documents against the persistent store.)

3) Have all the documents that should have been deleted actually been deleted from the index? If you delete the documents from the persistent store outright, this is not trivial; we use an "is_deleted" flag so we can query the index for deleted documents. If we get any results, then there's a problem.
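The three checks above can be sketched as follows. This is a minimal stand-in, assuming you can dump both the index and the persistent store to a map of document id to (last-updated timestamp, is_deleted flag); the function and field names are hypothetical, not from any real Lucene client:

```python
def index_sanity_check(index_docs, store_docs):
    """Compare an index snapshot against the persistent store.

    Both arguments are dicts: doc_id -> {"updated": timestamp, "is_deleted": bool}.
    Returns the problems found by each of the three checks.
    """
    problems = {"missing": [], "stale": [], "undeleted": []}

    # Check 1: existence/count — every store document should be indexed.
    for doc_id in store_docs:
        if doc_id not in index_docs:
            problems["missing"].append(doc_id)

    # Check 2: last-updated dates — the index must not lag the store.
    for doc_id, meta in index_docs.items():
        if doc_id in store_docs and meta["updated"] < store_docs[doc_id]["updated"]:
            problems["stale"].append(doc_id)

    # Check 3: query the index for is_deleted documents; any hit is a problem.
    for doc_id, meta in index_docs.items():
        if meta["is_deleted"]:
            problems["undeleted"].append(doc_id)

    return problems
```

In the real system check 3 would be a single index query on the is_deleted field rather than a scan; the point is that a non-empty result set means cleanup failed.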

Sync issues will happen — welcome to the wonderful world of NDC — not through any fault of programming, but simply because this is the nature of network communication and of computing in demanding environments. Failing to provide for contingencies and fail-safes will give you some of the most intense headaches; ultimately, you want the system to fix itself (assume failures from day one). I've been monitoring the system from a distance for the past year with little or no interference (4 years in total). There is nothing better than knowing that I could take a tire iron to a few of our machines and it would not affect site performance. I get to sleep at night.

My two cents.

Nader Henein



Paul Smith wrote:



If you use ActiveMQ for JMS, you can take advantage of its Composite Destination feature and have a virtual Queue/Topic that is actually several Queues/Topics. This is what we use to keep a mirror index server completely in sync. The application sends an update message to a queue named "queue://index1, queue://index2", which becomes 2 separate queues for the 2 servers, allowing each of them to process the same message whenever it can get around to it.
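With ActiveMQ the fan-out happens inside the broker itself; this tiny in-memory stand-in (the class and its methods are made up for illustration, not an ActiveMQ API) just shows the behaviour described above — one send to the comma-separated composite name lands a copy on each underlying queue, and each consumer drains its own copy at its own pace:

```python
from collections import defaultdict, deque

class TinyBroker:
    """In-memory stand-in for a broker with composite destinations.

    A destination name containing commas fans each message out to
    every listed queue, mimicking ActiveMQ's Composite Destination.
    """
    def __init__(self):
        self.queues = defaultdict(deque)

    def send(self, destination, message):
        # Fan out: one logical send becomes one enqueue per listed queue.
        for name in destination.split(","):
            self.queues[name.strip()].append(message)

    def receive(self, queue_name):
        # Each consumer drains its own queue independently.
        q = self.queues[queue_name]
        return q.popleft() if q else None

broker = TinyBroker()
broker.send("queue://index1, queue://index2", "update doc 42")
```

After the send, both "queue://index1" and "queue://index2" hold their own copy of the update, so a slow or offline index node simply accumulates a backlog rather than losing messages.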


Ah, the composite topic is indeed a nice one. But out of curiosity... did you set up your 2 nodes (consumers) as embedded brokers, or is the producer the main broker?


Neither, in our case: a central broker (albeit we plan to have a backup plan with ActiveMQ installed on all servers with the same config, ready to roll, using zeroconf for discovery).



We then place Apache in front of these 2 mirrored Index/Search nodes so the application can use web services to query the search node without actually being aware that there are 2 of them behind the scenes, leaving Apache to do the load balancing and fail-over as the index/search nodes come up and down, without the main application knowing anything about it.
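The fail-over behaviour Apache provides here can be pictured as a client that tries each mirrored node in turn. A sketch under stated assumptions — the node URLs and the `probe` callable are hypothetical, standing in for the actual web-service call:

```python
def query_with_failover(query, nodes, probe):
    """Try each search node in turn; return the first successful answer.

    `nodes` is an ordered list of node URLs; `probe(node, query)` performs
    the actual web-service call and raises on failure.
    """
    last_error = None
    for node in nodes:
        try:
            return probe(node, query)
        except Exception as err:  # node down: fall through to the next one
            last_error = err
    raise RuntimeError(f"all search nodes failed: {last_error}")
```

The application only ever sees one answer; which mirror served it is invisible, which is exactly why the consistency of the pair matters.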


Ideally, the 2 nodes have the same state when running.


Ideally, yes — in reality it's different. We're going to be monitoring the consistency of the pair closely, and should they fall out of sync with regularity, then one of them is just going to be a hot spare for failure purposes only.


What happens when a node fails, you put it back online, and it needs to catch up with all the missing messages in its queue? Is it considered 'offline' until it catches up? If yes, how do you do it? If no, I guess you don't mind that a search request may not give the same result depending on which node it is load-balanced to, correct?


In this case we will manually mark the node as disabled via the Apache worker configs until it has caught up.
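That manual step amounts to keeping a node out of the balancer pool until its update-queue backlog has drained. A sketch of the eligibility rule (the threshold and all names are made up for illustration):

```python
def node_enabled(backlog_depth, max_lag=0):
    """A node rejoins the pool only once its update-queue backlog
    has drained to at most `max_lag` pending messages."""
    return backlog_depth <= max_lag

def serving_pool(nodes_with_backlog):
    """Return the names of nodes currently eligible to serve searches.

    `nodes_with_backlog` maps node name -> pending messages in its queue.
    """
    return [name for name, depth in nodes_with_backlog.items()
            if node_enabled(depth)]
```

A recovering node with a backlog stays out of the pool, so search results stay consistent across whichever mirrors are actually serving.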

Paul

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


--

Nader S. Henein
Senior Applications Architect

Bayt.com



