Reestablishing a Solr node that ran on a completely crashed machine

Per Steffensen Tue, 18 Jun 2013 03:04:20 -0700

Hi

Scenario:

* 1) You have a Solr cloud cluster running - several Solr nodes across several machines - many collections with many replicas and documents indexed into them
* 2) One of the machines running a Solr node completely crashes - totally gone, including the local disk with data/config etc. of the Solr node
* 3) You want to be able to insert a new empty machine, install/configure Solr on this new machine, give it the same IP and hostname as the crashed machine had, and then start this new Solr node and have it take the place of the crashed Solr node, making the Solr cloud cluster work again
* 4) No replication (only one replica per shard), so we will accept that the data on the crashed machine is gone forever, but of course we want the Solr cloud cluster to continue running with the documents indexed on the other Solr nodes


At my company we are establishing a procedure for what to do in 3) above.

Basically we use our "install script" to install/configure the new Solr node on the new machine as it was originally installed/configured on the crashed machine back when the system was originally set up - this includes an "empty" solr.xml file (no cores mentioned). We then start all the Solr nodes (including the newly reestablished one) again. They all start successfully, but the Solr cloud cluster does not work - at least not when doing distributed searches touching replicas that used to run on the crashed Solr node, because those replicas are not loaded on the reestablished node.

How do we make sure that a reestablished Solr node, on a machine with the same IP and hostname as the machine that crashed, will load all the replicas that the old Solr node used to run?


Potential solutions

* We have tried making sure that the solr.xml on the reestablished Solr node contains the same core list as on the crashed one. Then everything works as we want. But this is a little fragile and it is a solution "outside" Solr - you need to figure out how to reestablish the solr.xml yourself - probably something like looking into clusterstate.json and generating the solr.xml from that
* Untested by us: Maybe we will also succeed by just running Core API LOAD operations against the newly reestablished Solr node - one LOAD operation for each replica that used to run on the Solr node. But this is also a little fragile and it is also (partly) a solution "outside" Solr - you need to figure out which cores to load yourself.
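
To make the first idea concrete, here is a minimal sketch of generating the core list from a copy of clusterstate.json. It is not something Solr provides, and it rests on assumptions that should be checked against your own cluster: that each replica entry in clusterstate.json is a JSON object with "core" and "node_name" keys (and, in the 4.0-style layout assumed here, "collection" and "shard" keys), that the instanceDir of each core is named after the core, and that the node uses the legacy solr.xml format with <core> elements. The function names and the sample data are made up for illustration.

```python
import json
from xml.sax.saxutils import quoteattr


def replicas_on_node(state, node_name):
    """Yield each replica entry (a dict) hosted on node_name.

    Walks the parsed JSON recursively, so it does not depend on how deeply
    collections and shards are nested in a given clusterstate.json layout.
    """
    if not isinstance(state, dict):
        return
    if "core" in state and "node_name" in state:
        # This dict looks like a replica entry; keep it only if it is ours.
        if state["node_name"] == node_name:
            yield state
        return
    for child in state.values():
        yield from replicas_on_node(child, node_name)


def build_solr_xml(state, node_name):
    """Render a legacy-style solr.xml listing the cores node_name used to host."""
    lines = ['<?xml version="1.0" encoding="UTF-8" ?>',
             '<solr persistent="true">',
             '  <cores adminPath="/admin/cores">']
    found = sorted(replicas_on_node(state, node_name), key=lambda r: r["core"])
    for replica in found:
        # Assumption: the instanceDir is named after the core itself.
        attrs = [("name", replica["core"]), ("instanceDir", replica["core"])]
        # Assumption: replica entries carry "collection" and "shard" keys.
        for key in ("collection", "shard"):
            if replica.get(key):
                attrs.append((key, replica[key]))
        rendered = " ".join(f"{k}={quoteattr(str(v))}" for k, v in attrs)
        lines.append(f"    <core {rendered}/>")
    lines += ["  </cores>", "</solr>"]
    return "\n".join(lines) + "\n"


# Hypothetical sample in the assumed layout: collection -> shard -> replica.
SAMPLE = json.loads("""
{"collA": {
   "shard1": {"host1:8983_solr_collA_shard1": {
       "shard": "shard1", "roles": null, "state": "down",
       "core": "collA_shard1", "collection": "collA",
       "node_name": "host1:8983_solr",
       "base_url": "http://host1:8983/solr"}},
   "shard2": {"host2:8983_solr_collA_shard2": {
       "shard": "shard2", "roles": null, "state": "active",
       "core": "collA_shard2", "collection": "collA",
       "node_name": "host2:8983_solr",
       "base_url": "http://host2:8983/solr"}}}}
""")

print(build_solr_xml(SAMPLE, "host1:8983_solr"))
```

In practice the input would be the real clusterstate.json fetched from ZooKeeper and parsed with json.load, and the generated <core> elements would probably be merged into the solr.xml that the install script already writes (with its own host/port settings) rather than replacing that file wholesale.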

I have to say that we do not use the "latest" Solr version - we use a version of Solr based on 4.0.0. So there might already be a solution in Solr, but I would be surprised.

Any thoughts about how this "ought" to be done? Support in Solr? E.g. an "operation" to tell a Solr node to load all the replicas that used to run on a machine with the same IP and hostname? Or...?


Regards, Per Steffensen

