Ok, thanks. I think we will just reconstruct solr.xml (from clusterstate.json) ourselves for now.

On 6/18/13 2:15 PM, Mark Miller wrote:
I don't know what the best method to use now is, but the slightly longer-term 
plan is to:

* Have a new mode where you cannot preconfigure cores and can only use the 
Collections API.
* ZK becomes the cluster state truth.
* The Overseer takes actions to ensure cores live/die in different places based 
on the truth in ZK.

- Mark

On Jun 18, 2013, at 6:03 AM, Per Steffensen <[email protected]> wrote:

Hi

Scenario:
* 1) You have a Solr cloud cluster running - several Solr nodes across several 
machines - many collections with many replicas and documents indexed into them
* 2) One of the machines running a Solr node completely crashes - totally gone, 
including the local disk with the Solr node's data/config etc.
* 3) You want to be able to insert a new empty machine, install/configure Solr 
on this new machine, give it the same IP and hostname as the crashed machine 
had, and then start this new Solr node and have it take the place of the 
crashed Solr node, making the Solr cloud cluster work again
* 4) No replication (only one replica per shard), so we will accept that the 
data on the crashed machine is gone forever, but of course we want the Solr 
cloud cluster to continue running with the documents indexed on the other Solr 
nodes

At my company we are establishing a procedure for what to do in 3) above.

Basically we use our "install script" to install/configure the new Solr node on the new 
machine exactly as it was originally installed/configured on the crashed machine when the 
system was first set up - this includes an "empty" solr.xml file (no cores mentioned). We 
then start all the Solr nodes (including the newly reestablished one) again. They all start 
successfully, but the Solr cloud cluster does not work - at least not for distributed 
searches touching replicas that used to run on the crashed Solr node, because those 
replicas are not loaded on the reestablished node.

How do we make sure that a reestablished Solr node, on a machine with the same 
IP and hostname as the machine that crashed, will load all the replicas that 
the old Solr node used to run?

Potential solutions
* We have tried making sure that the solr.xml on the reestablished Solr node 
contains the same core list as on the crashed one. Then everything works as we want. 
But this is a little fragile, and it is a solution "outside" Solr - you need to 
figure out how to reestablish the solr.xml yourself - probably by looking 
into clusterstate.json and generating the solr.xml from that
* Untested by us: maybe we would also succeed by just running Core API LOAD operations 
against the newly reestablished Solr node - one LOAD operation for each replica that used 
to run on the Solr node. But this is also a little fragile, and it is also (partly) a 
solution "outside" Solr - you need to figure out which cores to load yourself.

I have to say that we do not use the "latest" Solr version - we use a version 
of Solr based on 4.0.0. So there might already be a solution in Solr, but I 
would be surprised.

Any thoughts about how this "ought" to be done? Support in Solr? E.g. an 
"operation" to tell a Solr node to load all the replicas that used to run on a 
machine with the same IP and hostname? Or...?

Regards, Per Steffensen

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

