Hi,

Re "ZK becomes the cluster state truth." - I thought that was already the case, no? Who/what else holds (which) bits of the total truth?

Thanks,
Otis

On Tue, Jun 18, 2013 at 8:15 AM, Mark Miller <[email protected]> wrote:
> I don't know what the best method to use now is, but the slightly longer term
> plan is to:
>
> * Have a new mode where you cannot preconfigure cores, only use the
> Collections API.
> * ZK becomes the cluster state truth.
> * The Overseer takes actions to ensure cores live/die in different places
> based on the truth in ZK.
>
> - Mark
>
> On Jun 18, 2013, at 6:03 AM, Per Steffensen <[email protected]> wrote:
>
>> Hi
>>
>> Scenario:
>> * 1) You have a Solr cloud cluster running - several Solr nodes across
>> several machines - many collections with many replicas and documents indexed
>> into them.
>> * 2) One of the machines running a Solr node crashes completely - totally
>> gone, including the local disk with the data/config etc. of the Solr node.
>> * 3) You want to be able to insert a new, empty machine, install/configure
>> Solr on this new machine, give it the same IP and hostname as the crashed
>> machine had, and then start this new Solr node and have it take the place
>> of the crashed Solr node, making the Solr cloud cluster work again.
>> * 4) No replication (only one replica per shard), so we accept that the
>> data on the crashed machine is gone forever, but of course we want the Solr
>> cloud cluster to continue running with the documents indexed on the other
>> Solr nodes.
>>
>> At my company we are establishing a procedure for what to do in 3) above.
>>
>> Basically we use our "install script" to install/configure the new Solr node
>> on the new machine as it was originally installed/configured on the crashed
>> machine back when the system was originally set up - this includes an
>> "empty" solr.xml file (no cores mentioned). Then we start all the Solr nodes
>> (including the newly reestablished one) again.
>> They all start successfully, but
>> the Solr cloud cluster does not work - at least not when doing distributed
>> searches touching replicas that used to run on the crashed Solr node, because
>> those replicas are not loaded on the reestablished node.
>>
>> How do we make sure that a reestablished Solr node, on a machine with the same
>> IP and hostname as the machine that crashed, will load all the replicas that
>> the old Solr node used to run?
>>
>> Potential solutions:
>> * We have tried to make sure that the solr.xml on the reestablished Solr
>> node contains the same core list as on the crashed one. Then everything
>> works as we want. But this is a little fragile, and it is a solution
>> "outside" Solr - you need to figure out how to reestablish the solr.xml
>> yourself - probably something like looking into clusterstate.json and
>> generating the solr.xml from that.
>> * Untested by us: Maybe we would also succeed by just running Core API LOAD
>> operations against the newly reestablished Solr node - a LOAD operation for
>> each replica that used to run on the Solr node. But this is also a little
>> fragile, and it is also (partly) a solution "outside" Solr - you need to
>> figure out which cores to load yourself.
>>
>> I have to say that we do not use the "latest" Solr version - we use a
>> version of Solr based on 4.0.0. So there might be a solution already in
>> Solr, but I would be surprised.
>>
>> Any thoughts about how this "ought" to be done? Support in Solr? E.g. an
>> "operation" to tell a Solr node to load all the replicas that used to run on
>> a machine with the same IP and hostname? Or...?
>>
>> Regards, Per Steffensen
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [email protected]
>> For additional commands, e-mail: [email protected]
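The "generate the solr.xml from clusterstate.json" idea Per describes could be sketched roughly as below. This is a minimal sketch only, assuming the Solr 4.x clusterstate.json layout (collection -> "shards" -> "replicas", each replica carrying "core" and "node_name" properties); the function names `cores_for_node` and `solr_xml_for` are hypothetical, not part of any Solr API:

```python
import json

def cores_for_node(clusterstate_json, node_name):
    # Walk a 4.x-style clusterstate.json string and collect
    # (core, collection, shard) for every replica placed on the
    # given node; node_name typically looks like "host:8983_solr".
    state = json.loads(clusterstate_json)
    cores = []
    for collection, coll_state in state.items():
        for shard, shard_state in coll_state.get("shards", {}).items():
            for replica in shard_state.get("replicas", {}).values():
                if replica.get("node_name") == node_name:
                    cores.append((replica["core"], collection, shard))
    return cores

def solr_xml_for(cores):
    # Render a minimal pre-5.x <cores>-style solr.xml listing the cores,
    # to be dropped onto the reestablished node before startup.
    lines = ['<solr persistent="true">',
             '  <cores adminPath="/admin/cores">']
    for core, collection, shard in cores:
        lines.append('    <core name="%s" instanceDir="%s" '
                     'collection="%s" shard="%s"/>'
                     % (core, core, collection, shard))
    lines += ['  </cores>', '</solr>']
    return '\n'.join(lines)
```

The same core list could instead drive per-core Core Admin API calls (the "LOAD operation for each replica" variant Per mentions) rather than a generated solr.xml; either way the source of truth is the clusterstate.json read from ZooKeeper.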
