I see. Thanks for the explanation. Brrrr, yeah, ZK should be the one and only brain there, I think. And forget Fiat, go for Mercedes.
Otis

On Tue, Jun 18, 2013 at 10:24 AM, Mark Miller <[email protected]> wrote:
> With preconfigurable cores, each node with cores also holds some truth.
>
> You might have a core registered in ZK that doesn't exist on any node. You
> might have a core that is not registered in ZK but does exist on a node. A
> core that comes up might be a really old node coming back, or it might be
> a user that preconfigured a new core.
>
> Without preconfigurable cores, the Overseer can adjust for these things
> and make ZK the truth by fiat.
>
> - Mark
>
> On Jun 18, 2013, at 8:50 AM, Otis Gospodnetic <[email protected]> wrote:
>
>> Hi,
>>
>> Re "ZK becomes the cluster state truth."
>>
>> I thought that was already the case, no? Who/what else holds (which)
>> bits of the total truth?
>>
>> Thanks,
>> Otis
>>
>> On Tue, Jun 18, 2013 at 8:15 AM, Mark Miller <[email protected]> wrote:
>>> I don't know what the best method to use now is, but the slightly
>>> longer-term plan is to:
>>>
>>> * Have a new mode where you cannot preconfigure cores, only use the
>>>   Collections API.
>>> * ZK becomes the cluster state truth.
>>> * The Overseer takes actions to ensure cores live/die in different
>>>   places based on the truth in ZK.
>>>
>>> - Mark
>>>
>>> On Jun 18, 2013, at 6:03 AM, Per Steffensen <[email protected]> wrote:
>>>
>>>> Hi
>>>>
>>>> Scenario:
>>>> * 1) You have a SolrCloud cluster running - several Solr nodes across
>>>>   several machines - many collections with many replicas and documents
>>>>   indexed into them.
>>>> * 2) One of the machines running a Solr node completely crashes -
>>>>   totally gone, including the local disk with the data/config etc. of
>>>>   the Solr node.
>>>> * 3) You want to be able to insert a new empty machine,
>>>>   install/configure Solr on this new machine, give it the same IP and
>>>>   hostname as the crashed machine had, and then start this new Solr
>>>>   node and have it take the place of the crashed Solr node, making the
>>>>   SolrCloud cluster work again.
>>>> * 4) No replication (only one replica per shard), so we accept that
>>>>   the data on the crashed machine is gone forever, but of course we
>>>>   want the SolrCloud cluster to continue running with the documents
>>>>   indexed on the other Solr nodes.
>>>>
>>>> At my company we are establishing a procedure for what to do in 3)
>>>> above.
>>>>
>>>> Basically we use our "install script" to install/configure the new
>>>> Solr node on the new machine exactly as it was originally
>>>> installed/configured on the crashed machine back when the system was
>>>> first set up - this includes an "empty" solr.xml file (no cores
>>>> mentioned). Then we start all the Solr nodes (including the newly
>>>> reestablished one) again. They all start successfully, but the
>>>> SolrCloud cluster does not work - at least not when doing distributed
>>>> searches touching replicas that used to run on the crashed Solr node,
>>>> because those replicas are not loaded on the reestablished node.
>>>>
>>>> How do we make sure a reestablished Solr node on a machine with the
>>>> same IP and hostname as the crashed machine will load all the replicas
>>>> that the old Solr node used to run?
>>>>
>>>> Potential solutions:
>>>> * We have tried to make sure that the solr.xml on the reestablished
>>>>   Solr node contains the same core list as on the crashed one. Then
>>>>   everything works as we want. But this is a little fragile, and it is
>>>>   a solution "outside" Solr - you need to figure out how to
>>>>   reestablish the solr.xml yourself - probably something like looking
>>>>   into clusterstate.json and generating the solr.xml from that.
>>>> * Untested by us: maybe we would also succeed just running Core API
>>>>   LOAD operations against the reestablished Solr node - one LOAD
>>>>   operation for each replica that used to run on the node. But this is
>>>>   also a little fragile, and it is also (partly) a solution "outside"
>>>>   Solr - you need to figure out which cores to load yourself.
>>>>
>>>> I have to say that we do not use the "latest" Solr version - we use a
>>>> version of Solr based on 4.0.0. So there might already be a solution
>>>> in Solr, but I would be surprised.
>>>>
>>>> Any thoughts about how this "ought" to be done? Support in Solr? E.g.
>>>> an "operation" telling a Solr node to load all the replicas that used
>>>> to run on a machine with the same IP and hostname? Or...?
>>>>
>>>> Regards, Per Steffensen
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
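[Editor's note] The clusterstate.json approach Per describes could be sketched roughly as follows: pull the cluster state from ZK, list the replicas whose node_name matches the crashed host, and build one CoreAdmin request per core to re-register it on the reestablished node. The sample data layout follows the Solr 4.x clusterstate.json shape (collection → shards → replicas, each replica carrying "node_name" and "core"); the host names, the field names, and the choice of the CREATE action are illustrative assumptions, not details confirmed in the thread.

```python
from urllib.parse import urlencode


def cores_on_node(clusterstate: dict, node_name: str):
    """Yield (collection, shard, core) for every replica that the given
    node hosted, according to a parsed clusterstate.json (4.x layout)."""
    for collection, coll_state in clusterstate.items():
        for shard, shard_state in coll_state.get("shards", {}).items():
            for replica in shard_state.get("replicas", {}).values():
                if replica.get("node_name") == node_name:
                    yield collection, shard, replica["core"]


def coreadmin_create_urls(clusterstate: dict, node_name: str, base_url: str):
    """Build one CoreAdmin CREATE URL per core the crashed node used to
    host, to be fetched against the reestablished node once it is up."""
    urls = []
    for collection, shard, core in cores_on_node(clusterstate, node_name):
        params = urlencode(
            {"action": "CREATE", "name": core,
             "collection": collection, "shard": shard}
        )
        urls.append(f"{base_url}/admin/cores?{params}")
    return urls
```

Each URL could then be fetched (e.g. with curl) against the reestablished node; whether CREATE or the LOAD operation Per mentions is the right CoreAdmin action on a 4.0.0-based build is exactly the open question in the thread.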
