[jira] [Commented] (SOLR-7087) Temporary shard routing handoff for downed shards and repair

Ted Sullivan (JIRA) Sat, 07 Feb 2015 06:39:54 -0800

    [ 
https://issues.apache.org/jira/browse/SOLR-7087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14310748#comment-14310748
 ]


Ted Sullivan commented on SOLR-7087:
------------------------------------

Fair point, Mark: we may not call it data loss, but the client may. It becomes 
their responsibility to log the doc ids that failed and to resend them when 
the shard is restored. Note that they should do this anyway, because I see this 
as a way to mitigate, not eliminate, data loss. When I use this term now, I will 
qualify it to mean data loss within their entire system architecture. The 
client may not see the distinction; to them, data loss is data loss. I think 
that we should downgrade the impact to Minor, but I understand from talking to 
Shalin Mangar that it will probably not be trivial to implement.
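
To make the client-side bookkeeping concrete, here is a minimal sketch of 
logging failed documents and resending them later. Everything in it is 
hypothetical: `FailedDocLog` is an illustrative name, `send` stands in for 
whatever call the client uses to index one document, and the sketch assumes 
each document is a dict with an "id" field and that the client keeps the 
document itself (not just its id) so it can be resent.

```python
class FailedDocLog:
    """Client-side record of documents that could not be indexed (illustrative)."""

    def __init__(self, send):
        self.send = send   # hypothetical callable: send(doc) raises on failure
        self.failed = {}   # doc id -> document awaiting resend

    def index(self, doc):
        """Try to index one document; remember it if the send fails."""
        try:
            self.send(doc)
            return True
        except Exception:
            self.failed[doc["id"]] = doc
            return False

    def replay(self):
        """Resend every logged document; return the ids that still fail."""
        for doc_id, doc in list(self.failed.items()):
            try:
                self.send(doc)
            except Exception:
                continue  # shard still unavailable, keep it in the log
            del self.failed[doc_id]
        return sorted(self.failed)
```

A client would call `index()` for each document and call `replay()` once it 
learns the shard is back; an empty return value means nothing is outstanding.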

As to what happens within a DIH (DataImportHandler) pull, that may require 
another ticket.


> Temporary shard routing handoff for downed shards and repair
> ------------------------------------------------------------
>
>                 Key: SOLR-7087
>                 URL: https://issues.apache.org/jira/browse/SOLR-7087
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Ted Sullivan
>            Priority: Minor
>
> Currently, if a shard is lost (all of the replicas in a shard go down), there 
> will be data loss, as new documents that would be routed to the failed shard 
> are dropped. 
> One potential way to mitigate this would be for Solr to recognize that a 
> shard has gone down (no visible replicas in the cluster state) and 
> temporarily re-route incoming documents to the remaining shards. It would 
> keep a count of 'current active shards' as well as of the number of shards 
> configured. When the number of active shards is less than the number 
> configured, the routing algorithms would use the active count when routing 
> shard keys, possibly persisting the keys that would have been routed to the 
> offline shard so that, when that shard comes back online, the keys could be 
> moved from the shard that hosted them to the one that should have gotten 
> them. This assumes that the downed shards can recover their own Lucene 
> indexes; if these are lost too because of a disk failure, then we have to 
> rebuild the index. How to do that on a per-shard basis could be the subject 
> of another ticket.
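
The handoff described in the issue can be sketched roughly as follows. This 
is only an illustration under stated assumptions, not Solr's router: 
`HandoffRouter`, `shard_index`, `mark_down`, and `mark_up` are hypothetical 
names, the MD5-modulo hash is a stand-in for whatever hashing the real 
routing uses, and the handoff record is kept in memory rather than persisted.

```python
import hashlib


def shard_index(doc_id, num_slots):
    """Map a document id onto one of num_slots slots with a stable hash."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_slots


class HandoffRouter:
    def __init__(self, shards):
        self.shards = list(shards)   # configured shards, in a fixed order
        self.active = set(shards)    # shards with at least one visible replica
        self.handoffs = {}           # doc id -> (home shard, temporary shard)

    def mark_down(self, shard):
        self.active.discard(shard)

    def route(self, doc_id):
        """Return the shard that should receive doc_id right now."""
        home = self.shards[shard_index(doc_id, len(self.shards))]
        if home in self.active:
            return home
        survivors = [s for s in self.shards if s in self.active]
        if not survivors:
            raise RuntimeError("no active shards")
        # Fewer active shards than configured: route over the survivors and
        # remember where the document should have gone.
        temp = survivors[shard_index(doc_id, len(survivors))]
        self.handoffs[doc_id] = (home, temp)
        return temp

    def mark_up(self, shard):
        """Reactivate shard; return (doc id, temporary shard) pairs to move back."""
        self.active.add(shard)
        to_move = [(d, t) for d, (h, t) in self.handoffs.items() if h == shard]
        for doc_id, _ in to_move:
            del self.handoffs[doc_id]
        return to_move
```

Documents whose home shard is still up keep their normal placement, so only 
keys belonging to the downed shard are re-routed and later listed for repair.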



