Ted Sullivan created SOLR-7087:
----------------------------------

             Summary: Temporary shard routing handoff for downed shards and repair
                 Key: SOLR-7087
                 URL: https://issues.apache.org/jira/browse/SOLR-7087
             Project: Solr
          Issue Type: Improvement
          Components: SolrCloud
            Reporter: Ted Sullivan


Currently, if a shard is lost (all of the replicas in a shard go down), data 
is lost because new documents that would be routed to the failed shard are 
dropped. 

One potential way to mitigate this would be for Solr to recognize that a shard 
has gone down (no visible replicas in the cluster state) and temporarily 
re-route incoming documents to the remaining shards. It would keep a count of 
'current active shards' as well as the number of shards configured. When the 
number of active shards is less than the number configured, the routing 
algorithms would use the active count when assigning shard keys, possibly 
persisting the keys that would have been routed to the offline shard. When that 
shard comes back online, those keys could then be moved from the shard that 
temporarily hosted them to the one that should have gotten them. This assumes 
that the downed shards can recover their own Lucene indexes; if these are lost 
too because of a disk failure, then the index has to be rebuilt. How to do that 
on a per-shard basis could be the subject of another ticket.
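
To make the proposal concrete, here is a minimal sketch of the handoff 
bookkeeping described above. It is only an illustration under stated 
assumptions, not Solr's actual routing code: the HandoffRouter class and its 
method names are hypothetical, zlib.crc32 stands in for whatever hash the real 
router uses, and the handoff log is kept in memory where a real implementation 
would need to persist it.

```python
import zlib
from collections import defaultdict


class HandoffRouter:
    """Hypothetical router illustrating temporary handoff; not a Solr API."""

    def __init__(self, shards):
        self.configured = list(shards)   # shards the collection was created with
        self.active = set(shards)        # shards with at least one visible replica
        # home shard -> {doc_id: shard that temporarily hosts the document}
        self.handoff = defaultdict(dict)

    @staticmethod
    def _hash(doc_id):
        # Assumption: any stable hash works for the sketch; crc32 is a stand-in.
        return zlib.crc32(doc_id.encode("utf-8"))

    def home_shard(self, doc_id):
        # Normal routing: hash over the number of configured shards.
        return self.configured[self._hash(doc_id) % len(self.configured)]

    def route(self, doc_id):
        home = self.home_shard(doc_id)
        if home in self.active:
            return home
        # Home shard is down: hash over the active shards instead and
        # remember the key so it can be moved back later.
        survivors = [s for s in self.configured if s in self.active]
        if not survivors:
            raise RuntimeError("no active shards to route to")
        temp = survivors[self._hash(doc_id) % len(survivors)]
        self.handoff[home][doc_id] = temp
        return temp

    def mark_down(self, shard):
        self.active.discard(shard)

    def mark_up(self, shard):
        # Returns the keys to move back: {doc_id: temporary shard}.
        self.active.add(shard)
        return self.handoff.pop(shard, {})
```

With this shape, the repair step after a shard recovers is just a walk over 
the mapping returned by mark_up(), moving each document from its temporary 
shard to its home shard.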



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
