Ted Sullivan created SOLR-7087:
----------------------------------
Summary: Temporary shard routing handoff for downed shards and
repair
Key: SOLR-7087
URL: https://issues.apache.org/jira/browse/SOLR-7087
Project: Solr
Issue Type: Improvement
Components: SolrCloud
Reporter: Ted Sullivan
Currently, if a shard is lost (all of the replicas in a shard go down), data is
lost: new documents that would be routed to the failed shard are dropped.

One potential way to mitigate this would be for Solr to recognize that a shard
has gone down (no visible replicas in the cluster state) and temporarily
re-route incoming documents to the remaining shards. Solr would keep a count of
'current active shards' as well as the number of shards configured. When the
number of active shards is less than the number configured, the routing
algorithms would use the active count when assigning shard keys, possibly
persisting the keys that would have been routed to the offline shard. When that
shard comes back online, those keys could be moved from the shard that
temporarily hosted them to the one that should have received them.

This assumes that the downed shards can recover their own Lucene indexes. If
these are lost too because of a disk failure, then the index has to be rebuilt.
How to do that on a per-shard basis could be the subject of another ticket.
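
To make the proposal concrete, here is a minimal sketch of the handoff idea in
Python. It is an illustration only, not Solr code: the HandoffRouter class, its
method names, and the use of CRC32 modulo the shard count are all assumptions
made for this example, and Solr's actual routing and cluster-state handling are
not modeled.

```python
import zlib
from collections import defaultdict


class HandoffRouter:
    """Toy router illustrating temporary handoff (hypothetical, not a Solr API)."""

    def __init__(self, shards):
        self.shards = list(shards)         # configured shards, in a fixed order
        self.active = set(self.shards)     # shards assumed to have a live replica
        # offline shard -> {document key: shard that temporarily hosts it}
        self.handoffs = defaultdict(dict)

    @staticmethod
    def _hash(key):
        # Stand-in hash chosen for determinism; an assumption of this sketch.
        return zlib.crc32(key.encode("utf-8"))

    def home_shard(self, key):
        # The shard the key belongs to when every configured shard is up.
        return self.shards[self._hash(key) % len(self.shards)]

    def route(self, key):
        home = self.home_shard(key)
        if home in self.active:
            return home
        # Home shard is down: pick among the remaining active shards instead.
        candidates = [s for s in self.shards if s in self.active]
        if not candidates:
            raise RuntimeError("no active shards available")
        host = candidates[self._hash(key) % len(candidates)]
        # Remember the key so it can be moved back during repair.
        self.handoffs[home][key] = host
        return host

    def mark_down(self, shard):
        self.active.discard(shard)

    def mark_up(self, shard):
        # Returns {key: temporary host} for documents that need to move home.
        self.active.add(shard)
        return self.handoffs.pop(shard, {})
```

In this sketch, keys whose home shard is still active keep their normal
placement, so only documents destined for the offline shard are re-routed. The
mapping returned by mark_up is the repair work list: each key would be copied
from its temporary host to the recovered shard and then deleted from the host.
A real implementation would also need to persist that mapping and handle
queries during the handoff window, which this sketch leaves out.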