[
https://issues.apache.org/jira/browse/SOLR-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14007541#comment-14007541
]
ASF subversion and git services commented on SOLR-5468:
-------------------------------------------------------
Commit 1597156 from [~thelabdude] in branch 'dev/trunk'
[ https://svn.apache.org/r1597156 ]
Move SOLR-5468 to new features section.
> Option to enforce a majority quorum approach to accepting updates in SolrCloud
> ------------------------------------------------------------------------------
>
> Key: SOLR-5468
> URL: https://issues.apache.org/jira/browse/SOLR-5468
> Project: Solr
> Issue Type: New Feature
> Components: SolrCloud
> Affects Versions: 4.5
> Environment: All
> Reporter: Timothy Potter
> Assignee: Timothy Potter
> Priority: Minor
> Attachments: SOLR-5468.patch, SOLR-5468.patch, SOLR-5468.patch
>
>
> I've been thinking about how SolrCloud deals with write-availability using
> in-sync replica sets, in which writes will continue to be accepted so long as
> there is at least one healthy node per shard.
> For a little background (and to verify my understanding of the process is
> correct), SolrCloud only considers active/healthy replicas when acknowledging
> a write. Specifically, when a shard leader accepts an update request, it
> forwards the request to all active/healthy replicas and only considers the
> write successful if all active/healthy replicas ack the write. Any down /
> gone replicas are not considered and will sync up with the leader when they
> come back online using peer sync or snapshot replication. For instance, if a
> shard has 3 nodes, A, B, C with A being the current leader, then writes to
> the shard will continue to succeed even if B & C are down.
> The issue is that if a shard leader continues to accept updates even if it
> loses all of its replicas, then we have acknowledged updates on only 1 node.
> If that node, call it A, then fails and one of the previous replicas, call it
> B, comes back online before A does, then any writes that A accepted while the
> other replicas were offline are at risk to being lost.
> SolrCloud does provide a safe-guard mechanism for this problem with the
> leaderVoteWait setting, which puts any replicas that come back online before
> node A into a temporary wait state. If A comes back online within the wait
> period, then all is well as it will become the leader again and no writes
> will be lost. As a side note, sys admins definitely need to be made more
> aware of this situation as when I first encountered it in my cluster, I had
> no idea what it meant.
> My question is whether we want to consider an approach where SolrCloud will
> not accept writes unless there is a majority of replicas available to accept
> the write? For my example, under this approach, we wouldn't accept writes if
> both B&C failed, but would if only C did, leaving A & B online. Admittedly,
> this lowers the write-availability of the system, so may be something that
> should be tunable?
> From Mark M: Yeah, this is kind of like one of many little features that we
> have just not gotten to yet. I’ve always planned for a param that let’s you
> say how many replicas an update must be verified on before responding
> success. Seems to make sense to fail that type of request early if you notice
> there are not enough replicas up to satisfy the param to begin with.
--
This message was sent by Atlassian JIRA
(v6.2#6252)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]