[
https://issues.apache.org/jira/browse/KUDU-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016508#comment-17016508
]
ASF subversion and git services commented on KUDU-3011:
-------------------------------------------------------
Commit 31ed4a11de3f5a158d90d76b306c2d96fe07a55d in kudu's branch
refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=31ed4a1 ]
KUDU-3011 p6: don't transfer leadership to quiescing followers

When a tablet server is quiescing, any followers hosted on it should not
be considered candidates to be the leader's successor.

A quiescing follower would already never become leader, because it
rejects the StartElection request immediately. This patch improves on
that by nipping such requests in the bud: each follower now reports in
its ConsensusResponses whether or not it is quiescing, and if it is, the
leader knows not to transfer leadership to it.

I considered another approach to reducing the number of fruitless RPCs
sent -- simply throttling the rate at which a leader can send
StartElection requests. I opted to go with the current approach since it
is more complete with regard to preventing extraneous StartElection
requests.

Change-Id: I74ec79d0bc4dbe42fce0cca2e001cd0b369cd066
Reviewed-on: http://gerrit.cloudera.org:8080/15035
Reviewed-by: Adar Dembo <[email protected]>
Reviewed-by: Alexey Serbin <[email protected]>
Tested-by: Kudu Jenkins
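
The successor filtering described in the commit message above can be
sketched roughly as follows. This is only an illustrative model in
Python, not the actual Kudu C++ code: the class names, the `quiescing`
field, and the "caught up with the leader's log" criterion are all
assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ConsensusResponse:
    """Hypothetical stand-in for a follower's reply to the leader."""
    peer_uuid: str
    last_received_index: int
    quiescing: bool = False  # assumed name for the new flag


@dataclass
class PeerState:
    """What the leader remembers about one follower (hypothetical)."""
    last_received_index: int = 0
    quiescing: bool = False


@dataclass
class Leader:
    """Toy leader that tracks followers and picks a successor."""
    last_log_index: int
    peers: Dict[str, PeerState] = field(default_factory=dict)

    def handle_response(self, resp: ConsensusResponse) -> None:
        # Record the follower's progress and its latest quiescing state.
        state = self.peers.setdefault(resp.peer_uuid, PeerState())
        state.last_received_index = resp.last_received_index
        state.quiescing = resp.quiescing

    def pick_successor(self) -> Optional[str]:
        # Skip quiescing followers up front, so no StartElection request
        # is ever sent to a peer that would reject it anyway.
        for uuid in sorted(self.peers):
            state = self.peers[uuid]
            if state.quiescing:
                continue
            # Assumption: only a caught-up follower is a valid successor.
            if state.last_received_index >= self.last_log_index:
                return uuid
        return None
```

In this sketch, a leader that has heard from a quiescing follower "a"
and a healthy follower "b" would pick "b", and would return None (no
transfer attempted) if every caught-up follower reported that it is
quiescing.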
> Support for smooth maintenance window
> -------------------------------------
>
> Key: KUDU-3011
> URL: https://issues.apache.org/jira/browse/KUDU-3011
> Project: Kudu
> Issue Type: New Feature
> Reporter: LiFu He
> Assignee: Andrew Wong
> Priority: Major
>
> A scan that hits a tablet failure causes the entire SQL query to fail on
> common query engines, such as Impala. Although fault tolerance is available
> via "SetFaultTolerant()", Impala doesn't use it right now because it lowers
> throughput. Thus, many running SQL queries will fail when we shut down,
> reboot, or upgrade the tserver. That can be scary.
> Maybe we can make some improvements in this area. For example, tablets
> could be disallowed from being scanned once the tserver is in maintenance
> mode (KUDU-2069), and for LEADER_ONLY mode scanning, the leader role would
> need to be shifted away from the tserver under maintenance. Then we can shut
> down the tserver smoothly after all the existing SQL queries have completed.
>