[ 
https://issues.apache.org/jira/browse/KUDU-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17016507#comment-17016507
 ] 

ASF subversion and git services commented on KUDU-3011:
-------------------------------------------------------

Commit 54db215511e84785a8649ba1e52911f8adfb11e4 in kudu's branch 
refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=54db215 ]

KUDU-3011 p5: transfer leadership when quiescing

This amends the behavior of quiescing such that when a tablet server is
quiescing, it will transfer leadership to a caught-up follower as soon
as it can.

While in this state, unlike during a graceful stepdown period, the
tablet can still be written to, so as not to obstruct ongoing workloads.
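
The quiescing behavior described above can be sketched as a toy model. This
is only an illustration under stated assumptions, not Kudu's implementation:
the class LeaderSketch, its fields, and its method names are hypothetical, and
"caught up" is assumed here to mean that the follower has replicated the
leader's latest log index.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LeaderSketch:
    """Toy model of a tablet leader (hypothetical; not a Kudu class)."""
    last_index: int = 0       # index of the leader's latest log entry
    quiescing: bool = False   # set once the tablet server starts quiescing
    # follower name -> last log index that follower has replicated
    follower_index: Dict[str, int] = field(default_factory=dict)

    def write(self) -> int:
        """Append one entry; quiescing alone does not reject writes."""
        self.last_index += 1
        return self.last_index

    def on_follower_response(self, follower: str,
                             replicated_index: int) -> Optional[str]:
        """Record a follower's progress.

        Returns the follower that should be asked to start an election,
        or None if no handoff should be attempted yet.
        """
        self.follower_index[follower] = replicated_index
        caught_up = replicated_index >= self.last_index
        if self.quiescing and caught_up:
            return follower
        return None


leader = LeaderSketch()
leader.write()
leader.write()
leader.quiescing = True
# A lagging follower is not chosen; the leader keeps waiting for it.
lagging = leader.on_follower_response("ts-b", 1)
# Writes are still accepted while quiescing.
leader.write()
# Once the follower has caught up, it becomes the handoff target.
target = leader.on_follower_response("ts-b", leader.last_index)
```

In this sketch the handoff decision is made whenever a follower response
arrives, which loosely mirrors the PeerMessageQueue tests listed below; the
real trigger points and bookkeeping may differ.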

Tests are added to exercise:
- The basic behavior: even without injecting any errors that might cause
  elections, a quiescing leader will relinquish leadership.
- The behavior when there are followers being caught up. In such cases,
  the leader won't immediately relinquish leadership -- instead, it will
  wait for the followers to catch up before stepping down.
- The behavior when being written to. The fact that a leader is
  quiescing shouldn't affect its ability to be written to.
- The behavior of the PeerMessageQueue when responding to various peer
  responses.

I also removed the election-causing fault injection from a couple of
existing tests; it was previously required to transfer leadership while
quiescing.

Note: right now, if all tablet servers are quiescing while there is a
write workload on-going, a large number of StartElection requests will
be sent from the leaders to the followers. A follow-up patch will
address this.

Change-Id: Idbf0716f5c9455f83ff5f6f601b0f5042f77d078
Reviewed-on: http://gerrit.cloudera.org:8080/15012
Reviewed-by: Adar Dembo <[email protected]>
Reviewed-by: Alexey Serbin <[email protected]>
Tested-by: Andrew Wong <[email protected]>


> Support for smooth maintenance window
> -------------------------------------
>
>                 Key: KUDU-3011
>                 URL: https://issues.apache.org/jira/browse/KUDU-3011
>             Project: Kudu
>          Issue Type: New Feature
>            Reporter: LiFu He
>            Assignee: Andrew Wong
>            Priority: Major
>
> When a scan hits a tablet failure, the entire SQL query fails on common 
> query engines such as Impala. Although scans can be made fault tolerant via 
> "SetFaultTolerant()", Impala doesn't use it right now because it lowers 
> throughput. As a result, many running SQL queries fail when we shut down, 
> reboot, or upgrade a tserver. That can be scary.
> We could improve this area. For example, tablets could be disallowed from 
> being scanned once the tserver is in maintenance mode (KUDU-2069). For 
> LEADER_ONLY scans, the leader role would need to be moved off the tserver 
> under maintenance. Then we could shut down the tserver smoothly after all 
> existing SQL queries have completed.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
