[ https://issues.apache.org/jira/browse/KUDU-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17009193#comment-17009193 ]
ASF subversion and git services commented on KUDU-3011:
-------------------------------------------------------
Commit 4f22c0f9a6e8d41ec5efa0bba25aed55936a9f91 in kudu's branch
refs/heads/master from Andrew Wong
[ https://gitbox.apache.org/repos/asf?p=kudu.git;h=4f22c0f ]
KUDU-3011 p1: metric to count tablet leaders
Adds a metric that tracks the number of tablet leaders on a given
tablet server. This is done by plumbing a top-level gauge down from
the KuduServer to each TabletReplica's RaftConsensus instance.
The gauge is defined at the KuduServer level so that it can eventually
be extended to count the number of master system catalog leaders,
though that plumbing is left for a future patch. I've left a TODO
where I expect this to happen.
I also considered instead defining the metric with a functor that
would iterate through all replicas and check each one's leadership
status. I opted against this, since iterating through and locking each
RaftConsensus instance seemed likely to be less performant.
This will be useful in orchestrating a smooth maintenance window, as it
will allow us to determine whether leadership has quiesced away from a
given tablet server.
Change-Id: Iaa6554458a860e34f97af168da7ed786c8ef47e4
Reviewed-on: http://gerrit.cloudera.org:8080/14976
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo
> Support for smooth maintenance window
> -------------------------------------
>
> Key: KUDU-3011
> URL: https://issues.apache.org/jira/browse/KUDU-3011
> Project: Kudu
> Issue Type: New Feature
> Reporter: LiFu He
> Assignee: Andrew Wong
> Priority: Major
>
> A tablet failure during a scan causes the entire SQL query to fail on
> common query engines such as Impala. Although Kudu offers fault-tolerant
> scans via "SetFaultTolerant()", Impala doesn't use them right now because
> they lower throughput. As a result, many running SQL queries will fail
> when we shut down, reboot, or upgrade a tserver. That can be scary.
> Maybe we can make some improvements in this area. For example, tablets
> could be disallowed from being scanned once the tserver is in maintenance
> mode (KUDU-2069). And for LEADER_ONLY scans, the leader role needs to be
> shifted away from the maintenance tserver. Then we can shut down the
> tserver smoothly after all existing SQL queries have completed.
>
--
This message was sent by Atlassian Jira
(v8.3.4#803005)