Repository: kudu
Updated Branches:
  refs/heads/master 9749d4cde -> 287dc5408


KUDU-2335. Work around rare consensus health bug for 1.7 release

In very rare circumstances we have hit a DCHECK in quorum_util.cc in
pre-commit builds stating that the leader should always have a HEALTHY
health status. We have traced this to points in the replica lifecycle
when the health status could be UNKNOWN.

Since we want to release 1.7.0 soon, let's work around this issue for
now. We'll follow up with a "real" fix and a decent test later.

Change-Id: Iad67c7943a5b619ef2fa3a67c92cc033e207e197
Reviewed-on: http://gerrit.cloudera.org:8080/9597
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
Tested-by: Mike Percy <mpe...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/287dc540
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/287dc540
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/287dc540

Branch: refs/heads/master
Commit: 287dc540848561f6fc043e7bb810f44f94d4b419
Parents: 9749d4c
Author: Mike Percy <mpe...@apache.org>
Authored: Mon Mar 12 18:04:15 2018 -0700
Committer: Mike Percy <mpe...@apache.org>
Committed: Tue Mar 13 05:42:45 2018 +0000

----------------------------------------------------------------------
 src/kudu/consensus/quorum_util.cc | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/287dc540/src/kudu/consensus/quorum_util.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/quorum_util.cc b/src/kudu/consensus/quorum_util.cc
index 2697911..97c006b 100644
--- a/src/kudu/consensus/quorum_util.cc
+++ b/src/kudu/consensus/quorum_util.cc
@@ -27,6 +27,7 @@
 
 #include "kudu/common/common.pb.h"
 #include "kudu/gutil/map-util.h"
+#include "kudu/gutil/port.h"
 #include "kudu/gutil/strings/join.h"
 #include "kudu/gutil/strings/substitute.h"
 #include "kudu/util/pb_util.h"
@@ -506,9 +507,18 @@ bool ShouldEvictReplica(const RaftConfigPB& config,
     switch (peer.member_type()) {
       case RaftPeerPB::VOTER:
         // A leader should always report itself as being healthy.
-        DCHECK(peer_uuid != leader_uuid || healthy) << Substitute(
-            "$0: leader reported as not healthy; config: $1",
-            peer_uuid, SecureShortDebugString(config));
+        if (PREDICT_FALSE(peer_uuid == leader_uuid && !healthy)) {
+          LOG(WARNING) << Substitute("leader peer $0 reported health as $1; config: $2",
+                                     peer_uuid,
+                                     HealthReportPB_HealthStatus_Name(
+                                        peer.health_report().overall_health()),
+                                     SecureShortDebugString(config));
+          DCHECK(false) << "Found non-HEALTHY LEADER"; // Crash in DEBUG builds.
+          // TODO(KUDU-2335): We have seen this assertion in rare circumstances
+          // in pre-commit builds, so until we fix this lifecycle issue we
+          // simply do not evict any nodes when the leader is not HEALTHY.
+          return false;
+        }
 
         ++num_voters_total;
         if (healthy) {
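
For readers without the Kudu tree at hand, the control flow of the workaround above can be sketched in isolation. This is only an illustrative sketch, not the code in quorum_util.cc: the Health enum, the Peer struct, ShouldEvictSketch, and the "evict when any follower is FAILED" rule are all hypothetical simplifications, and std::cerr stands in for LOG(WARNING). The only behavior taken from the patch is the early return: when the leader is not HEALTHY, warn and evict nothing.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the Kudu protobuf types.
enum class Health { UNKNOWN, HEALTHY, FAILED };

struct Peer {
  std::string uuid;
  Health health;
};

// Returns true if some follower should be evicted.
// Assumed simplification: a follower is evictable when it is FAILED.
// The real ShouldEvictReplica() applies much richer majority rules.
bool ShouldEvictSketch(const std::vector<Peer>& peers,
                       const std::string& leader_uuid) {
  bool any_failed_follower = false;
  for (const Peer& p : peers) {
    const bool healthy = (p.health == Health::HEALTHY);
    if (p.uuid == leader_uuid && !healthy) {
      // Mirrors the workaround: warn and refuse to evict anything
      // while the leader does not report itself as HEALTHY.
      std::cerr << "leader peer " << p.uuid
                << " is not HEALTHY; skipping eviction\n";
      return false;
    }
    if (p.uuid != leader_uuid && p.health == Health::FAILED) {
      any_failed_follower = true;
    }
  }
  return any_failed_follower;
}
```

With a healthy leader "A" and a FAILED follower "B", the sketch reports that an eviction is warranted; if "A" instead reports UNKNOWN, it returns false regardless of the followers' state.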
