Repository: kudu
Updated Branches:
  refs/heads/master 9749d4cde -> 287dc5408
KUDU-2335. Work around rare consensus health bug for 1.7 release

In very rare circumstances we have hit a DCHECK in quorum_util.cc in
pre-commit builds stating that the leader should always have a HEALTHY
health status. We have traced this to points in the replica lifecycle
when the health status could be UNKNOWN.

Since we want to release 1.7.0 soon, let's work around this issue for
now. We'll follow up with a "real" fix and a decent test later.

Change-Id: Iad67c7943a5b619ef2fa3a67c92cc033e207e197
Reviewed-on: http://gerrit.cloudera.org:8080/9597
Reviewed-by: Alexey Serbin <aser...@cloudera.com>
Tested-by: Mike Percy <mpe...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/287dc540
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/287dc540
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/287dc540

Branch: refs/heads/master
Commit: 287dc540848561f6fc043e7bb810f44f94d4b419
Parents: 9749d4c
Author: Mike Percy <mpe...@apache.org>
Authored: Mon Mar 12 18:04:15 2018 -0700
Committer: Mike Percy <mpe...@apache.org>
Committed: Tue Mar 13 05:42:45 2018 +0000

----------------------------------------------------------------------
 src/kudu/consensus/quorum_util.cc | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/287dc540/src/kudu/consensus/quorum_util.cc
----------------------------------------------------------------------
diff --git a/src/kudu/consensus/quorum_util.cc b/src/kudu/consensus/quorum_util.cc
index 2697911..97c006b 100644
--- a/src/kudu/consensus/quorum_util.cc
+++ b/src/kudu/consensus/quorum_util.cc
@@ -27,6 +27,7 @@
 
 #include "kudu/common/common.pb.h"
 #include "kudu/gutil/map-util.h"
+#include "kudu/gutil/port.h"
 #include "kudu/gutil/strings/join.h"
 #include "kudu/gutil/strings/substitute.h"
 #include "kudu/util/pb_util.h"
@@ -506,9 +507,18 @@ bool ShouldEvictReplica(const RaftConfigPB& config,
     switch (peer.member_type()) {
       case RaftPeerPB::VOTER:
         // A leader should always report itself as being healthy.
-        DCHECK(peer_uuid != leader_uuid || healthy) << Substitute(
-            "$0: leader reported as not healthy; config: $1",
-            peer_uuid, SecureShortDebugString(config));
+        if (PREDICT_FALSE(peer_uuid == leader_uuid && !healthy)) {
+          LOG(WARNING) << Substitute("leader peer $0 reported health as $1; config: $2",
+                                     peer_uuid,
+                                     HealthReportPB_HealthStatus_Name(
+                                         peer.health_report().overall_health()),
+                                     SecureShortDebugString(config));
+          DCHECK(false) << "Found non-HEALTHY LEADER"; // Crash in DEBUG builds.
+          // TODO(KUDU-2335): We have seen this assertion in rare circumstances
+          // in pre-commit builds, so until we fix this lifecycle issue we
+          // simply do not evict any nodes when the leader is not HEALTHY.
+          return false;
+        }
 
         ++num_voters_total;
         if (healthy) {
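The patch swaps a bare DCHECK, which is compiled out of release builds and therefore let a non-HEALTHY leader fall through into the voter counting, for an explicit guard: log a warning, still crash in debug builds, and otherwise return false so nothing is evicted. The standalone sketch below mirrors that shape outside of Kudu. Everything in it is hypothetical: Peer, HealthStatus, HealthName, ShouldEvictSketch, SKETCH_DCHECK and the SKETCH_DEBUG switch are illustrative stand-ins, and the eviction rule (evict the first FAILED voter only if the HEALTHY voters still form a strict majority of the shrunken config) is a deliberately simplified placeholder, not Kudu's actual ShouldEvictReplica logic.

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-ins for Kudu's health status and peer protobufs.
enum class HealthStatus { UNKNOWN, HEALTHY, FAILED };

struct Peer {
  std::string uuid;
  HealthStatus health;
};

// Hypothetical DCHECK analogue: active only when SKETCH_DEBUG is defined
// (a "debug build"); otherwise it compiles to nothing, like DCHECK in release.
#ifdef SKETCH_DEBUG
#define SKETCH_DCHECK(cond) assert(cond)
#else
#define SKETCH_DCHECK(cond) do { } while (0)
#endif

const char* HealthName(HealthStatus h) {
  switch (h) {
    case HealthStatus::UNKNOWN: return "UNKNOWN";
    case HealthStatus::HEALTHY: return "HEALTHY";
    case HealthStatus::FAILED: return "FAILED";
  }
  return "?";
}

// Simplified, hypothetical eviction rule. Returns true and sets *to_evict
// only when a FAILED voter can be removed safely.
bool ShouldEvictSketch(const std::vector<Peer>& voters,
                       const std::string& leader_uuid,
                       std::string* to_evict) {
  int num_healthy = 0;
  const Peer* candidate = nullptr;
  for (const Peer& peer : voters) {
    const bool healthy = peer.health == HealthStatus::HEALTHY;
    if (peer.uuid == leader_uuid && !healthy) {
      // Same shape as the KUDU-2335 workaround: warn, crash only in a
      // debug build, and otherwise refuse to evict anything.
      std::cerr << "WARNING: leader peer " << peer.uuid
                << " reported health as " << HealthName(peer.health) << "\n";
      SKETCH_DCHECK(false);
      return false;
    }
    if (healthy) ++num_healthy;
    if (peer.health == HealthStatus::FAILED && candidate == nullptr) {
      candidate = &peer;
    }
  }
  if (candidate == nullptr) return false;
  // Evicting a FAILED voter keeps every HEALTHY voter, so require a strict
  // majority of the voters that would remain afterwards.
  const int remaining = static_cast<int>(voters.size()) - 1;
  if (num_healthy <= remaining / 2) return false;
  *to_evict = candidate->uuid;
  return true;
}
```

For example, with voters A (the leader, HEALTHY), B (HEALTHY) and C (FAILED), the sketch selects C. If A instead reports UNKNOWN, it prints a warning and returns false without touching to_evict, which is the conservative "do nothing until the leader looks healthy" behavior the TODO describes.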