Repository: kudu
Updated Branches:
  refs/heads/branch-1.4.x eecc64c70 -> 2496aac3b

Temporary workaround for KUDU-1959 (race when selecting rowsets)

As described in the JIRA, there is a race by which multiple MM threads
can race to pick the same rowsets for compaction. Rather than crash when
hitting this bug, it is safe to simply abort that compaction attempt.
The MM will warn about the compaction failure and try again.

This is a temporary workaround for the 1.4 release since the issue was
recently reported in the wild on the user list.

Change-Id: I9db313849176e1bf05636d969fafb1682e6d78de
Reviewed-on: http://gerrit.cloudera.org:8080/7120
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Kudu Jenkins
(cherry picked from commit 8be2a59103da46472062f47f89efa6e1bddd0a5c)
Reviewed-on: http://gerrit.cloudera.org:8080/7122
Reviewed-by: Todd Lipcon <[email protected]>

Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/2496aac3
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/2496aac3
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/2496aac3

Branch: refs/heads/branch-1.4.x
Commit: 2496aac3bc147e47b4fa91a8b4af34618dd2518e
Parents: eecc64c
Author: Todd Lipcon <[email protected]>
Authored: Thu Jun 8 14:07:52 2017 -0700
Committer: Todd Lipcon <[email protected]>
Committed: Thu Jun 8 22:55:48 2017 +0000

----------------------------------------------------------------------
 src/kudu/tablet/tablet.cc | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/kudu/blob/2496aac3/src/kudu/tablet/tablet.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/tablet.cc b/src/kudu/tablet/tablet.cc
index 5503dc3..67ed325 100644
--- a/src/kudu/tablet/tablet.cc
+++ b/src/kudu/tablet/tablet.cc
@@ -1219,7 +1219,13 @@ Status Tablet::PickRowSetsToCompact(RowSetsInCompaction *picked,
       LOG_WITH_PREFIX(ERROR) << "Rowset selected for compaction but not available anymore: "
                              << not_found->ToString();
     }
-    LOG_WITH_PREFIX(FATAL) << "Was unable to find all rowsets selected for compaction";
+    // TODO(todd): this should never happen, but KUDU-1959 is a bug which causes us to
+    // sometimes concurrently decide to compact the same rowsets. It should be harmless
+    // to simply abort the compaction when we hit this bug, though long term we should
+    // fix the underlying race.
+    const char* msg = "Was unable to find all rowsets selected for compaction";
+    LOG_WITH_PREFIX(DFATAL) << msg;
+    return Status::RuntimeError(msg);
   }
   return Status::OK();
 }
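
Illustrative sketch (not part of the commit): the abort-and-retry behavior
described in the commit message can be modeled in isolation. Everything below
is a hypothetical stand-in, not Kudu code: the Status struct only mimics the
two kudu::Status factory calls used in the diff, and TabletModel reduces
rowsets to integer ids. It is single-threaded and simply replays the outcome
of the race, where a second picker selects a rowset that a first picker has
already claimed. The DFATAL logging step is omitted; in glog, DFATAL is fatal
in debug builds and logged at ERROR severity in release builds, which is why
the patched code can still reach the return statement.

```cpp
#include <iostream>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for kudu::Status; only what this sketch needs.
struct Status {
  bool ok;
  std::string msg;
  static Status OK() { return {true, ""}; }
  static Status RuntimeError(std::string m) { return {false, std::move(m)}; }
};

// Hypothetical model of a tablet whose rowsets are identified by integer ids.
class TabletModel {
 public:
  explicit TabletModel(std::set<int> ids) : available_(std::move(ids)) {}

  // Claims every rowset in 'selected'. If any of them has already been
  // claimed by an earlier caller, claims nothing and returns an error so
  // the caller can abandon this compaction attempt and retry later.
  Status PickRowSets(const std::vector<int>& selected) {
    for (int id : selected) {
      if (available_.count(id) == 0) {
        std::cerr << "Rowset selected for compaction but not available anymore: "
                  << id << "\n";
        return Status::RuntimeError(
            "Was unable to find all rowsets selected for compaction");
      }
    }
    for (int id : selected) {
      available_.erase(id);
    }
    return Status::OK();
  }

 private:
  std::set<int> available_;
};
```

In this model, a first call with {1, 2} on a tablet holding {1, 2, 3} succeeds,
a second call with {2, 3} fails without claiming anything, and a later call
with {3} succeeds, which mirrors the "warn and try again" behavior the commit
message attributes to the MM.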
