Temporary workaround for KUDU-1959 (race when selecting rowsets)

As described in the JIRA, there is a race in which multiple maintenance manager (MM) threads can pick the same rowsets for compaction. Rather than crash when hitting this bug, it is safe to simply abort that compaction attempt. The MM will log a warning about the failed compaction and try again.
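The race and the abort-and-retry strategy can be illustrated in isolation. What follows is a minimal, hypothetical C++17 sketch, not Kudu's actual code. `SimpleStatus`, `RowSetRegistry`, `Snapshot()`, `Claim()`, and `SimulateRace()` are invented for illustration. The sketch assumes rowsets are chosen from an unlocked snapshot and only then claimed under a lock. Two threads that pick from the same snapshot therefore select the same rowsets, and the second claimant gets an error status instead of crashing the process.

```cpp
#include <mutex>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, minimal stand-in for a status type (NOT kudu::Status).
struct SimpleStatus {
  bool ok;
  std::string message;
  static SimpleStatus OK() { return {true, ""}; }
  static SimpleStatus RuntimeError(std::string msg) { return {false, std::move(msg)}; }
};

// Hypothetical registry of rowset IDs that are still available for compaction.
class RowSetRegistry {
 public:
  explicit RowSetRegistry(std::set<int> ids) : available_(std::move(ids)) {}

  // Step 1: snapshot the candidates. The selection policy runs on this snapshot
  // without holding the lock, so two threads can end up picking the same rowsets.
  std::vector<int> Snapshot() const {
    std::lock_guard<std::mutex> l(lock_);
    return std::vector<int>(available_.begin(), available_.end());
  }

  // Step 2: claim every picked rowset atomically. If another thread already
  // claimed any of them, claim nothing and return an error rather than crashing,
  // so the caller can abort this attempt and retry later.
  SimpleStatus Claim(const std::vector<int>& picked) {
    std::lock_guard<std::mutex> l(lock_);
    for (int id : picked) {
      if (available_.count(id) == 0) {
        return SimpleStatus::RuntimeError(
            "Was unable to find all rowsets selected for compaction");
      }
    }
    for (int id : picked) available_.erase(id);
    return SimpleStatus::OK();
  }

 private:
  mutable std::mutex lock_;
  std::set<int> available_;
};

// Replays the race deterministically: both "threads" snapshot before either claims.
std::pair<SimpleStatus, SimpleStatus> SimulateRace(RowSetRegistry& registry) {
  std::vector<int> picked_a = registry.Snapshot();
  std::vector<int> picked_b = registry.Snapshot();  // same rowsets as picked_a
  SimpleStatus a = registry.Claim(picked_a);
  SimpleStatus b = registry.Claim(picked_b);
  return {a, b};
}
```

For example, constructing `RowSetRegistry registry({1, 2, 3})` and calling `SimulateRace(registry)` produces an OK status for the first claimant and a RuntimeError for the second. A later retry takes a fresh snapshot, which no longer contains the rowsets that were already claimed.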
This is a temporary workaround for the 1.4 release, since the issue was recently reported in the wild on the user list.

Change-Id: I9db313849176e1bf05636d969fafb1682e6d78de
Reviewed-on: http://gerrit.cloudera.org:8080/7120
Reviewed-by: Adar Dembo
Tested-by: Kudu Jenkins

Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/8be2a591
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/8be2a591
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/8be2a591

Branch: refs/heads/master
Commit: 8be2a59103da46472062f47f89efa6e1bddd0a5c
Parents: 693f675
Author: Todd Lipcon
Authored: Thu Jun 8 14:07:52 2017 -0700
Committer: Todd Lipcon
Committed: Thu Jun 8 22:04:19 2017 +0000

----------------------------------------------------------------------
 src/kudu/tablet/tablet.cc | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/kudu/blob/8be2a591/src/kudu/tablet/tablet.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/tablet.cc b/src/kudu/tablet/tablet.cc
index aaaa72b..fb6043b 100644
--- a/src/kudu/tablet/tablet.cc
+++ b/src/kudu/tablet/tablet.cc
@@ -1219,7 +1219,13 @@ Status Tablet::PickRowSetsToCompact(RowSetsInCompaction *picked,
       LOG_WITH_PREFIX(ERROR) << "Rowset selected for compaction but not available anymore: "
                              << not_found->ToString();
     }
-    LOG_WITH_PREFIX(FATAL) << "Was unable to find all rowsets selected for compaction";
+    // TODO(todd): this should never happen, but KUDU-1959 is a bug which causes us to
+    // sometimes concurrently decide to compact the same rowsets. It should be harmless
+    // to simply abort the compaction when we hit this bug, though long term we should
+    // fix the underlying race.
+    const char* msg = "Was unable to find all rowsets selected for compaction";
+    LOG_WITH_PREFIX(DFATAL) << msg;
+    return Status::RuntimeError(msg);
   }
   return Status::OK();
 }
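The patch switches the log severity from `FATAL` to `DFATAL`. In glog, `DFATAL` is treated as `FATAL` when `NDEBUG` is not defined (debug builds) and is downgraded to `ERROR` when it is defined (release builds). Assuming `LOG_WITH_PREFIX` forwards its severity to glog's `LOG`, debug and test builds still crash on this path, so the race stays visible in testing. Release builds log the error and return `Status::RuntimeError` to the caller. The sketch below models that decision without linking glog. `Severity`, `Outcome`, `EffectiveDFatalSeverity`, and `OnMissingRowsets` are hypothetical names used only for illustration.

```cpp
// Severity levels, named after glog's levels (illustrative only, not glog's enum).
enum class Severity { kInfo, kWarning, kError, kFatal };

// glog's DFATAL resolves to FATAL when NDEBUG is not defined (debug builds)
// and to ERROR when NDEBUG is defined (release builds).
constexpr Severity EffectiveDFatalSeverity(bool ndebug_defined) {
  return ndebug_defined ? Severity::kError : Severity::kFatal;
}

// Hypothetical outcome of the patched error path, modeled without aborting.
enum class Outcome { kProcessCrash, kReturnRuntimeError };

constexpr Outcome OnMissingRowsets(bool ndebug_defined) {
  // A FATAL log aborts the process before the return statement is reached;
  // an ERROR log only records the message, and the error Status is returned.
  return EffectiveDFatalSeverity(ndebug_defined) == Severity::kFatal
             ? Outcome::kProcessCrash
             : Outcome::kReturnRuntimeError;
}

// Describes the build this sketch itself was compiled in.
#ifdef NDEBUG
constexpr bool kNdebugDefined = true;
#else
constexpr bool kNdebugDefined = false;
#endif
```

Calling `OnMissingRowsets(kNdebugDefined)` reports which behavior the current build would exhibit. The design choice is that a debug build still fails loudly, while a production tablet server survives the race and lets the maintenance manager retry.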
