Repository: kudu
Updated Branches:
  refs/heads/branch-1.4.x eecc64c70 -> 2496aac3b


Temporary workaround for KUDU-1959 (race when selecting rowsets)

As described in the JIRA, multiple MM (maintenance manager) threads can
race to pick the same rowsets for compaction. Rather than crashing when
we hit this bug, it is safe to simply abort that compaction attempt.
The MM will warn about the compaction failure and try again.

This is a temporary workaround for the 1.4 release, since the issue was
recently reported in the wild on the user list.

Change-Id: I9db313849176e1bf05636d969fafb1682e6d78de
Reviewed-on: http://gerrit.cloudera.org:8080/7120
Reviewed-by: Adar Dembo <[email protected]>
Tested-by: Kudu Jenkins
(cherry picked from commit 8be2a59103da46472062f47f89efa6e1bddd0a5c)
Reviewed-on: http://gerrit.cloudera.org:8080/7122
Reviewed-by: Todd Lipcon <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/2496aac3
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/2496aac3
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/2496aac3

Branch: refs/heads/branch-1.4.x
Commit: 2496aac3bc147e47b4fa91a8b4af34618dd2518e
Parents: eecc64c
Author: Todd Lipcon <[email protected]>
Authored: Thu Jun 8 14:07:52 2017 -0700
Committer: Todd Lipcon <[email protected]>
Committed: Thu Jun 8 22:55:48 2017 +0000

----------------------------------------------------------------------
 src/kudu/tablet/tablet.cc | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/2496aac3/src/kudu/tablet/tablet.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/tablet.cc b/src/kudu/tablet/tablet.cc
index 5503dc3..67ed325 100644
--- a/src/kudu/tablet/tablet.cc
+++ b/src/kudu/tablet/tablet.cc
@@ -1219,7 +1219,13 @@ Status Tablet::PickRowSetsToCompact(RowSetsInCompaction *picked,
       LOG_WITH_PREFIX(ERROR) << "Rowset selected for compaction but not available anymore: "
                              << not_found->ToString();
     }
-    LOG_WITH_PREFIX(FATAL) << "Was unable to find all rowsets selected for compaction";
+    // TODO(todd): this should never happen, but KUDU-1959 is a bug which causes us to
+    // sometimes concurrently decide to compact the same rowsets. It should be harmless
+    // to simply abort the compaction when we hit this bug, though long term we should
+    // fix the underlying race.
+    const char* msg = "Was unable to find all rowsets selected for compaction";
+    LOG_WITH_PREFIX(DFATAL) << msg;
+    return Status::RuntimeError(msg);
   }
   return Status::OK();
 }
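
The pattern in the diff above (return an error status instead of logging
FATAL, so the caller can warn and retry) can be sketched in isolation. This
is a minimal illustration, not the Kudu implementation: SketchStatus,
PickRowSetsSketch, and TryCompactSketch are hypothetical names, rowsets are
modeled as plain integer ids, and the real code logs at DFATAL (which, in
glog, is fatal in debug builds and an ERROR in release builds) where this
sketch only writes to stderr.

```cpp
#include <cassert>
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a status type; not the real kudu::Status.
struct SketchStatus {
  bool ok;
  std::string message;
  static SketchStatus OK() { return {true, ""}; }
  static SketchStatus RuntimeError(const std::string& msg) { return {false, msg}; }
};

// Hypothetical picker: succeeds only if every selected rowset id is still
// available. A missing id models another thread having taken that rowset.
SketchStatus PickRowSetsSketch(const std::set<int>& available,
                               const std::vector<int>& selected,
                               std::vector<int>* picked) {
  assert(picked != nullptr);
  picked->clear();
  bool all_found = true;
  for (int id : selected) {
    if (available.count(id) == 0) {
      std::cerr << "Rowset selected for compaction but not available anymore: "
                << id << "\n";
      all_found = false;
    } else {
      picked->push_back(id);
    }
  }
  if (!all_found) {
    // Abort this attempt instead of crashing the process.
    picked->clear();
    return SketchStatus::RuntimeError(
        "Was unable to find all rowsets selected for compaction");
  }
  return SketchStatus::OK();
}

// Hypothetical caller: on failure, warn and report false so that a later
// scheduling pass can try again with a fresh selection.
bool TryCompactSketch(std::set<int>* available, const std::vector<int>& selected) {
  std::vector<int> picked;
  SketchStatus s = PickRowSetsSketch(*available, selected, &picked);
  if (!s.ok) {
    std::cerr << "Compaction attempt failed, will retry later: " << s.message << "\n";
    return false;
  }
  // "Compact" by consuming the picked rowsets.
  for (int id : picked) available->erase(id);
  return true;
}
```

In this sketch, a call such as TryCompactSketch(&available, {2, 3}) returns
false and leaves the available set untouched when rowset 2 has already been
consumed by an earlier attempt, which mirrors the aborted compaction
described in the commit message.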
