Repository: kudu
Updated Branches:
  refs/heads/master 43e7bb1f8 -> ecdeac7d3

Limit number of rowsets in compaction selection to 32 in TSAN mode

TSAN limits the number of simultaneous lock acquisitions in a single
thread to 64 when using the deadlock detector[1]. However, compaction
can select up to 128 (128MB budget / 1MB min rowset size) rowsets in a
single op. kudu-tool-test's TestNonRandomWorkloadLoadgen almost always
hits TSAN's limit when the KUDU-1400 changes following this patch are
applied. This patch prevents this by limiting the number of rowsets
selected for a compaction to 32 when running under TSAN.

I ran the test with the KUDU-1400 changes on top and saw 97/100
failures. With the change, I saw 100 successes.

[1]: https://github.com/google/sanitizers/issues/950

Change-Id: I01ad4ba3a13995c194c3308d72c1eb9b611ef766
Reviewed-on: http://gerrit.cloudera.org:8080/11885
Tested-by: Kudu Jenkins
Reviewed-by: Adar Dembo <a...@cloudera.com>
Reviewed-by: Andrew Wong <aw...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/kudu/repo
Commit: http://git-wip-us.apache.org/repos/asf/kudu/commit/ee817a85
Tree: http://git-wip-us.apache.org/repos/asf/kudu/tree/ee817a85
Diff: http://git-wip-us.apache.org/repos/asf/kudu/diff/ee817a85

Branch: refs/heads/master
Commit: ee817a85e6844b00226f09c31930d10fdcba91c2
Parents: 43e7bb1
Author: Will Berkeley <wdberke...@gmail.com>
Authored: Mon Nov 5 15:54:14 2018 -0800
Committer: Will Berkeley <wdberke...@gmail.com>
Committed: Wed Nov 7 18:27:15 2018 +0000

----------------------------------------------------------------------
 src/kudu/tablet/tablet.cc | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/kudu/blob/ee817a85/src/kudu/tablet/tablet.cc
----------------------------------------------------------------------
diff --git a/src/kudu/tablet/tablet.cc b/src/kudu/tablet/tablet.cc
index 0d89ad8..603a241 100644
--- a/src/kudu/tablet/tablet.cc
+++ b/src/kudu/tablet/tablet.cc
@@ -1375,6 +1375,25 @@ Status Tablet::PickRowSetsToCompact(RowSetsInCompaction *picked,
       continue;
     }
 
+    // For every rowset we pick, we have to take its compact_flush_lock. TSAN
+    // disallows taking more than 64 locks in a single thread[1], so for large
+    // compactions this can cause TSAN CHECK failures. To work around, limit the
+    // number of rowsets picked in TSAN to 32.
+    // [1]: https://github.com/google/sanitizers/issues/950
+    // TODO(wdberkeley): Experiment with a compact_flush lock table instead of
+    // a per-rowset compact_flush lock.
+    #if defined(THREAD_SANITIZER)
+      constexpr auto kMaxPickedUnderTsan = 32;
+      if (picked->num_rowsets() > kMaxPickedUnderTsan) {
+        LOG(WARNING) << Substitute("Limiting compaction to $0 rowsets under TSAN",
+                                   kMaxPickedUnderTsan);
+        // Clear 'picked_set' to indicate there's no more rowsets we expect
+        // to lock.
+        picked_set.clear();
+        break;
+      }
+    #endif
+
     // Grab the compact_flush_lock: this prevents any other concurrent
     // compaction from selecting this same rowset, and also ensures that
     // we don't select a rowset which is currently in the middle of being