Mike Percy has posted comments on this change. Change subject: KUDU-1337. Avoid spurious remote bootstraps on DeleteTablet() ......................................................................
Patch Set 1: (12 comments) http://gerrit.cloudera.org:8080/#/c/2436/1//COMMIT_MSG Commit Message: Line 13: the leader processes the RPC call sending the DeleteTablet() call to it, > Nit: the language here is a little confusing. How about: Done http://gerrit.cloudera.org:8080/#/c/2436/1/src/kudu/integration-tests/remote_bootstrap-itest.cc File src/kudu/integration-tests/remote_bootstrap-itest.cc: Line 743: // a follower tablet is deleted and the leader notices, before it has processed > Nit: I don't think you need the comma after 'notices'. Done Line 745: TEST_F(RemoteBootstrapITest, TestRemoteBootstrappingDeletedTabletFails) { > Neat test, though I was expecting something a little less synthetic: The test sounds OK. We would need to add a remote bootstrap counter metric to implement it though. Also, it seems likely (even before this fix) to pass most of the time and look like just another flaky test. I agree we should have metrics for remote bootstrap, and I'll track that work, but I think it's out of scope for this fix. Line 775: workload.Start(); > Does the test actually need rows to be inserted? Isn't the presence of crea Typically I do this for remote bootstrap tests but in this case, you're right, it's not required. Removing. Line 788: cluster_->tablet_server(1)->uuid(), > For completeness, how about you test a remote bootstrap on both followers? I don't see how that adds any completeness to the test that it doesn't already have. Also, I'm attempting to bootstrap the leader, not the follower. Line 790: 0, // Say I'm from term 0. > Nit: not clear _why_ you're passing 0 for the term value. No good reason, it's basically a junk parameter in this case. Changing it to term 1 to match with the actual term which is more intuitive. Line 799: ASSERT_OK(itest::StartRemoteBootstrap(leader, tablet_id, > Both followers here too. As above Line 804: ASSERT_OK(WaitForServersToAgree(kTimeout, ts_map_, tablet_id, workload.batches_completed() + 1)); > Why is this last call needed? We're waiting for the remote bootstrap to complete (the tablet will start up and the log indexes will match). http://gerrit.cloudera.org:8080/#/c/2436/1/src/kudu/tserver/ts_tablet_manager.cc File src/kudu/tserver/ts_tablet_manager.cc: Line 583: // If the tablet was permanently deleted, then we can never transition it. > think it's worth copying a bit of the explanation in your commit message in Done http://gerrit.cloudera.org:8080/#/c/2436/1/src/kudu/tserver/ts_tablet_manager.h File src/kudu/tserver/ts_tablet_manager.h: Line 319: // transition_in_progress_. > Update this. Done Line 327: // restart). > nit: std::string in header Done Line 328: std::unordered_set<string> perm_deleted_tablet_ids_; > This set grows unbounded, do we care? It only takes up space when a table is permanently deleted and isn't a particularly large amount of memory. I don't think we care. -- To view, visit http://gerrit.cloudera.org:8080/2436 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I852f8ac70e1f6496127598e5e02de5b72711ab2b Gerrit-PatchSet: 1 Gerrit-Project: kudu Gerrit-Branch: master Gerrit-Owner: Mike Percy <[email protected]> Gerrit-Reviewer: Adar Dembo <[email protected]> Gerrit-Reviewer: Kudu Jenkins Gerrit-Reviewer: Mike Percy <[email protected]> Gerrit-Reviewer: Todd Lipcon <[email protected]> Gerrit-HasComments: Yes
