Adar Dembo has posted comments on this change.

Change subject: design-docs: multi-master for 1.0 release
......................................................................

Patch Set 1:

(12 comments)

http://gerrit.cloudera.org:8080/#/c/2527/1/docs/design-docs/multi-master-1.0.md
File docs/design-docs/multi-master-1.0.md:

Line 30: ## Gaps in the master
> good job with those markdown tags.

Yeah, I'm a pro now. Until the next time I use JIRA's "markdown", at which point I'll forget all about this.


Line 89: .
> this can also cause the cluster to be unbalanced right? maybe mention that

Seems unlikely, but I'll mention it.


Line 120: f
> points 2 and 3 seem even more serious than the title of the jira ticket. wa

This issue is perhaps the most complicated of the ones listed here, and I'm trying to shield readers from some of that complexity.

Point 2 is actually a non-issue due to the code referenced in KUDU-759. For the sake of simplicity, I'm assuming in this doc that this code has been removed (because it's a pretty bogus workaround for that specific issue).

Point 3 is legitimate though. I actually think that a stuck AlterTable() is more serious than reclaiming disk space from deleted tablets.

Still think I should change KUDU-1353, though?


Line 130: ###
> These are the features to be implemented for 1.0 right? maybe mention that

Maybe, maybe not. For example, I can see us shipping 1.0 without fixing KUDU-500. Perhaps even without adding support for making master Raft config changes.


Line 147: XXX
> yeah probably remove

Done


Line 150: ####
> yeah likely file a ticket and leave this out

Filed KUDU-1372.


Line 165: #### Table, tablet, and tserver metrics
> same

Filed KUDU-1373.


Line 200: 2. All destructive actions taken by a tserver must be "fenced". That is, the
> only destructive or all the state changing operations?

What does broadening the definition buy us? Are we splitting semantic hairs or is there a real difference? Maybe you could provide an example?


Line 201: takes
> s/takes/take

Done


Line 204: current master term
> they should keep an opid (i.e. term and index) instead of just term (would

I think someone (Mike, perhaps?) suggested that the term would be sufficient and the index was unnecessary, but I've reviewed the various design docs in gdocs and I can't find that suggestion.

Can you help me understand why the term is insufficient on its own?


Line 206: Ensure that the leader master replicates via Raft before triggering an
: action. It doesn't matter what is replicated (a no-op would suffice);
: a successful replication asserts that this master is still the leader.
> Need to think about this a bit further. I'm a bit worried that this is poin

To be fair, I think this is more complicated than option #1 at the moment, but I included it here for completeness and to evoke a discussion.


Line 212: partially replicated
: operations
> are you talking about the ops that need more than one consensus round? didn

Sorry for the miscommunication. I'll do some RPC size measurement and update the doc with my conclusions.


-- 
To view, visit http://gerrit.cloudera.org:8080/2527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iad76012977a45370b72a04d608371cecf90442ef
Gerrit-PatchSet: 1
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes
