Mike Percy has posted comments on this change.

Change subject: design-docs: multi-master for 1.0 release
......................................................................


Patch Set 4:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/2527/4/docs/design-docs/multi-master-1.0.md
File docs/design-docs/multi-master-1.0.md:

Line 82: 3. Global: map of table name to table instance. Deleted tables are 
omitted,
why not just table name to table id?


Line 84: 4. Global: map of tablet ID to tablet instance. All tablets are 
present,
deleted tables are transient, right?


Line 213: up-to-date before this change is made.
We would also likely want a way to express how stale of a follower master we 
are willing to accept, possibly based on hybrid time.


Line 215: It would be nice to implement this for Kudu 1.0, but it's not a strict
Yeah I don't think this feature would be driven by a pressing need.


Line 222: TODO: JD says the code that detects the current master was partially 
removed
from the Java client, I'm assuming.


Line 278: allow for replicating no-ops (i.e. log entries that needn't be 
persisted).
Actually, we do have this in Raft; heartbeats serve this function.

Some options along these lines to assert leadership:

1. Add a local-only CheckLeadership() call to the Consensus class that returns 
true if: (a) we believe we are the leader and (b) we have heartbeated to a 
majority in the last election timeout.
2. alternatively, CheckLeadership() could be more conservative: trigger an 
immediate heartbeat to a majority if we are, say, half way through our election 
timeout, then block waiting for a majority response. This would give us more 
time to act within our election timeout "lease".

This whole approach to leases is a bit imprecise and due to OS hiccups and 
network delays does not guarantee that a rogue master wouldn't be able to 
command a tserver... it just significantly reduces the window in which that 
could occur.

However it may be safe, given the way we have designed the administrative RPC 
APIs from master -> tserver. I think the requirement for such RPC APIs is that 
they need to tolerate, and reject, command from a master that has stale 
information relative to the Raft config of the tablet being commanded. For 
example, if a tserver gets a ChangeConfig() RPC from a master that specifies an 
old config, the leader tserver will reject the command from the master, because 
the master has fallen out of date. As long as we maintain this invariant across 
all master -> tserver RPCs, I think the master leader lease approach should be 
sufficient for safety.

What I haven't done is go through all the RPCs and ensure that this holds, and 
we should probably do that. If this logic is the basis for the safety design of 
master failover, we should also ensure that invariant makes it into this doc.


Line 417:    1. Adding a new global in-memory unordered_set to "lock" a table's 
name
what does this "lock" actually do? Prevent visibility? We want visibility on 
dashboards if possible, for administrators.


Line 462:    1. Remove step #5
You're saying that on each iteration, we try to select the replicas again and 
only fail on timeout like in step 2? If so, this seems OK to me.


-- 
To view, visit http://gerrit.cloudera.org:8080/2527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iad76012977a45370b72a04d608371cecf90442ef
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to