Todd Lipcon has posted comments on this change.

Change subject: design-docs: multi-master for 1.0 release
......................................................................


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/2527/4/docs/design-docs/multi-master-1.0.md
File docs/design-docs/multi-master-1.0.md:

Line 74: 4. Replica instance. Includes a link to the tserver, the state of the 
replica
it's true there is a TabletReplica map, but this is a struct that is never 
pointed-to (it's by-value in the map). I would probably just describe this as 
part of the information of the Tablet instance. eg ("general metadata about the 
tablet, along with its current set of replicas")


Line 215: It would be nice to implement this for Kudu 1.0, but it's not a strict
> Yeah I don't think this feature would be driven by a pressing need.
agreed, we haven't seen scalability/perf issues on the current design, so 
should be out of scope


Line 246: #### Follower masters must treat heartbeats as no-ops
seems contradictory - you aren't treating it quite as a no-op -- you're 
treating it only as a liveness indicator, right?


Line 275: master replicates via Raft before triggering a state change. It 
doesn't
I think there's another alternative: on startup, the tserver sends a heartbeat 
to all masters (or the "register" RPC or whatever). It waits until it has 
received a response from a majority of them. The masters respond including the 
term of the last committed operation (or maybe just the current term). The 
tservers can initialize their current term based on the majority response here.

In other words, we already know that by the guarantees of raft, if there is a 
new leader elected, and we talk to a majority, at least one of those majority 
would know about the newest leader. This avoids having to do any persistent 
state on the TS side.


Line 278: allow for replicating no-ops (i.e. log entries that needn't be 
persisted).
> Actually, we do have this in Raft; heartbeats serve this function.
I think leasing on its own would be enough to be safe. In the example in the 
doc:

- Master 1 (M1) is leader.
- M1 loses leadership to M2, but thinks it is still leader

- M2 sends a 'create tablet' command to TS
-- before doing so, M2 must be fully a leader (meaning that it has waited at 
least a lease period)
- TS restarts
- TS contacts M1
-- M1 checks its lease _before_ sending a 'Delete' call. Since we already 
assured that M2 waited a lease period, then we know that M1 has also lost its 
lease, and won't be able to send Delete.

Lease algorithms like this are tolerant of clocks being unsynchronized and 
message delays, just so long as the clock _rates_ are within some percentage 
tolerance.


Line 327: To fix KUDU-1353, the per-tablet replica locations should be rebuilt 
on
Maybe we should do away with this information and just use the RaftConfigPB for 
the tablet, and assume that the tservers have what we think they have? I'm not 
sure the TabletReplica map buys us much, and it gets confusing about when to 
use that vs just trusting the config


Line 417:    1. Adding a new global in-memory unordered_set to "lock" a table's 
name
> what does this "lock" actually do? Prevent visibility? We want visibility o
yea, I tend to agree that for many operations, an in-progress state is useful 
to expose


Line 420:       LockManager for this, as it has no real tablet dependencies
per some conversation we had last week, I would rather avoid adding more locks 
unless it really buys us a lot. (where "a lot" != avoiding some super 
rarely-experienced semantic which doesnt crash but is just strange)


Line 423: #### Background task scanning
should we evaluate getting rid of the "backgrund tasks" thread and instead 
treat it more like the way we do alter/delete, with a "retrying task" that just 
runs and retries until successful? and "kick off" these tasks with a single 
scan when we become active?


-- 
To view, visit http://gerrit.cloudera.org:8080/2527
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Iad76012977a45370b72a04d608371cecf90442ef
Gerrit-PatchSet: 4
Gerrit-Project: kudu
Gerrit-Branch: master
Gerrit-Owner: Adar Dembo <[email protected]>
Gerrit-Reviewer: Adar Dembo <[email protected]>
Gerrit-Reviewer: Dan Burkert <[email protected]>
Gerrit-Reviewer: David Ribeiro Alves <[email protected]>
Gerrit-Reviewer: Jean-Daniel Cryans
Gerrit-Reviewer: Kudu Jenkins
Gerrit-Reviewer: Mike Percy <[email protected]>
Gerrit-Reviewer: Todd Lipcon <[email protected]>
Gerrit-HasComments: Yes

Reply via email to