[ https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868802#comment-13868802 ]
Feng Honghua commented on HBASE-10296:
--------------------------------------
[~lhofhansl] / [~apurtell] / [[email protected]] / [[email protected]] :
1. Paxos, Raft, or a ZAB library extracted from ZK are all good candidates :-)
2. I agree that implementing a *correct* consensus protocol for production
use is extremely hard; that's why I tagged this JIRA's type as Brainstorming.
My intention is to raise it so we can discuss what a better / more reasonable
architecture would look like.
3. If we eventually all agree on a better architecture/design after
analysis/discussion/proof, we can approach it in a conservative and
incremental way; maybe someday we will make it :-)
> Replace ZK with a paxos running within master processes to provide better
> master failover performance and state consistency
> ---------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-10296
> URL: https://issues.apache.org/jira/browse/HBASE-10296
> Project: HBase
> Issue Type: Brainstorming
> Components: master, Region Assignment, regionserver
> Reporter: Feng Honghua
>
> Currently the master relies on ZK to elect the active master, monitor
> liveness, and store almost all of its state, such as region states, table
> info, replication info and so on. ZK also serves as a channel for
> master-regionserver communication (such as in region assignment) and
> client-regionserver communication (such as replication state/behavior
> changes). But ZK as a communication channel is fragile due to its one-time
> watch and asynchronous notification mechanism, which together can lead to
> missed events (hence missed messages). For example, the master must rely on
> the idempotence of the state transition logic to keep the region assignment
> state machine correct. In fact, almost all of the trickiest inconsistency
> issues can be traced back to the fragility of ZK as a communication channel.
> Replacing ZK with paxos running within the master processes has the
> following benefits:
> 1. Better master failover performance: all masters, whether active or
> standby, have the same latest state in memory (except lagging ones, which
> can catch up later). Whenever the active master dies, the newly elected
> active master can immediately take over without failover work such as
> rebuilding its in-memory state by consulting the meta table and ZK.
> 2. Better state consistency: the masters' in-memory state is the only source
> of truth about the system, which can eliminate inconsistency from the very
> beginning. And though the state is held by all masters, paxos keeps the
> copies identical (apart from the lagging ones noted above).
> 3. More direct and simple communication pattern: the client changes state by
> sending requests to the master, and the master and regionservers talk
> directly to each other via request and response. None of them needs to go
> through third-party storage like ZK, which can introduce more uncertainty,
> worse latency and more complexity.
> 4. ZK is then needed only for liveness monitoring, to determine whether a
> regionserver is dead, and later on we can eliminate ZK entirely once we
> build heartbeats between the master and the regionservers.
> I know this might look like a very crazy re-architecture, but it deserves
> deep thought and serious discussion, right?
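
To illustrate the missed-event problem mentioned in the description above, here is a minimal sketch of a one-time watch combined with asynchronous notification. It is a toy model, not the ZooKeeper client API: `ToyNode`, `simulate`, and the region state names are all hypothetical and chosen only for illustration. The assumption is that a watch fires at most once and that the watcher only re-registers when the notification is finally delivered.

```python
# Toy model of a one-time watch. This is NOT the ZooKeeper client API;
# every name here is hypothetical and exists only for illustration.
class ToyNode:
    def __init__(self, value):
        self.value = value
        self._watchers = []

    def get(self, watch=None):
        """Return the current value; optionally register a one-time watch."""
        if watch is not None:
            self._watchers.append(watch)
        return self.value

    def set(self, value):
        """Update the value and return the watches that fired.

        Fired watches are removed (one-time semantics). The caller delivers
        them later, which models asynchronous notification.
        """
        self.value = value
        fired, self._watchers = self._watchers, []
        return fired


def simulate():
    node = ToyNode("OFFLINE")
    seen = []

    def on_change():
        # On notification, re-read and re-register. Only the value that is
        # current at delivery time is visible to the watcher.
        seen.append(node.get(watch=on_change))

    # Initial read registers the first watch.
    seen.append(node.get(watch=on_change))

    # Three quick transitions happen before any notification is delivered.
    pending = []
    for state in ["PENDING_OPEN", "OPENING", "OPENED"]:
        pending.extend(node.set(state))

    # Notifications arrive only now; just one watch had been registered.
    for callback in pending:
        callback()
    return seen


if __name__ == "__main__":
    print(simulate())
```

In this model the watcher never observes the two intermediate transitions, because no watch was registered while they happened. That is why, under these assumptions, the consumer's transition logic has to be idempotent and tolerant of skipped states.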
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)