[ https://issues.apache.org/jira/browse/HBASE-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882497#comment-13882497 ]
Feng Honghua commented on HBASE-1755:
-------------------------------------
I agree with [~lhofhansl] in some sense. ZK is not the root of all evil; it has
its own recommended usage pattern :-). It's (very) suitable for scenarios where:
# the data needs persistent (hierarchical) storage, and this storage is the only
holder of the truth for that data
# the stored data is small
# access to the storage is sparse (infrequent)
# the watch/notify mechanism is a plus for coding convenience, but the code
using ZK should be inherently idempotent, caring only about the final state
when it's notified. State-machine code/logic that cares about every state
transition is not a good fit for ZK, since a one-shot watch can collapse several
changes into a single notification.
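To illustrate why idempotent, final-state code fits ZK while state-machine code does not, here is a minimal Python sketch. It does not use a real ZooKeeper client: `ToyZNode`, `IdempotentWatcher` and `TransitionLogger` are hypothetical names, and the model only assumes ZK's one-shot watch semantics (a watch fires once and must be re-registered, so changes made in between are seen only as the latest value). The state strings are illustrative.

```python
from collections import deque
from typing import Callable, Deque, List

Watch = Callable[[], None]


class ToyZNode:
    """Hypothetical in-memory stand-in for one znode with one-shot watches."""

    def __init__(self, value: str) -> None:
        self.value = value
        self.watches: List[Watch] = []
        self.pending: Deque[Watch] = deque()  # fired but not yet delivered

    def get_data(self, watch: Watch) -> str:
        # Reading the data re-arms a one-shot watch for the caller.
        self.watches.append(watch)
        return self.value

    def set_data(self, value: str) -> None:
        self.value = value
        # Every armed watch fires once and is then cleared (one-shot).
        self.pending.extend(self.watches)
        self.watches.clear()

    def deliver(self) -> None:
        # Notifications arrive asynchronously, possibly after several writes.
        while self.pending:
            self.pending.popleft()()


class IdempotentWatcher:
    """Only cares about the final state: re-read and reconcile."""

    def __init__(self, node: ToyZNode) -> None:
        self.node = node
        self.current = node.get_data(self.on_change)

    def on_change(self) -> None:
        self.current = self.node.get_data(self.on_change)


class TransitionLogger:
    """State-machine style: tries to record every transition."""

    def __init__(self, node: ToyZNode) -> None:
        self.node = node
        self.seen = [node.get_data(self.on_change)]

    def on_change(self) -> None:
        self.seen.append(self.node.get_data(self.on_change))


node = ToyZNode("OFFLINE")
idem = IdempotentWatcher(node)
logger = TransitionLogger(node)

# Three changes land before the client processes its notifications.
for state in ("OPENING", "OPENED", "CLOSING"):
    node.set_data(state)
node.deliver()

print(idem.current)  # CLOSING
print(logger.seen)   # ['OFFLINE', 'CLOSING']
```

The idempotent watcher ends up correct, while the transition logger silently misses OPENING and OPENED. That is exactly the problem for logic that must act on every transition.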
According to the above:
# region location info in the META table is not suitable to be in ZK: its size
can be very large
# region assignment status info is not suitable to be in ZK: 1) restarting a
big cluster with a large number of regions (say 10K-100K) can lead to very
heavy/frequent reads and writes to ZK during the restart phase; 2) the
assignment code/logic is more like a state machine: it expects full knowledge
of every state transition, without missing any state change (event); 3)
assignment status info is duplicated in both master memory and ZK, so ZK is not
always the only truth holder (actually, it's prohibitive to consult ZK as the
only truth for each such info query; currently it serves more to recover
assignment status info when the master fails. It seems it was introduced so the
assignment process can survive a master failure, right?)
# replication info is quite suitable to be in ZK, since it matches all of the
above characteristics :-)
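To put rough numbers on the restart concern, here is a back-of-the-envelope sketch. The per-region figures are assumptions for illustration, not measurements of HBase's actual ZK-based assignment: it assumes each region goes through 3 znode state transitions on the way to being opened, each costing about 3 ZK messages (a write, a watch notification, and a re-read that re-arms the watch).

```python
def zk_messages_for_restart(regions: int,
                            transitions_per_region: int = 3,
                            messages_per_transition: int = 3) -> int:
    """Rough count of ZK messages needed to reassign every region once.

    The defaults are illustrative assumptions, not measured HBase figures.
    """
    return regions * transitions_per_region * messages_per_transition


for regions in (10_000, 100_000):
    print(f"{regions} regions -> ~{zk_messages_for_restart(regions):,} ZK messages")
```

Under these assumptions that is roughly 90,000 messages for 10K regions and 900,000 for 100K, all concentrated in the restart window, and the total grows linearly with the region count.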
Of course, if we embed a consensus lib in the master, we effectively have an
inherent ZK within the master ensemble. That way we can store all the different
kinds of status/info, with their different access patterns, in this 'inherent'
ZK within the master (except region location info, which is too big to keep in
memory).
In an ideal world where the master never dies, we wouldn't use ZK to store the
status/info currently stored there, right? The master memory would be the only
truth holder. But the master can die, so we need to duplicate the status/info in
both the master and ZK. This can introduce an info-duplication problem. The
duplication can be avoided, but at the cost of efficiency: we then need to
always access ZK rather than memory, which is prohibitive for heavily accessed
data. There is no duplication problem if we always use ZK as the truth. That is
in fact how we treat replication info: its data size is small and access is
sparse, so we can afford to always access ZK for it. That's why I think ZK is
good enough for replication info :-)
By embedding ZK (a consensus lib) within the master, ZK and master memory
combine into one place: no info duplication, no access-efficiency problem, and
still persistence in case of master failure...
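Here is a minimal sketch of the "embedded consensus lib" idea. It assumes a generic replicated log (the kind a Raft- or Paxos-style library would provide) rather than any specific library's API; `ReplicatedLog` and `Master` are hypothetical names. Writes are appended to the log before being applied to memory, reads are served from memory, and a standby master recovers by replaying the log.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Op = Tuple[str, str]  # (region, new_state)


@dataclass
class ReplicatedLog:
    """Hypothetical stand-in for a consensus library's durable, replicated log.

    A real library would only acknowledge an append once a quorum of master
    replicas had persisted it; here it is just an in-memory list.
    """
    entries: List[Op] = field(default_factory=list)

    def append(self, op: Op) -> None:
        self.entries.append(op)


@dataclass
class Master:
    log: ReplicatedLog
    state: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Recovery: rebuild the in-memory state by replaying the log.
        for op in self.log.entries:
            self._apply(op)

    def _apply(self, op: Op) -> None:
        region, new_state = op
        self.state[region] = new_state

    def transition(self, region: str, new_state: str) -> None:
        # Write path: persist first, then apply, so memory never holds a
        # change that the log does not.
        op = (region, new_state)
        self.log.append(op)
        self._apply(op)

    def lookup(self, region: str) -> Optional[str]:
        # Read path: served from memory, no remote round trip.
        return self.state.get(region)


log = ReplicatedLog()
active = Master(log)
active.transition("region-a", "OPENING")
active.transition("region-a", "OPENED")
active.transition("region-b", "OPENING")

# The active master dies; a standby that shares the log takes over.
standby = Master(log)
print(standby.lookup("region-a"))  # OPENED
print(standby.lookup("region-b"))  # OPENING
```

In a real deployment each replica keeps its own copy of the log, so the shared `log` object here only stands in for that. Log snapshotting/compaction is also omitted.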
> Putting 'Meta' table into ZooKeeper
> -----------------------------------
>
> Key: HBASE-1755
> URL: https://issues.apache.org/jira/browse/HBASE-1755
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 0.90.0
> Reporter: Erik Holstad
>
> Moving to 0.22.0
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)