[ 
https://issues.apache.org/jira/browse/HBASE-1755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882497#comment-13882497
 ] 

Feng Honghua commented on HBASE-1755:
-------------------------------------

I agree with [~lhofhansl] in some sense. ZK is not the root of all evil; it has 
its own recommended usage pattern :-). It is (very) suitable for scenarios that:
# need persistent (hierarchical) storage, where this storage is the only holder 
of some truth
# have a small storage size
# access the storage sparsely
# benefit from a watch/notify mechanism for coding convenience (a plus), but the 
code using ZK should be inherently idempotent and care only about the final 
state when it is notified (state machine code/logic cares about every state 
transition, so ZK is not a good fit for it)
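
To illustrate the last item, here is a minimal sketch in Python. It assumes 
one-shot watches that must be re-registered after they fire, as in ZooKeeper's 
classic watch model. ToyStore, observe and the state names are made up for 
illustration; this is a toy in-memory model, not the ZooKeeper client API.

```python
class ToyStore:
    """Toy in-memory store with one-shot watches (not the ZooKeeper API)."""

    def __init__(self):
        self.data = {}
        self.watches = {}   # path -> callbacks, each fired at most once
        self.pending = []   # notifications queued but not yet delivered

    def get(self, path, watch=None):
        # Reading can (re-)register a one-shot watch on the path.
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        # Triggered watches are removed; later writes notify nobody
        # until the client registers a new watch.
        for callback in self.watches.pop(path, []):
            self.pending.append((callback, path))

    def deliver(self):
        # Simulates notifications reaching the client some time later.
        pending, self.pending = self.pending, []
        for callback, path in pending:
            callback(path)


def observe(states, deliver_after_each_write):
    """Return the states a watcher sees while a writer walks through `states`."""
    store = ToyStore()
    seen = []

    def on_change(path):
        # On notification: read the current value and re-register the watch.
        seen.append(store.get(path, watch=on_change))

    store.get("/region/r1", watch=on_change)
    for state in states:
        store.set("/region/r1", state)
        if deliver_after_each_write:
            store.deliver()
    store.deliver()
    return seen


states = ["OFFLINE", "OPENING", "OPEN"]   # hypothetical state names
slow_watcher = observe(states, deliver_after_each_write=False)
fast_watcher = observe(states, deliver_after_each_write=True)
print(slow_watcher)
print(fast_watcher)
```

When the writes arrive in a burst, the watcher in this sketch is notified once 
and reads only the final state. That is fine for idempotent code that converges 
on the final state, but not for a state machine that needs every transition.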

According to the above:
# Region location info in the META table is not suitable for ZK: its size can 
be very large.
# Region assignment status info is not suitable for ZK: 1) restarting a big 
cluster with a large number of regions (say 10K-100K regions) can lead to very 
heavy/frequent reads and writes to ZK during the restart phase; 2) assignment 
code/logic is more like a state machine: it expects full knowledge of the state 
transitions, with no missed state change (event); 3) assignment status info is 
duplicated in both master memory and ZK, so ZK is not the only truth holder all 
the time. (Actually it is prohibitive to consult ZK as the only truth for every 
such info query. Currently ZK serves more for recovering assignment status info 
when the master fails; it seems to have been introduced so that the assignment 
process survives a master failure, right?)
# Replication info is quite suitable for ZK, since it matches all of the 
above characteristics :-)

Surely, if we embed a consensus lib in the master, we effectively have an 
inherent ZK within the master ensemble. That way we can store all the different 
kinds of status/info, with their different access patterns, in this 'inherent' 
ZK within the master (except region location info, which is too big to be in 
memory).

In an ideal world where the master never dies, we wouldn't use ZK to store the 
status/info currently stored in ZK, right? The master memory would be the only 
truth holder. But the master can die, so we need to duplicate the status/info 
in both master memory and ZK, which can introduce the info-duplication problem. 
That problem can be avoided by always using ZK as the truth, but at the cost of 
efficiency: we would then always access ZK rather than memory, which is 
prohibitive for data with heavy access. (This is actually how we treat 
replication info: its data size is small and its access is sparse, so we can 
afford to always access ZK for it. That's why I think ZK is good enough for 
replication info :-))
By embedding ZK (a consensus lib) within the master, ZK and master memory 
combine into one place: no info duplication, no access efficiency problem, and 
we still have persistence in case of master failure...
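
A rough sketch of that idea in Python, assuming the embedded consensus lib 
exposes an ordered, durable log. ReplicatedLog below is only a local list 
standing in for that log (no quorum, no networking), and Master and the state 
names are hypothetical, not HBase classes.

```python
class ReplicatedLog:
    """Stand-in for a consensus-replicated log (assumption: a real one
    would only accept an entry after a quorum of masters agreed on it)."""

    def __init__(self):
        self.entries = []

    def append(self, entry):
        self.entries.append(entry)


class Master:
    """Keeps assignment state in memory, derived only from the log."""

    def __init__(self, log):
        self.log = log
        self.applied = 0          # number of log entries applied so far
        self.assignments = {}     # region -> current state
        self.catch_up()           # a new master rebuilds memory by replay

    def catch_up(self):
        # Apply every entry in order, so no transition is ever skipped.
        while self.applied < len(self.log.entries):
            region, state = self.log.entries[self.applied]
            self.assignments[region] = state
            self.applied += 1

    def transition(self, region, state):
        # Write to the log first, then update memory from the log.
        self.log.append((region, state))
        self.catch_up()


log = ReplicatedLog()
active = Master(log)
active.transition("r1", "OPENING")
active.transition("r1", "OPEN")
active.transition("r2", "OPENING")

# The active master "dies"; a standby takes over and replays the same log.
standby = Master(log)
print(standby.assignments)
```

Here the in-memory map is just a view of the log, so there is one truth 
holder: reads are served from memory, and a standby recovers the full 
transition history by replay.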

> Putting 'Meta' table into ZooKeeper
> -----------------------------------
>
>                 Key: HBASE-1755
>                 URL: https://issues.apache.org/jira/browse/HBASE-1755
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.90.0
>            Reporter: Erik Holstad
>
> Moving to 0.22.0



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
