[ https://issues.apache.org/jira/browse/HBASE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397290#comment-13397290 ]
Laxman commented on HBASE-4246:
-------------------------------

This may also occur in the latest version, since we didn't change the znode hierarchy of the unassigned regions.

As mentioned in the linked issue, there is a cap on packet length, so we can't read or write huge amounts of data in a single packet. IMO, to resolve this we need to do *either of the following*.

* In HBase: Use a hierarchical znode structure. The HDFS datanode follows a similar strategy: it keeps block files in different subdirectories to avoid FS lookup latency.
* In ZooKeeper: Increase the limit. What is reasonable? We tried this in another project, but it has side effects. When we read or wrote huge data through ZooKeeper, clients occasionally got disconnected. This comes from sequential request processing.

Please check out the related discussion at
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201007.mbox/%3cc85a33ec.3a46a%25maha...@yahoo-inc.com%3E

The following JIRA and discussion also apply to the current scenario:
http://mail-archives.apache.org/mod_mbox/zookeeper-user/201104.mbox/%3cffa3bdb6-1c83-42b9-b2a0-767513462...@me.com%3E
https://issues.apache.org/jira/browse/ZOOKEEPER-1049

> Cluster with too many regions cannot withstand some master failover scenarios
> -----------------------------------------------------------------------------
>
>                 Key: HBASE-4246
>                 URL: https://issues.apache.org/jira/browse/HBASE-4246
>             Project: HBase
>          Issue Type: Bug
>          Components: master, zookeeper
>    Affects Versions: 0.90.4
>            Reporter: Todd Lipcon
>            Priority: Critical
>             Fix For: 0.96.0
>
>
> We ran into the following sequence of events:
> - master startup failed after only ROOT had been assigned (for another reason)
> - restarted the master without restarting other servers. Since there was at least one region assigned, it went through the failover code path
> - master scanned META and inserted every region into /hbase/unassigned in ZK.
> - then, it called "listChildren" on the /hbase/unassigned znode, and crashed with "Packet len6080218 is out of range!" since the IPC response was larger than the default maximum.
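
To make the hierarchical option from the comment above more concrete, here is a minimal sketch of how region znodes could be spread over a fixed set of bucket znodes under /hbase/unassigned. It is not taken from any HBase patch. The class name, the helper methods, the two-hex-digit bucket naming, and the bucket count of 256 are all assumptions made up for illustration, and the region names are synthetic stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Sketch only: spreads region znodes over a fixed number of bucket znodes. */
class BucketedUnassignedPath {
    // Parent znode named in the issue description.
    static final String UNASSIGNED = "/hbase/unassigned";

    /** Hypothetical helper: maps an encoded region name to a bucket in [0, numBuckets). */
    static int bucketFor(String encodedRegionName, int numBuckets) {
        // floorMod keeps the result non-negative even when hashCode() is negative.
        return Math.floorMod(encodedRegionName.hashCode(), numBuckets);
    }

    /** Hypothetical layout: /hbase/unassigned/<hex bucket>/<region>. */
    static String pathFor(String encodedRegionName, int numBuckets) {
        return String.format("%s/%02x/%s",
                UNASSIGNED, bucketFor(encodedRegionName, numBuckets), encodedRegionName);
    }

    /** Groups regions by bucket, so each child listing covers one bucket only. */
    static Map<Integer, List<String>> groupByBucket(List<String> regions, int numBuckets) {
        Map<Integer, List<String>> buckets = new TreeMap<>();
        for (String region : regions) {
            buckets.computeIfAbsent(bucketFor(region, numBuckets), k -> new ArrayList<>())
                   .add(region);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // Made-up stand-ins for encoded region names.
        List<String> regions = new ArrayList<>();
        for (int i = 0; i < 100000; i++) {
            regions.add(String.format("%032x", i));
        }
        Map<Integer, List<String>> buckets = groupByBucket(regions, 256);
        int largest = 0;
        for (List<String> bucket : buckets.values()) {
            largest = Math.max(largest, bucket.size());
        }
        System.out.println("buckets used: " + buckets.size()
                + ", largest bucket: " + largest);
        System.out.println("example path: " + pathFor(regions.get(0), 256));
    }
}
```

With a layout like this, the master would list the children of each bucket znode separately instead of listing /hbase/unassigned in one call, so each response carries only a fraction of the region names. The trade-off is more round trips to ZooKeeper and one watch per bucket rather than a single watch on the parent.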