[
https://issues.apache.org/jira/browse/HBASE-4246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089870#comment-13089870
]
Todd Lipcon commented on HBASE-4246:
------------------------------------
We were able to work around this issue by bumping jute.maxbuffer up to 100MB on
the cluster in question.
Another solution would be to shard the /hbase/unassigned dir by a prefix of the
region ID. For example, region 1234567890abcdef would go in
/hbase/unassigned/1234/1234567890abcdef. We would then have to do a traversal to
get the full list, but any particular RPC response would be limited in size.
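
To make the sharding idea concrete, here is a minimal sketch of the path layout
and the two-level traversal. It is not HBase code: sharded_path, list_unassigned
and the in-memory FakeZk class are hypothetical names made up for illustration,
and the 4-character prefix length is only an assumption taken from the example
path above.

```python
from collections import defaultdict

UNASSIGNED_ROOT = "/hbase/unassigned"
PREFIX_LEN = 4  # assumption: 4-character prefix, as in the example path


def sharded_path(region_id: str, prefix_len: int = PREFIX_LEN) -> str:
    """Return the sharded znode path for a region ID (hypothetical helper)."""
    if len(region_id) < prefix_len:
        raise ValueError("region ID is shorter than the shard prefix")
    return f"{UNASSIGNED_ROOT}/{region_id[:prefix_len]}/{region_id}"


class FakeZk:
    """In-memory stand-in for ZooKeeper; it only tracks parent -> children."""

    def __init__(self):
        self.children = defaultdict(set)

    def create(self, path: str) -> None:
        # Record each ancestor -> child link, as if parents were created on demand.
        parts = path.strip("/").split("/")
        for i in range(1, len(parts)):
            parent = "/" + "/".join(parts[:i])
            self.children[parent].add(parts[i])

    def get_children(self, path: str) -> list:
        return sorted(self.children.get(path, ()))


def list_unassigned(zk) -> list:
    """Collect all region IDs with one child listing per shard.

    Each get_children call returns only the shard names or the regions of a
    single shard, so no single response has to carry the full region list.
    """
    regions = []
    for shard in zk.get_children(UNASSIGNED_ROOT):
        regions.extend(zk.get_children(f"{UNASSIGNED_ROOT}/{shard}"))
    return regions
```

The cost is one extra listing per shard instead of a single call. How evenly the
regions spread across shards depends on how the region IDs are distributed,
which this sketch does not address.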
> Cluster with too many regions cannot withstand some master failover scenarios
> -----------------------------------------------------------------------------
>
> Key: HBASE-4246
> URL: https://issues.apache.org/jira/browse/HBASE-4246
> Project: HBase
> Issue Type: Bug
> Components: master, zookeeper
> Affects Versions: 0.90.4
> Reporter: Todd Lipcon
> Priority: Critical
> Fix For: 0.94.0
>
>
> We ran into the following sequence of events:
> - master startup failed after only ROOT had been assigned (for another reason)
> - restarted the master without restarting other servers. Since there was at
> least one region assigned, it went through the failover code path
> - master scanned META and inserted every region into /hbase/unassigned in ZK.
> - then, it called "listChildren" on the /hbase/unassigned znode, and crashed
> with "Packet len6080218 is out of range!" since the IPC response was larger
> than the default maximum.