[
https://issues.apache.org/jira/browse/HBASE-1583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723883#action_12723883
]
stack commented on HBASE-1583:
------------------------------
Safe mode is still there. That's just the period during which all machines report
in and during which we hand out catalog regions. After safe mode elapses,
mayhem breaks out as the master tries to hand out 6k regions ten or so at a
time while balancing at the same time.
Region assignment needs to be part of a larger-scale rewrite of master function.
How does a master determine that a region is unassigned? It reads the .META. table
to figure out the current state. We need to be careful how we bridge the scan of
.META. and the read of zk.
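As a rough illustration of the bridging problem, here is a hypothetical Java sketch. It is not HBase code: the class, the method, and the idea of a "regions being opened" set read from zk are all assumptions made for illustration. It shows why a region that looks unassigned in a .META. snapshot should still be checked against zk, since an open may have started after the scan.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: none of these names are HBase APIs.
public class RegionReconcile {

    /**
     * Returns the regions the master should treat as unassigned.
     *
     * @param metaServerByRegion snapshot of .META.: region name to hosting
     *                           server, or null if .META. lists no server
     * @param zkInTransition     regions a (hypothetical) zk read reports as
     *                           currently being opened
     */
    public static Set<String> unassigned(Map<String, String> metaServerByRegion,
                                         Set<String> zkInTransition) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : metaServerByRegion.entrySet()) {
            // No server in .META. looks unassigned, unless zk says the region
            // is already being opened (the open may postdate the .META. scan).
            if (e.getValue() == null && !zkInTransition.contains(e.getKey())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> meta = new HashMap<>();
        meta.put("t1,a", "rs1:60020");
        meta.put("t1,m", null);
        meta.put("t2,a", null);
        Set<String> zk = Set.of("t2,a");
        System.out.println(unassigned(meta, zk)); // prints [t1,m]
    }
}
```

Taking the two reads in a different order, or without some notion of versioning between them, is exactly where the care mentioned above would be needed.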
> Start/Stop of large cluster untenable
> -------------------------------------
>
> Key: HBASE-1583
> URL: https://issues.apache.org/jira/browse/HBASE-1583
> Project: Hadoop HBase
> Issue Type: Bug
> Reporter: stack
> Fix For: 0.20.0
>
>
> Starting and stopping a loaded large cluster is way too flaky and takes too
> long. This is 0.19.x but the same issues apply to TRUNK I'd say.
> At pset with our > 100 nodes carrying 6k regions:
> + shutdown takes way too long... maybe ten minutes or so. We compact
> regions inline with shutdown. We should just go down. It doesn't seem like
> all regionservers go down every time either.
> + startup is a mess with our assigning out regions and rebalancing at the same
> time. By the time the compactions on open run, it can be near an hour
> before the whole thing settles down and becomes usable.