[ https://issues.apache.org/jira/browse/HBASE-3019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
stack updated HBASE-3019:
-------------------------
Attachment: bulk-v7.txt
So, I'm giving up for now on this tactic of trying to assign in bulk. It's
slower than what's currently in place, mostly because we bulk set state in zk
first, before we proceed to send the bulk region open to the regionserver. The
bulk setting of state in zk takes time and in parts needs to be done under a
synchronization block so regionsInTransition can be updated atomically. In
effect we proceed serially through servers. Also, there's a problem
transitioning states. I've put a note in the patch. Before moving region
state to PENDING_OPEN, we need to wait on the zk callback that confirms setting
state to OFFLINE. Without this, PENDING_OPEN can be set before OFFLINE
has finished and we'll get ourselves into an unwanted state. To go further
with this patch, we would need to change our zking to be async.
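
To make the ordering requirement concrete, here is a rough sketch of what
"wait on the callback that confirms OFFLINE before setting PENDING_OPEN"
could look like if the state writes were async. It is not the patch and does
not use the HBase or ZooKeeper APIs: FakeAsyncStore, setState and assignAll
are hypothetical names, and the zk write is simulated with a sleep on a
thread pool. The OFFLINE writes for different regions overlap, but each
region's PENDING_OPEN write is only issued once its own OFFLINE write has
completed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch; none of these names are HBase or ZooKeeper APIs. */
public class OrderedStateTransition {
    enum State { OFFLINE, PENDING_OPEN }

    /** Stand-in for an async zk client: records writes in completion order. */
    static class FakeAsyncStore {
        final List<String> log = new ArrayList<>(); // guarded by synchronized (this)
        private final ExecutorService pool = Executors.newFixedThreadPool(4);

        /** Simulates an async write; the future completes once the write has landed. */
        CompletableFuture<Void> setState(String region, State state, long delayMs) {
            return CompletableFuture.runAsync(() -> {
                try {
                    Thread.sleep(delayMs); // pretend network round trip
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                synchronized (this) {
                    log.add(region + ":" + state);
                }
            }, pool);
        }

        void shutdown() {
            pool.shutdown();
        }
    }

    /** Issues OFFLINE for every region at once, chaining PENDING_OPEN per region. */
    static List<String> assignAll(List<String> regions) {
        FakeAsyncStore store = new FakeAsyncStore();
        List<CompletableFuture<Void>> chains = new ArrayList<>();
        for (String region : regions) {
            // The slow OFFLINE write must be confirmed before the fast
            // PENDING_OPEN write is even submitted for the same region.
            chains.add(store.setState(region, State.OFFLINE, 20)
                .thenCompose(ignored -> store.setState(region, State.PENDING_OPEN, 1)));
        }
        CompletableFuture.allOf(chains.toArray(new CompletableFuture[0])).join();
        store.shutdown();
        synchronized (store) {
            return new ArrayList<>(store.log);
        }
    }

    public static void main(String[] args) {
        System.out.println(assignAll(Arrays.asList("r1", "r2", "r3")));
    }
}
```

If the two writes for a region were fired without the chaining, the faster
PENDING_OPEN write could land first, which is the unwanted state described
above.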
Though I'm giving up on this bulk assign, I will reuse most of this patch in a
new issue, HBASE-3055, as it improves general bulk assign.
> Make bulk assignment on cluster startup run faster
> --------------------------------------------------
>
> Key: HBASE-3019
> URL: https://issues.apache.org/jira/browse/HBASE-3019
> Project: HBase
> Issue Type: Improvement
> Reporter: stack
> Attachments: bulk-v4.txt, bulk-v7.txt
>
>
> Currently, as of HBASE-3018, we come up with a bulk assignment plan that is
> sorted by server. We then spawn a thread per server to assign out the
> regions, so we are assigning in parallel. This works but is still slow
> (it looks to be slower than the old assignment, where we'd do lumps of N
> regions at a time). We should be able to pass a regionserver all the regions
> to open in one RPC. We need to figure out how to keep zk state up to date
> while the regionserver is processing a big lot of regions. This looks a
> little awkward to do since the open handler currently just opens the region;
> there is no notion of doing a ping while waiting to run.
> Being able to start the cluster fast is important for those times we take it
> down to do a major upgrade; the longer it takes to spin up, the longer our
> 'downtime'.