[ https://issues.apache.org/jira/browse/HBASE-8803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Marc Spaggiari updated HBASE-8803:
---------------------------------------

    Status: Patch Available  (was: Open)

So here is what I did.

First, regions being unloaded from a server are no longer moved one at a time to randomly chosen servers. They are now assigned in round-robin fashion, one region per RS. So if there are 20 RSs in the cluster and one is being unloaded, the regions are moved 19 at a time!

Then, to restore the regions, it no longer moves them back one by one. It now moves them 10 at a time.

As a result, the rolling-restart now takes 16 minutes on my cluster instead of 74 minutes. The bigger the cluster, the bigger the speedup should be, since each unload round moves one region to every remaining RS.

This version is for review only. Open to comments. I have tested it on 0.94, but I don't have a cluster running trunk, so I haven't been able to test it there.

> region_mover.rb should move multiple regions at a time
> ------------------------------------------------------
>
>                 Key: HBASE-8803
>                 URL: https://issues.apache.org/jira/browse/HBASE-8803
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.95.1, 0.94.8, 0.98.0
>            Reporter: Jean-Marc Spaggiari
>            Assignee: Jean-Marc Spaggiari
>         Attachments: HBASE-8803-v0-trunk.patch
>
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> When there are many regions in a cluster, rolling_restart can take hours
> because region_mover is moving the regions one by one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
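For readers who want the two batching strategies from the comment in concrete form, here is a minimal Ruby sketch (Ruby because region_mover.rb is a Ruby script). It is not the attached patch. The function names `plan_unload`, `unload_rounds`, `reload_batches` and `run_in_parallel` are hypothetical, regions and servers are plain strings, and the actual move is a caller-supplied block rather than an HBase call. It also assumes each round is issued in parallel and waited on before the next one starts, which the comment does not state.

```ruby
# Hypothetical sketch of the batching ideas in HBASE-8803. It only plans
# and dispatches moves; the real region move is supplied by the caller
# as a block and is NOT an HBase API call.

# Round-robin unload plan: region i goes to target server i % n, so
# consecutive regions land on different servers.
def plan_unload(regions, target_servers)
  raise ArgumentError, 'need at least one target server' if target_servers.empty?

  regions.each_with_index.map do |region, i|
    [region, target_servers[i % target_servers.size]]
  end
end

# Split the plan into rounds with at most one region per target server
# (19 moves per round when 19 servers remain).
def unload_rounds(regions, target_servers)
  plan_unload(regions, target_servers).each_slice(target_servers.size).to_a
end

# Reload plan: move regions back to the restarted server in fixed-size
# batches (10 in the comment) instead of one at a time.
def reload_batches(regions, batch_size = 10)
  regions.each_slice(batch_size).to_a
end

# Run one round or batch concurrently and return only after every move
# has finished (Thread#join re-raises any exception from a move).
def run_in_parallel(moves, &move_one)
  threads = moves.map { |move| Thread.new { move_one.call(*Array(move)) } }
  threads.each(&:join)
end

if __FILE__ == $PROGRAM_NAME
  servers = %w[rs1 rs2 rs3]
  regions = %w[r1 r2 r3 r4 r5 r6 r7]
  unload_rounds(regions, servers).each_with_index do |round, n|
    puts "round #{n + 1}: " + round.map { |r, s| "#{r}->#{s}" }.join(', ')
  end
  # round 1: r1->rs1, r2->rs2, r3->rs3
  # round 2: r4->rs1, r5->rs2, r6->rs3
  # round 3: r7->rs1
end
```

Under these assumptions, with 19 remaining servers, 40 regions unload in 3 rounds (19, 19 and 2 moves) instead of 40 sequential moves. Waiting for each round to finish is a simplifying choice; a real mover would also need to confirm that each region actually came online at its destination.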