[ https://issues.apache.org/jira/browse/HBASE-17963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Elser resolved HBASE-17963.
--------------------------------
Resolution: Incomplete
[[email protected]], I think this is a bit too vague to have any
actionable development efforts attached to it. Discussions about how to fix a
problem are best had on the mailing lists.
You might be interested in trying to tweak the value of
{{hbase.master.balancer.stochastic.localityCost}} to a value like 400 or 500.
This will instruct the balancer to make locality a more dominant factor in
balancing your cluster. This would help a completely crashed cluster to get
back to the "most data locality" state.
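
As a rough illustration of the suggestion above, the property could be set in
the master's hbase-site.xml along these lines. This is only a sketch: the value
500 is simply one of the figures mentioned above rather than a tuned
recommendation, and setting it in hbase-site.xml on the master is an assumption
about how the cluster is configured.

```xml
<!-- hbase-site.xml on the HBase master (assumed location) -->
<property>
  <name>hbase.master.balancer.stochastic.localityCost</name>
  <!-- 500 is one of the example values from the comment above; tune for your cluster -->
  <value>500</value>
</property>
```

This weight is read by the stochastic load balancer, so it should only have an
effect when that balancer is the one in use and balancing is enabled.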
> RegionServers lose file locality on unplanned restart
> -----------------------------------------------------
>
> Key: HBASE-17963
> URL: https://issues.apache.org/jira/browse/HBASE-17963
> Project: HBase
> Issue Type: Bug
> Components: hbase
> Affects Versions: 1.1.2
> Environment: Evident with HDP 2.4.3 running HBase 1.1.2
> Reporter: Bjorn Olsen
>
> When an HBase cluster crashes, HFile locality is lost.
> Crashes can happen for a variety of reasons, and when they do, recovering
> quickly (both data and database performance) is critical.
> On cluster restore, region servers do not load their previous set of regions,
> which means all HFiles must be moved around until locality is achieved again.
> Performance is poor while file locality is not close to 100%.
> A major compaction must be run to rewrite the HFiles locally, which further
> impacts performance and takes longer the more data was in HBase at the
> time of the crash.
> There is a graceful_stop script which is useful for planned outages - you can
> first unload the regions from the region server, restart it, and then reload
> the regions to the same server. No HFiles need to be moved and file locality
> is quickly restored.
> However, with an unplanned outage, no record is kept of where the regions
> were assigned. After a crash, HBase assigns regions to region servers randomly
> and HFile locality is very low. We then need to move all the HFiles around
> until file locality is restored.
> This is fine for a small number of regions and small HFiles but becomes
> problematic when you have a large number of region servers or large files.
> This JIRA is a request to improve this behavior for unplanned outages by
> trying to restore the regions assigned per server, after a cluster restart.
> For example, HBase could keep a list of the region locality at regular
> intervals, and use this as an initial guideline when regions are restarted.
> Locality might still not be 100% immediately - but presumably better than 0%.
> It would be necessary to first disable the load balancer (if enabled) while
> this restore is happening and enable it afterward.