[ https://issues.apache.org/jira/browse/HBASE-14129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15142089#comment-15142089 ]
Clara Xiong commented on HBASE-14129:
-------------------------------------
The proposed fix didn't work, so I went with a different solution: HBASE-15251.
> If any regionserver gets shutdown uncleanly during full cluster restart,
> locality looks to be lost
> --------------------------------------------------------------------------------------------------
>
> Key: HBASE-14129
> URL: https://issues.apache.org/jira/browse/HBASE-14129
> Project: HBase
> Issue Type: Bug
> Reporter: churro morales
> Fix For: 2.0.0, 1.3.0
>
> Attachments: HBASE-14129.patch
>
>
> We were doing a cluster restart the other day. Some regionservers did not
> shut down cleanly. Upon restart, our locality went from 99% to 5%. Looking
> at the code, AssignmentManager.joinCluster() calls
> AssignmentManager.processDeadServersAndRegionsInTransition().
> If the failover flag gets set for any reason, it seems we don't call
> assignAllUserRegions(). The balancer then appears to do the work of
> assigning those regions, and since we don't use a locality-aware balancer,
> we lost our region locality.
> I don't have a solid grasp of the reasoning behind these checks, but there
> may be some workarounds here.
> 1. After shutting down your cluster, move your WALs aside (replay later).
> 2. Clean up your ZooKeeper znodes.
> That seems to work, but it requires a lot of manual labor. Another solution,
> which I prefer, would be a flag: ./start-hbase.sh --clean
> If the master is started with that flag, then
> AssignmentManager.processDeadServersAndRegionsInTransition() checks for it
> and, if it is set, calls assignAllUserRegions() regardless of the failover
> state. I have a patch for the latter solution, assuming I am understanding
> the logic correctly.
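The startup decision described in the issue, together with the proposed --clean override, can be modeled roughly as below. This is a hypothetical sketch, not HBase code: the class `AssignmentDecisionSketch`, the `Outcome` enum, and the `decide(failover, cleanStart)` method are illustrative names only, and the real processDeadServersAndRegionsInTransition() looks at far more state than one boolean. It also assumes that assignAllUserRegions() puts regions back on the servers that previously held them (retain-assignment behavior), which is why that path preserves locality.

```java
/**
 * Hypothetical, simplified model of the master-startup decision described
 * in HBASE-14129. This is NOT HBase's AssignmentManager; names are
 * illustrative only.
 */
public class AssignmentDecisionSketch {

    /** Where user-region placement ends up in this model. */
    enum Outcome {
        RETAIN_ASSIGNMENT, // assignAllUserRegions() runs (assumed to retain prior servers)
        LEFT_TO_BALANCER   // assignAllUserRegions() is skipped; balancer places regions
    }

    /**
     * Models the branch in processDeadServersAndRegionsInTransition().
     *
     * @param failover   true if the master decided this is a failover
     *                   (e.g. because some regionservers died uncleanly)
     * @param cleanStart models the proposed (hypothetical) --clean flag,
     *                   which forces the clean-startup path
     */
    static Outcome decide(boolean failover, boolean cleanStart) {
        // Clean startup, or an explicit override: assign all user regions.
        if (!failover || cleanStart) {
            return Outcome.RETAIN_ASSIGNMENT;
        }
        // Failover without override: placement falls to the balancer,
        // which is where locality was lost in the reported incident.
        return Outcome.LEFT_TO_BALANCER;
    }

    public static void main(String[] args) {
        boolean[][] cases = {{false, false}, {true, false}, {true, true}};
        for (boolean[] c : cases) {
            System.out.println("failover=" + c[0] + " clean=" + c[1]
                    + " -> " + decide(c[0], c[1]));
        }
    }
}
```

In this model only the failover-without-override case defers to the balancer; the flag simply short-circuits the failover check, which is the behavior the patch proposal describes.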
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)