[
https://issues.apache.org/jira/browse/HBASE-21102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
ramkrishna.s.vasudevan updated HBASE-21102:
-------------------------------------------
Status: Patch Available (was: Open)
Although patch _1 fixed the real problem, the test case still failed at
times. After some debugging I found that, even though we do retainAssignment
at the balancer level, this is a ServerCrash case, so the regions on the
crashed server are randomly assigned to the remaining servers.
So
{code}
do {
  int i = RANDOM.nextInt(numServers);
  sn = servers.get(i);
} while (cluster.wouldLowerAvailability(regionInfo, sn)
    && iterations++ < maxIterations);
{code}
in test cases, since we have only 4 RS and one is killed, among the remaining 3
we can keep getting only the 2 RS where replicas are already present until
maxIterations is reached. This makes the test case flaky. So in the _2 patch I
added a little intelligence: if we reach maxIterations, then before assigning
we check whether we have really tried the complete list of servers. If not, we
go through all the servers and then decide which server to assign to. In a big
cluster this is unlikely to happen, but it is better to have the check, and it
also ensures that the test case is not flaky.
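To illustrate the idea, here is a minimal standalone sketch of random probing followed by a sweep over every server. It is not the code from the _2 patch: the class ReplicaAwareServerPicker, the method pick, and the predicate parameter are hypothetical names used only for this example, and the predicate stands in for cluster.wouldLowerAvailability(regionInfo, sn).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical helper for illustration only; not part of HBase.
class ReplicaAwareServerPicker {

  /**
   * Picks a target server. Random probes are tried first, bounded by
   * maxIterations as in the loop above. If the last probe would still
   * colocate replicas, every server is checked once before giving up.
   */
  static <S> S pick(List<S> servers, Predicate<S> wouldLowerAvailability,
      int maxIterations, Random random) {
    if (servers.isEmpty()) {
      throw new IllegalArgumentException("no servers to choose from");
    }
    S candidate;
    int iterations = 0;
    do {
      candidate = servers.get(random.nextInt(servers.size()));
    } while (wouldLowerAvailability.test(candidate)
        && iterations++ < maxIterations);

    if (!wouldLowerAvailability.test(candidate)) {
      return candidate;
    }

    // Random probing kept landing on servers that already host a replica.
    // Sweep the full list (in shuffled order) so no server is missed.
    List<S> shuffled = new ArrayList<>(servers);
    Collections.shuffle(shuffled, random);
    for (S server : shuffled) {
      if (!wouldLowerAvailability.test(server)) {
        return server;
      }
    }
    // Every server already hosts a replica; keep the last random pick.
    return candidate;
  }
}
```

With 3 remaining servers of which 2 already host replicas, this sketch returns the third server even when the random probes never hit it, which is the situation that made the test flaky.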
> ServerCrashProcedure should select target server where no other replicas
> exist for the current region
> -----------------------------------------------------------------------------------------------------
>
> Key: HBASE-21102
> URL: https://issues.apache.org/jira/browse/HBASE-21102
> Project: HBase
> Issue Type: Bug
> Components: Region Assignment
> Affects Versions: 3.0.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Priority: Major
> Attachments: HBASE-21102_1.patch, HBASE-21102_2.patch,
> HBASE-21102_initial.patch
>
>
> Currently, when a server hosting a region replica crashes and a target server
> is chosen for the replica region assignment, there is no guarantee that the
> selected server holds no other replica of the region being assigned. As it
> stands, we assign randomly, and later the LB identifies these cases and
> issues a MOVE for such regions. It would be better to identify target servers
> in a way that at least minimally ensures replicas are not colocated.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)