[ 
https://issues.apache.org/jira/browse/HBASE-21102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16596182#comment-16596182
 ] 

ramkrishna.s.vasudevan commented on HBASE-21102:
------------------------------------------------

Found the reason for this, and the fix seems to be simple.
When processAssignQueue() runs for the regions on the crashed server:
{code}
    for (RegionStateNode regionStateNode: regions.values()) {
      boolean sysTable = regionStateNode.isSystemTable();
      final List<RegionInfo> hris = sysTable? systemHRIs: userHRIs;
      if (regionStateNode.getRegionLocation() != null) {
        retainMap.put(regionStateNode.getRegionInfo(), regionStateNode.getRegionLocation());
      } else {
        hris.add(regionStateNode.getRegionInfo());
      }
    }
{code}
We end up populating the retainMap because all the regions have a location
(that is, the crashed server).
Since we now have a non-empty retainMap, we go with the retainAssignment()
flow in the LB.
{code}
    if (retainMap != null && !retainMap.isEmpty()) {
      if (isTraceEnabled) {
        LOG.trace("retain assign regions=" + retainMap);
      }
      try {
        acceptPlan(regions, balancer.retainAssignment(retainMap, servers));
      } catch (HBaseIOException e) {
        LOG.warn("unable to retain assignment", e);
        addToPendingAssignment(regions, retainMap.keySet());
      }
    }
{code}
The 'servers' passed here is the list of servers that are online at that point in time.
Inside retainAssignment(), in the general case where we don't have the same host
running multiple servers, we end up here:
{code}
    // If servers from prior assignment aren't present, then lets do randomAssignment on regions.
    if (randomAssignRegions.size() > 0) {
      Cluster cluster = createCluster(servers, regions.keySet());
      for (Map.Entry<ServerName, List<RegionInfo>> entry : assignments.entrySet()) {
        ServerName sn = entry.getKey();
        for (RegionInfo region : entry.getValue()) {
          cluster.doAssignRegion(region, sn);
        }
      }
      for (RegionInfo region : randomAssignRegions) {
        ServerName target = randomAssignment(cluster, region, servers);
        assignments.get(target).add(region);
        cluster.doAssignRegion(region, target);
        numRandomAssignments++;
      }
    }
{code}
Note the createCluster() call, where we pass the current set of regions to be
assigned.
Inside createCluster():
{code}
  protected Cluster createCluster(List<ServerName> servers, Collection<RegionInfo> regions) {
    // Get the snapshot of the current assignments for the regions in question, and then create
    // a cluster out of it. Note that we might have replicas already assigned to some servers
    // earlier. So we want to get the snapshot to see those assignments, but this will only contain
    // replicas of the regions that are passed (for performance).
    Map<ServerName, List<RegionInfo>> clusterState = getRegionAssignmentsByServer(regions);

    for (ServerName server : servers) {
      if (!clusterState.containsKey(server)) {
        clusterState.put(server, EMPTY_REGION_LIST);
      }
    }
    return new Cluster(regions, clusterState, null, this.regionFinder,
        rackManager);
  }
{code}
So the Cluster that we create has the regions to be assigned, and the cluster
state is built from that same set of regions, with an empty list added for each
live server. Where there are no region replicas this is probably enough, since
we have the candidate servers and the list of regions. With replicas, however,
this cluster state gives no indication of where the other replicas of a region
are located. So we end up randomly assigning these regions (including replicas)
to the available live servers without regard to replica colocation.
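
To illustrate the blind spot described above, here is a minimal toy sketch. It is
not HBase code: servers and replicas are plain strings, and the names
PartialClusterView, primaryOf, serversHostingSibling, partialView and
sampleFullState are all hypothetical. It only shows that a view restricted to
the regions being assigned cannot reveal where sibling replicas live.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model, not HBase code. A replica is a string "<region>_<replicaId>".
public class PartialClusterView {

  // Hypothetical helper: "r1_2" -> "r1".
  public static String primaryOf(String replica) {
    return replica.substring(0, replica.lastIndexOf('_'));
  }

  // Servers in the view that host a different replica of the same region.
  public static Set<String> serversHostingSibling(
      String replica, Map<String, List<String>> view) {
    Set<String> hosts = new TreeSet<>();
    for (Map.Entry<String, List<String>> e : view.entrySet()) {
      for (String hosted : e.getValue()) {
        if (!hosted.equals(replica) && primaryOf(hosted).equals(primaryOf(replica))) {
          hosts.add(e.getKey());
        }
      }
    }
    return hosts;
  }

  // Mimics the behaviour described above: keep only the regions being assigned,
  // then add an empty list for every live server that is missing from the view.
  public static Map<String, List<String>> partialView(
      Map<String, List<String>> full, List<String> toAssign, List<String> liveServers) {
    Map<String, List<String>> view = new HashMap<>();
    for (Map.Entry<String, List<String>> e : full.entrySet()) {
      List<String> kept = new ArrayList<>(e.getValue());
      kept.retainAll(toAssign);
      if (!kept.isEmpty()) {
        view.put(e.getKey(), kept);
      }
    }
    for (String server : liveServers) {
      view.putIfAbsent(server, new ArrayList<>());
    }
    return view;
  }

  // rs3 is the crashed server; rs1 and rs2 are live and host the other replicas of r1.
  public static Map<String, List<String>> sampleFullState() {
    Map<String, List<String>> full = new HashMap<>();
    full.put("rs1", Arrays.asList("r1_0"));
    full.put("rs2", Arrays.asList("r1_1"));
    full.put("rs3", Arrays.asList("r1_2"));
    return full;
  }

  public static void main(String[] args) {
    Map<String, List<String>> full = sampleFullState();
    Map<String, List<String>> partial =
        partialView(full, Arrays.asList("r1_2"), Arrays.asList("rs1", "rs2"));
    System.out.println("partial view sees siblings on: "
        + serversHostingSibling("r1_2", partial));
    System.out.println("full view sees siblings on:    "
        + serversHostingSibling("r1_2", full));
  }
}
```

In this toy setup the partial view reports no sibling hosts for r1_2, so any
live server looks equally good, while the full view shows that both rs1 and rs2
already hold a replica of r1.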

The fix here seems to be simple: in createCluster(), build the clusterState Map
with all the existing regions and their servers, and then pass the current set
of regions as 'unassignedRegions' to the Cluster constructor:
{code}
protected Cluster(
        Collection<RegionInfo> unassignedRegions,
        Map<ServerName, List<RegionInfo>> clusterState,
        Map<String, Deque<BalancerRegionLoad>> loads,
        RegionLocationFinder regionFinder,
        RackManager rackManager) {
{code}
That solves the problem because when there is a region replica we get a
holistic picture of the cluster, and internally we then check whether we can
assign based on the location of the region's other replicas.
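
As a rough sketch of why the full cluster state helps, the toy example below
picks a target for an unassigned replica while avoiding servers that already
host a sibling. This is an assumption-laden illustration, not the balancer's
actual logic: ReplicaAwareTarget, primaryOf and pickTarget are hypothetical
names, replicas are strings of the form "<region>_<replicaId>", and the choice
is made deterministic (first suitable server in sorted order) where the real
code picks randomly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not HBase code. A replica is a string "<region>_<replicaId>".
public class ReplicaAwareTarget {

  // Hypothetical helper: "r1_2" -> "r1".
  public static String primaryOf(String replica) {
    return replica.substring(0, replica.lastIndexOf('_'));
  }

  // Returns the first live server (sorted by name) that hosts no replica of the
  // same region. If every live server hosts a sibling, falls back to the first one.
  public static String pickTarget(
      String unassigned, Map<String, List<String>> clusterState, List<String> liveServers) {
    if (liveServers.isEmpty()) {
      throw new IllegalArgumentException("no live servers");
    }
    List<String> sorted = new ArrayList<>(liveServers);
    Collections.sort(sorted);
    for (String server : sorted) {
      boolean colocated = false;
      List<String> hostedRegions =
          clusterState.getOrDefault(server, Collections.<String>emptyList());
      for (String hosted : hostedRegions) {
        if (primaryOf(hosted).equals(primaryOf(unassigned))) {
          colocated = true;
          break;
        }
      }
      if (!colocated) {
        return server;
      }
    }
    return sorted.get(0);
  }

  public static void main(String[] args) {
    // Full state of the live servers; r1_2 was on a crashed server and is unassigned.
    Map<String, List<String>> clusterState = new HashMap<>();
    clusterState.put("rs1", Arrays.asList("r1_0"));
    clusterState.put("rs2", Arrays.asList("r2_0"));
    clusterState.put("rs4", Arrays.asList("r1_1"));
    List<String> live = Arrays.asList("rs1", "rs2", "rs4");
    System.out.println("target for r1_2: " + pickTarget("r1_2", clusterState, live));
  }
}
```

With the full state, rs1 and rs4 are skipped because they already hold r1_0 and
r1_1. With only empty lists for the live servers, the same function would simply
return the first server.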


> ServerCrashProcedure should select target server where no other replicas 
> exist for the current region
> -----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21102
>                 URL: https://issues.apache.org/jira/browse/HBASE-21102
>             Project: HBase
>          Issue Type: Bug
>          Components: Region Assignment
>    Affects Versions: 3.0.0
>            Reporter: ramkrishna.s.vasudevan
>            Assignee: ramkrishna.s.vasudevan
>            Priority: Major
>
> Currently when a server with region replica crashes, when the target server 
> is created for the replica region assignment there is no guarantee that a 
> server is selected where there is no other replica for the current region 
> getting assigned. It so happens that currently we do an assignment randomly 
> and later the LB comes and identifies these cases and again does MOVE for 
> such regions. It will be better if we can identify target servers at least 
> minimally ensuring that replicas are not colocated.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
