[
https://issues.apache.org/jira/browse/HBASE-18215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055201#comment-16055201
]
chenxu commented on HBASE-18215:
--------------------------------
bq. Can you provide a specific scenario?
there is a scenario
{code:title=RSGroupBasedLoadBalancer.java|borderStyle=solid}
public List<RegionPlan> balanceCluster(Map<ServerName, List<HRegionInfo>>
clusterState)
throws HBaseIOException {
...
List<HRegionInfo> misplacedRegions =
correctedState.get(LoadBalancer.BOGUS_SERVER_NAME);
for (HRegionInfo regionInfo : misplacedRegions) {
regionPlans.add(new RegionPlan(regionInfo, null, null));
}
...
{code}
if region misplaced, RegionPlan’s dest is null, and the following will happen
{code:title=AssignmentManager.java|borderStyle=solid}
private RegionPlan getRegionPlan(final HRegionInfo region,
final boolean forceNewPlan) throws HBaseIOException {
...
if (forceNewPlan
|| existingPlan == null
|| existingPlan.getDestination() == null
|| !destServers.contains(existingPlan.getDestination())) {
newPlan = true;
try {
randomPlan = new RegionPlan(region, null,
balancer.randomAssignment(region, destServers));
} catch (IOException ex) {
LOG.warn("Failed to create new plan.",ex);
return null;
}
this.regionPlans.put(encodedName, randomPlan);
}
}
if (newPlan) {
if (randomPlan.getDestination() == null) {
LOG.warn("Can't find a destination for " + encodedName);
return null;
}
...
return randomPlan;
}
...
}
{code}
if balancer.randomAssignment return null, getRegionPlan will null, AM will
handle like this
{code:title=AssignmentManager.java|borderStyle=solid}
private void assign(RegionState state, boolean forceNewPlan) {
...
if (plan == null) {
LOG.warn("Unable to determine a plan to assign " + region);
// For meta region, we have to keep retrying until succeeding
if (region.isMetaRegion()) {
....
}
regionStates.updateRegionState(region, State.FAILED_OPEN);
return;
}
....
}
{code}
the target region will be transition to FAILED_OPEN status.
but if balancer.randomAssignment return BOGUS_SERVER_NAME, AM can’t handle this.
an OPEN RPC will be sent to the BOGUS_SERVER_NAME, this should not happen
> some advises about refactoring of rsgroup
> -----------------------------------------
>
> Key: HBASE-18215
> URL: https://issues.apache.org/jira/browse/HBASE-18215
> Project: HBase
> Issue Type: Improvement
> Components: Balancer
> Reporter: chenxu
> Attachments: HBASE-18215-1.2.4-v1.patch
>
>
> recently we have Integrated rsgroup into our cluster, after Integrated,
> found some refactoring points. maybe the points were not right, but i think
> there is a need to share with you guys.
> # when hbase.balancer.tablesOnMaster configured, RSGroupBasedLoadBalancer
> should consider masterServer assignment first in balanceCluster,
> roundRobinAssignment, retainAssignment and randomAssignment
> do the same thing as BaseLoadBalancer
> # why not use a local file as the persistence layer instead of rsgroup table.
> in our implementation, we first modify the local rsgroup file, then load the
> group info into memory, after that execute the balancer command, everything
> is OK.
> when loading do some sanity check:
> (1) one server can not be owned by multi group
> (2) one table can not be owned by multi group
> (3) if group has table, it must also has servers
> (4) default group must has servers in it
> if sanity check can’t pass, give up the following process.work as this, it
> can greatly reduce the complexity of rsgroup implementation, there is no need
> to wait for the rsgroup table to be online, and methods like moveServers,
> moveTables, addRSGroup, removeRSGroup, moveServersAndTables can be removed
> from RSGroupAdminService.only a refresh method is need(modify persistence
> layer first and refresh the memory)
> # we should add some group informations on master web UI
> to do this, RSGroupBasedLoadBalancer should move to hbase-server module,
> because MasterStatusTmpl.jamon depends on it
> # there may be some issues about RSGroupBasedLoadBalancer.roundRobinAssignment
> if two groups both include BOGUS_SERVER_NAME, assignments.putAll will
> overwrite the previous data
> # there may be some issues about RSGroupBasedLoadBalancer.randomAssignment
> when the return value is BOGUS_SERVER_NAME, AM can not handle this case. we
> should return null value instead of BOGUS_SERVER_NAME.
> # when RSGroupBasedLoadBalancer.balanceCluster execute, groups are balanced
> one by one, if there are two many groups, we can do this in parallel.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)