[ https://issues.apache.org/jira/browse/HBASE-15058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-15058:
---------------------------
    Description: 
When a region split doesn't pass the quota check, we see an exception similar
to the following:
{code}
2015-12-29 16:07:33,653 INFO  [RS:0;10.21.128.189:57449-splits-1451434041585] regionserver.SplitRequest(97): Running rollback/cleanup of failed split of np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.; Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
java.io.IOException: Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:345)
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:262)
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:502)
  at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
  at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:155)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
{code}
However, a region split that passes the quota check may still fail in a
subsequent SplitTransactionPhase within stepsBeforePONR().
Currently in branch-1, AssignmentManager#onRegionTransition() does not clearly
distinguish among the following states:
{code}
    case SPLIT_PONR:
    case SPLIT:
    case SPLIT_REVERTED:
      errorMsg =
          onRegionSplit(serverName, code, hri,
            HRegionInfo.convert(transition.getRegionInfo(1)),
            HRegionInfo.convert(transition.getRegionInfo(2)));
      if (org.apache.commons.lang.StringUtils.isEmpty(errorMsg)) {
        try {
          regionStateListener.onRegionSplitReverted(hri);
{code}
onRegionSplit() handles all three of the above TransitionCodes. However,
errorMsg is normally null (onRegionSplit() returns null at the end), and
StringUtils.isEmpty(null) returns true.
As a result, onRegionSplitReverted() is called not only for SPLIT_REVERTED but
also for SPLIT_PONR and SPLIT.

When a region split fails, AssignmentManager#onRegionTransition() should
account for the failure properly so that quota bookkeeping is consistent.
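
To make the control flow above concrete, here is a minimal, self-contained
sketch. It is not HBase code and not the eventual patch: the class
SplitTransitionSketch, the RecordingListener type, the String region names and
the handleGuarded() variant are all hypothetical names introduced for
illustration. The only behavior taken from the description is that
onRegionSplit() normally returns null and that an empty errorMsg triggers
onRegionSplitReverted(). The guard on SPLIT_REVERTED is just one possible way
to separate the three states, assumed here for comparison.

```java
import java.util.ArrayList;
import java.util.List;

class SplitTransitionSketch {
  // Only the three transition codes discussed in this issue.
  enum TransitionCode { SPLIT_PONR, SPLIT, SPLIT_REVERTED }

  // Hypothetical listener that records every quota rollback it is asked to do.
  static final class RecordingListener {
    final List<String> reverted = new ArrayList<>();
    void onRegionSplitReverted(String region) { reverted.add(region); }
  }

  // Same null-or-empty semantics as commons-lang StringUtils.isEmpty().
  static boolean isEmpty(String s) { return s == null || s.isEmpty(); }

  // Stand-in for onRegionSplit(): returns null when no error is reported.
  static String onRegionSplit(TransitionCode code, String region) { return null; }

  // Mirrors the branch-1 snippet: the rollback depends on errorMsg only.
  static void handleCurrent(TransitionCode code, String region, RecordingListener l) {
    String errorMsg = onRegionSplit(code, region);
    if (isEmpty(errorMsg)) {
      l.onRegionSplitReverted(region);
    }
  }

  // Assumed alternative: roll back quota only when the split was reverted.
  static void handleGuarded(TransitionCode code, String region, RecordingListener l) {
    String errorMsg = onRegionSplit(code, region);
    if (code == TransitionCode.SPLIT_REVERTED && isEmpty(errorMsg)) {
      l.onRegionSplitReverted(region);
    }
  }

  // Feeds each transition code through one handler and counts the rollbacks.
  static int countReverts(boolean guarded) {
    RecordingListener l = new RecordingListener();
    for (TransitionCode code : TransitionCode.values()) {
      String region = "region-" + code;
      if (guarded) {
        handleGuarded(code, region, l);
      } else {
        handleCurrent(code, region, l);
      }
    }
    return l.reverted.size();
  }

  public static void main(String[] args) {
    System.out.println("current: " + countReverts(false)); // prints "current: 3"
    System.out.println("guarded: " + countReverts(true));  // prints "guarded: 1"
  }
}
```

With the unguarded handler, all three codes reach onRegionSplitReverted(), which
is the inconsistency in quota bookkeeping described above. With the guard, only
SPLIT_REVERTED does.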

  was:
When a region split doesn't pass the quota check, we see an exception similar
to the following:
{code}
2015-12-29 16:07:33,653 INFO  [RS:0;10.21.128.189:57449-splits-1451434041585] regionserver.SplitRequest(97): Running rollback/cleanup of failed split of np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.; Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
java.io.IOException: Failed to get ok from master to split np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:345)
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:262)
  at org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:502)
  at org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
  at org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:155)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
{code}
However, a region split that passes the quota check may still fail in a
subsequent SplitTransactionPhase within stepsBeforePONR().
Currently there is no mechanism to roll back the update to the namespace quota.

When a region split fails, NamespaceAuditor should account for the failure so
that quota bookkeeping is consistent.


> AssignmentManager should account for unsuccessful split correctly which 
> initially passes quota check
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-15058
>                 URL: https://issues.apache.org/jira/browse/HBASE-15058
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ted Yu
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
