[
https://issues.apache.org/jira/browse/HBASE-15058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ted Yu updated HBASE-15058:
---------------------------
Description:
When region split doesn't pass quota check, we would see exception similar to
the following:
{code}
2015-12-29 16:07:33,653 INFO [RS:0;10.21.128.189:57449-splits-1451434041585]
regionserver.SplitRequest(97): Running rollback/cleanup of failed split of np2:
testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.;
Failed to get ok from master to split
np2:testRegionNormalizationSplitOnCluster,
zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
java.io.IOException: Failed to get ok from master to split
np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
at
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:345)
at
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:262)
at
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:502)
at
org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
at
org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:155)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
{code}
However, region split may fail for subsequent SplitTransactionPhase's in
stepsBeforePONR().
Currently in branch-1, the distinction among the following TransitionCode's is
not clear in AssignmentManager#onRegionTransition():
{code}
case SPLIT_PONR:
case SPLIT:
case SPLIT_REVERTED:
errorMsg =
onRegionSplit(serverName, code, hri,
HRegionInfo.convert(transition.getRegionInfo(1)),
HRegionInfo.convert(transition.getRegionInfo(2)));
if (org.apache.commons.lang.StringUtils.isEmpty(errorMsg)) {
try {
regionStateListener.onRegionSplitReverted(hri);
{code}
onRegionSplit() handles the above 3 TransitionCode's. However, errorMsg is
normally null (onRegionSplit returns null at the end).
This would result in onRegionSplitReverted() being called for cases of
SPLIT_PONR and SPLIT.
When region split fails, AssignmentManager#onRegionTransition() should account
for the failure properly so that quota bookkeeping is consistent.
was:
When region split doesn't pass quota check, we would see exception similar to
the following:
{code}
2015-12-29 16:07:33,653 INFO [RS:0;10.21.128.189:57449-splits-1451434041585]
regionserver.SplitRequest(97): Running rollback/cleanup of failed split of np2:
testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.;
Failed to get ok from master to split
np2:testRegionNormalizationSplitOnCluster,
zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
java.io.IOException: Failed to get ok from master to split
np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
at
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:345)
at
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:262)
at
org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:502)
at
org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
at
org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:155)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
{code}
However, region split may fail for subsequent SplitTransactionPhase's in
stepsBeforePONR().
Currently in branch-1, the distinction among the following states is not clear
in AssignmentManager#onRegionTransition():
{code}
case SPLIT_PONR:
case SPLIT:
case SPLIT_REVERTED:
errorMsg =
onRegionSplit(serverName, code, hri,
HRegionInfo.convert(transition.getRegionInfo(1)),
HRegionInfo.convert(transition.getRegionInfo(2)));
if (org.apache.commons.lang.StringUtils.isEmpty(errorMsg)) {
try {
regionStateListener.onRegionSplitReverted(hri);
{code}
onRegionSplit() handles the above 3 TransitionCode's. However, errorMsg is
normally null (onRegionSplit returns null at the end).
This would result in onRegionSplitReverted() being called for cases of
SPLIT_PONR and SPLIT.
When region split fails, AssignmentManager#onRegionTransition() should account
for the failure properly so that quota bookkeeping is consistent.
> AssignmentManager should account for unsuccessful split correctly which
> initially passes quota check
> ----------------------------------------------------------------------------------------------------
>
> Key: HBASE-15058
> URL: https://issues.apache.org/jira/browse/HBASE-15058
> Project: HBase
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Ted Yu
>
> When region split doesn't pass quota check, we would see exception similar to
> the following:
> {code}
> 2015-12-29 16:07:33,653 INFO [RS:0;10.21.128.189:57449-splits-1451434041585]
> regionserver.SplitRequest(97): Running rollback/cleanup of failed split of
> np2:
> testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.;
> Failed to get ok from master to split
> np2:testRegionNormalizationSplitOnCluster,
> zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
> java.io.IOException: Failed to get ok from master to split
> np2:testRegionNormalizationSplitOnCluster,zzzzz,1451434045065.27cccb3fae03002b8058beef61cb7c20.
> at
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.stepsBeforePONR(SplitTransactionImpl.java:345)
> at
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.createDaughters(SplitTransactionImpl.java:262)
> at
> org.apache.hadoop.hbase.regionserver.SplitTransactionImpl.execute(SplitTransactionImpl.java:502)
> at
> org.apache.hadoop.hbase.regionserver.SplitRequest.doSplitting(SplitRequest.java:82)
> at
> org.apache.hadoop.hbase.regionserver.SplitRequest.run(SplitRequest.java:155)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> {code}
> However, region split may fail for subsequent SplitTransactionPhase's in
> stepsBeforePONR().
> Currently in branch-1, the distinction among the following TransitionCode's
> is not clear in AssignmentManager#onRegionTransition():
> {code}
> case SPLIT_PONR:
> case SPLIT:
> case SPLIT_REVERTED:
> errorMsg =
> onRegionSplit(serverName, code, hri,
> HRegionInfo.convert(transition.getRegionInfo(1)),
> HRegionInfo.convert(transition.getRegionInfo(2)));
> if (org.apache.commons.lang.StringUtils.isEmpty(errorMsg)) {
> try {
> regionStateListener.onRegionSplitReverted(hri);
> {code}
> onRegionSplit() handles the above 3 TransitionCode's. However, errorMsg is
> normally null (onRegionSplit returns null at the end).
> This would result in onRegionSplitReverted() being called for cases of
> SPLIT_PONR and SPLIT.
> When region split fails, AssignmentManager#onRegionTransition() should
> account for the failure properly so that quota bookkeeping is consistent.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)