Ted Yu created HBASE-11282:
------------------------------
Summary: Load balancer may move a region which is participating in snapshot
Key: HBASE-11282
URL: https://issues.apache.org/jira/browse/HBASE-11282
Project: HBase
Issue Type: Bug
Reporter: Ted Yu
The region was tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7.
From master log:
{code}
2014-03-10 23:48:09,035 DEBUG [AM.ZK.Worker-pool2-t42] master.AssignmentManager: Found an existing plan for tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. destination server is h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal,60020,1394494963812 accepted as a dest server = true
2014-03-10 23:48:09,035 DEBUG [AM.ZK.Worker-pool2-t42] master.AssignmentManager: Using pre-existing plan for tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7.; plan=hri=tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7., src=h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165, dest=h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal,60020,1394494963812
2014-03-10 23:48:09,035 INFO [AM.ZK.Worker-pool2-t42] master.RegionStates: Transitioned {289ebdee6adf0a3b9c2bbcbe2ff522e7 state=CLOSED, ts=1394495289035, server=h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165} to {289ebdee6adf0a3b9c2bbcbe2ff522e7 state=OFFLINE, ts=1394495289035, server=h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165}
2014-03-10 23:48:09,035 DEBUG [AM.ZK.Worker-pool2-t42] zookeeper.ZKAssign: master:60000-0x244aa9920190b04, quorum=h2-ubuntu12-sec-1394425849-hbase-8.cs1cloud.internal:2181,h2-ubuntu12-sec-1394425849-hbase-1.cs1cloud.internal:2181,h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal:2181, baseZNode=/hbase Creating (or updating) unassigned node 289ebdee6adf0a3b9c2bbcbe2ff522e7 with OFFLINE state
2014-03-10 23:48:09,044 INFO [AM.ZK.Worker-pool2-t42] master.AssignmentManager: Assigning tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. to h2-ubuntu12-sec-1394425849-hbase-4.cs1cloud.internal,60020,1394494963812
{code}
From hbase-hbase-regionserver-h2-ubuntu12-sec-1394425849-hbase-9.log:
{code}
2014-03-10 23:48:08,487 WARN [member: 'h2-ubuntu12-sec-1394425849-hbase-9.cs1cloud.internal,60020,1394494962165' subprocedure-pool1-thread-1] snapshot.RegionServerSnapshotManager: Got Exception in SnapshotSubprocedurePool
java.util.concurrent.ExecutionException: org.apache.hadoop.hbase.NotServingRegionException: tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. is closing
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
    at java.util.concurrent.FutureTask.get(FutureTask.java:83)
    at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:325)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
    at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
    at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:52)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.NotServingRegionException: tableone,,1394495094967.289ebdee6adf0a3b9c2bbcbe2ff522e7. is closing
    at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5699)
    at org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5663)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:65)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
{code}
The load balancer moved the region while it was participating in a snapshot,
which caused FlushSnapshotSubprocedure to fail with NotServingRegionException.
A mechanism that makes the load balancer aware of in-progress region operations
is desirable, so that a snapshot does not fail in the scenario above.
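
One possible shape for such a mechanism, as a rough sketch only: keep a registry of regions that are currently part of a snapshot and have the balancer drop (defer) any move plan that targets one of them. Everything below is hypothetical and written for illustration. SnapshotAwareBalancerSketch, MovePlan, SnapshotRegionRegistry and filterPlans are made-up names, not HBase classes or APIs, and the short server names in main are placeholders.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch; none of these types exist in HBase. */
final class SnapshotAwareBalancerSketch {

    /** Minimal stand-in for a balancer move plan: region, source, destination. */
    static final class MovePlan {
        final String encodedRegionName;
        final String src;
        final String dest;

        MovePlan(String encodedRegionName, String src, String dest) {
            this.encodedRegionName = encodedRegionName;
            this.src = src;
            this.dest = dest;
        }
    }

    /** Tracks regions that are currently taking part in a snapshot. */
    static final class SnapshotRegionRegistry {
        private final Set<String> busy = ConcurrentHashMap.newKeySet();

        void snapshotStarted(String encodedRegionName) {
            busy.add(encodedRegionName);
        }

        void snapshotFinished(String encodedRegionName) {
            busy.remove(encodedRegionName);
        }

        boolean isBusy(String encodedRegionName) {
            return busy.contains(encodedRegionName);
        }
    }

    /**
     * Returns only the plans whose region is not in a snapshot.
     * Skipped plans are simply left for a later balancer run.
     */
    static List<MovePlan> filterPlans(List<MovePlan> plans,
                                      SnapshotRegionRegistry registry) {
        List<MovePlan> allowed = new ArrayList<>();
        for (MovePlan plan : plans) {
            if (!registry.isBusy(plan.encodedRegionName)) {
                allowed.add(plan);
            }
        }
        return allowed;
    }

    public static void main(String[] args) {
        SnapshotRegionRegistry registry = new SnapshotRegionRegistry();
        // The region from this report is being snapshotted.
        registry.snapshotStarted("289ebdee6adf0a3b9c2bbcbe2ff522e7");

        List<MovePlan> plans = Arrays.asList(
            new MovePlan("289ebdee6adf0a3b9c2bbcbe2ff522e7", "hbase-9", "hbase-4"),
            new MovePlan("someOtherRegion", "hbase-9", "hbase-4"));

        for (MovePlan plan : filterPlans(plans, registry)) {
            System.out.println("would move " + plan.encodedRegionName
                + " from " + plan.src + " to " + plan.dest);
        }
    }
}
```

This check alone would not close the race: a snapshot could start right after the plan passes the filter. A real fix would presumably also need the check and the region close to be coordinated, for example under a shared lock or by having the snapshot wait for pending moves.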