[
https://issues.apache.org/jira/browse/HBASE-10136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Aleksandr Shulman updated HBASE-10136:
--------------------------------------
Description:
Expected behavior:
A user can issue a request for a snapshot of a table while that table is
undergoing an online schema change and expect that snapshot request to complete
correctly. Also, the same is true if a user issues a online schema change
request while a snapshot attempt is ongoing.
Observed behavior:
Snapshot attempts time out when there is an ongoing online schema change
because the region is closed and opened during the snapshot.
As a side-note, I would expect that the attempt should fail quickly as opposed
to timing out.
Further, what I have seen is that subsequent attempts to snapshot the table
fail because of some state/cleanup issues. This is also concerning.
Immediate error:
{code}type=FLUSH }' is still in progress!
2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11)
Sleeping: 10000ms while waiting for snapshot completion.
2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting
current status of snapshot from master...
2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3]
master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0
table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done
2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3]
snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0
table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in
progress!
Snapshot failure occurred
org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot
'snapshot0' wasn't completed in expectedTime:60000 ms
at
org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713)
at
org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638)
at
org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602)
at
org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code}
Likely root cause of error:
{code}Exception in SnapshotSubprocedurePool
java.util.concurrent.ExecutionException:
org.apache.hadoop.hbase.NotServingRegionException:
changeSchemaDuringSnapshot1386806258640,77777777,1386806258720.ea776db51749e39c956d771a7d17a0f3.
is closing
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314)
at
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
at
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
at
org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
at
org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.NotServingRegionException:
changeSchemaDuringSnapshot1386806258640,77777777,1386806258720.ea776db51749e39c956d771a7d17a0f3.
is closing
at
org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327)
at
org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289)
at
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
at
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
... 5 more{code}
was:
Expected behavior:
A user can take a snapshot of a table while that table is undergoing an online
schema change.
Observed behavior:
Snapshot attempts time out when there is an ongoing online schema change
because the region is closed and opened during the snapshot.
As a side-note, I would expect that the attempt should fail quickly as opposed
to timing out.
Further, what I have seen is that subsequent attempts to snapshot the table
fail because of some state/cleanup issues. This is also concerning.
Immediate error:
{code}type=FLUSH }' is still in progress!
2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11)
Sleeping: 10000ms while waiting for snapshot completion.
2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting
current status of snapshot from master...
2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3]
master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0
table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done
2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3]
snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0
table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in
progress!
Snapshot failure occurred
org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot
'snapshot0' wasn't completed in expectedTime:60000 ms
at
org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713)
at
org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638)
at
org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602)
at
org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code}
Likely root cause of error:
{code}Exception in SnapshotSubprocedurePool
java.util.concurrent.ExecutionException:
org.apache.hadoop.hbase.NotServingRegionException:
changeSchemaDuringSnapshot1386806258640,77777777,1386806258720.ea776db51749e39c956d771a7d17a0f3.
is closing
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at
org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314)
at
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
at
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
at
org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
at
org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.hbase.NotServingRegionException:
changeSchemaDuringSnapshot1386806258640,77777777,1386806258720.ea776db51749e39c956d771a7d17a0f3.
is closing
at
org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327)
at
org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289)
at
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
at
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
... 5 more{code}
> [Online Schema Change]: Online Schema Change on a table conflicts with
> concurrent snapshot attempt on the table
> ---------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-10136
> URL: https://issues.apache.org/jira/browse/HBASE-10136
> Project: HBase
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 0.96.0, 0.98.1, 0.99.0
> Reporter: Aleksandr Shulman
> Assignee: Matteo Bertozzi
> Labels: online_schema_change
>
> Expected behavior:
> A user can issue a request for a snapshot of a table while that table is
> undergoing an online schema change and expect that snapshot request to
> complete correctly. Also, the same is true if a user issues a online schema
> change request while a snapshot attempt is ongoing.
> Observed behavior:
> Snapshot attempts time out when there is an ongoing online schema change
> because the region is closed and opened during the snapshot.
> As a side-note, I would expect that the attempt should fail quickly as
> opposed to timing out.
> Further, what I have seen is that subsequent attempts to snapshot the table
> fail because of some state/cleanup issues. This is also concerning.
> Immediate error:
> {code}type=FLUSH }' is still in progress!
> 2013-12-11 15:58:32,883 DEBUG [Thread-385] client.HBaseAdmin(2696): (#11)
> Sleeping: 10000ms while waiting for snapshot completion.
> 2013-12-11 15:58:42,884 DEBUG [Thread-385] client.HBaseAdmin(2704): Getting
> current status of snapshot from master...
> 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3]
> master.HMaster(2891): Checking to see if snapshot from request:{ ss=snapshot0
> table=changeSchemaDuringSnapshot1386806258640 type=FLUSH } is done
> 2013-12-11 15:58:42,887 DEBUG [FifoRpcScheduler.handler1-thread-3]
> snapshot.SnapshotManager(374): Snapshoting '{ ss=snapshot0
> table=changeSchemaDuringSnapshot1386806258640 type=FLUSH }' is still in
> progress!
> Snapshot failure occurred
> org.apache.hadoop.hbase.snapshot.SnapshotCreationException: Snapshot
> 'snapshot0' wasn't completed in expectedTime:60000 ms
> at
> org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2713)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2638)
> at
> org.apache.hadoop.hbase.client.HBaseAdmin.snapshot(HBaseAdmin.java:2602)
> at
> org.apache.hadoop.hbase.client.TestAdmin$BackgroundSnapshotThread.run(TestAdmin.java:1974){code}
> Likely root cause of error:
> {code}Exception in SnapshotSubprocedurePool
> java.util.concurrent.ExecutionException:
> org.apache.hadoop.hbase.NotServingRegionException:
> changeSchemaDuringSnapshot1386806258640,77777777,1386806258720.ea776db51749e39c956d771a7d17a0f3.
> is closing
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at
> org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:314)
> at
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:118)
> at
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:137)
> at
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:181)
> at
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:1)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.hadoop.hbase.NotServingRegionException:
> changeSchemaDuringSnapshot1386806258640,77777777,1386806258720.ea776db51749e39c956d771a7d17a0f3.
> is closing
> at
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5327)
> at
> org.apache.hadoop.hbase.regionserver.HRegion.startRegionOperation(HRegion.java:5289)
> at
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:79)
> at
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:1)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> ... 5 more{code}
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)