Anirban Roy created HBASE-19681:
-----------------------------------

             Summary: Online snapshot creation failing with missing store file
                 Key: HBASE-19681
                 URL: https://issues.apache.org/jira/browse/HBASE-19681
             Project: HBase
          Issue Type: Bug
          Components: backup&restore, snapshots
    Affects Versions: 1.3.0
         Environment: Hadoop - 2.7.3
HBase 1.3.0
OS - GNU/Linux x86_64
Cluster - Amazon Elastic Mapreduce
            Reporter: Anirban Roy


We are having trouble creating an online snapshot of our HBase table. The table
contains 20 TB of data and receives ~10,000 writes per second. Snapshot
creation fails intermittently with an error saying that an HFile is missing;
see the detailed output below. Once we locate the region server hosting the
affected region and restart it, snapshot creation succeeds. It seems the
missing HFile was removed by a minor compaction, but the region server still
holds a reference to the file.
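
The suspected failure mode can be illustrated with a toy model in Python. This
is not HBase code, and it does not claim to reproduce HBase internals:
`ToyStore`, `flush`, `compact`, `reopen`, `snapshot_manifest` and the
`stale_reference` flag are all hypothetical. The `stale_reference=True` path
simply assumes the bug described above (compaction removes its input files,
but the in-memory file list keeps pointing at them), and `reopen` stands in
for the region server restart by rebuilding the list from the directory.

```python
import os
import tempfile


class ToyStore:
    """Hypothetical stand-in for one column family's store; not HBase code.

    `open_files` plays the role of the region server's in-memory list of
    store files, and `store_dir` is the directory holding those files.
    """

    def __init__(self, store_dir):
        self.store_dir = store_dir
        self.open_files = []

    def _path(self, name):
        return os.path.join(self.store_dir, name)

    def flush(self, name):
        # Write a new store file and start tracking it.
        with open(self._path(name), "w") as f:
            f.write("data")
        self.open_files.append(name)

    def compact(self, merged_name, stale_reference=False):
        # Merge every tracked file into one new file, then remove the inputs.
        inputs = list(self.open_files)
        with open(self._path(merged_name), "w") as f:
            f.write("merged")
        for name in inputs:
            os.remove(self._path(name))
        if stale_reference:
            # ASSUMED bug: the merged file is tracked, but the removed inputs
            # are never dropped from the in-memory list.
            self.open_files.append(merged_name)
        else:
            # Expected behaviour: replace the inputs with the merged file.
            self.open_files = [merged_name]

    def reopen(self):
        # Stand-in for a region server restart: rebuild the list from disk.
        self.open_files = sorted(os.listdir(self.store_dir))

    def snapshot_manifest(self):
        # Reference every tracked file, failing if one no longer exists.
        for name in self.open_files:
            if not os.path.exists(self._path(name)):
                raise FileNotFoundError("File does not exist: " + self._path(name))
        return list(self.open_files)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = ToyStore(d)
        store.flush("f1")
        store.flush("f2")
        store.compact("f3", stale_reference=True)
        try:
            store.snapshot_manifest()
        except FileNotFoundError as err:
            print("snapshot failed:", err)
        store.reopen()
        print(store.snapshot_manifest())  # prints ['f3']
```

With `stale_reference=False` the same sequence yields a manifest of `['f3']`
without any "restart", which is the behaviour one would expect from a correct
compaction.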

[hadoop@ip-10-0-12-164 ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 1.3.0, rUnknown, Fri Feb 17 18:15:07 UTC 2017
 
hbase(main):001:0> snapshot 'x_table', 'x_snapshot'
 
ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { ss=x_snapshot table=x_table type=FLUSH } had an error.  Procedure x_snapshot { waiting=[] done=[ip-10-0-9-31.ec2.internal,16020,1508372578254,
ip-10-0-0-32.ec2.internal,16020,1508372591059, 
ip-10-0-14-221.ec2.internal,16020,1508372580873, 
ip-10-0-15-185.ec2.internal,16020,1508372588507, 
ip-10-0-9-43.ec2.internal,16020,1508372569107, 
ip-10-0-10-62.ec2.internal,16020,1512885921693, 
ip-10-0-8-216.ec2.internal,16020,1508372584133, 
ip-10-0-1-207.ec2.internal,16020,1508372580144, 
ip-10-0-0-173.ec2.internal,16020,1508372584969, 
ip-10-0-4-79.ec2.internal,16020,1508372587161, 
ip-10-0-3-165.ec2.internal,16020,1508372593566, 
ip-10-0-14-137.ec2.internal,16020,1508372583225, 
ip-10-0-6-33.ec2.internal,16020,1508372581587, 
ip-10-0-15-199.ec2.internal,16020,1508372587478, 
ip-10-0-5-253.ec2.internal,16020,1508372581243, 
ip-10-0-1-99.ec2.internal,16020,1508372609684] }
        at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:354)
        at org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1058)
        at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:61089)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2328)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via ip-10-0-3-13.ec2.internal,16020,1508372563772:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
        at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
        at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:315)
        at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:344)
        ... 6 more
Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
        at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:347)
        at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:140)
        at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:160)
        at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:187)
        at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
 
Here is some help for this command:
Take a snapshot of specified table. Examples:
 
  hbase> snapshot 'sourceTable', 'snapshotName'
  hbase> snapshot 'namespace:sourceTable', 'snapshotName', {SKIP_FLUSH => true}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
