[ 
https://issues.apache.org/jira/browse/HBASE-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16412222#comment-16412222
 ] 

Saad Mufti commented on HBASE-19681:
------------------------------------

We are facing the exact same situation in HBase 1.4.0 on AWS EMR. Does anyone 
have a recovery process? We haven't tried a restart, but we moved the region 
using the "assign" command in the shell; the region moved, but the problem 
persists. We have also seen the exception in both the snapshot thread and the 
compaction thread.

> Online snapshot creation failing with missing store file
> --------------------------------------------------------
>
>                 Key: HBASE-19681
>                 URL: https://issues.apache.org/jira/browse/HBASE-19681
>             Project: HBase
>          Issue Type: Bug
>          Components: backup&restore, Performance, scaling, snapshots
>    Affects Versions: 1.3.0
>         Environment: Hadoop - 2.7.3
> HBase 1.3.0
> OS - GNU/Linux x86_64
> Cluster - Amazon Elastic Mapreduce
>            Reporter: Anirban Roy
>            Priority: Major
>         Attachments: region-server-missing file-log.doc, 
> region-server-snapshot-exception-log.doc
>
>
> We are having a problem creating an online snapshot of our HBase table. The 
> table contains 20TB of data and receives ~10000 writes per second. Snapshot 
> creation fails intermittently with an error that an hfile is missing; see the 
> detailed output below. Once we locate the region server hosting the region 
> and restart it, snapshot creation succeeds. It seems the missing hfile was 
> removed by a minor compaction, but the region server still holds a pointer to 
> the file.
> [hadoop@ip-10-0-12-164 ~]$ hbase shell
> HBase Shell; enter 'help<RETURN>' for list of supported commands.
> Type "exit<RETURN>" to leave the HBase Shell
> Version 1.3.0, rUnknown, Fri Feb 17 18:15:07 UTC 2017
>  
> hbase(main):001:0> snapshot 'x_table', 'x_snapshot'
>  
> ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot { 
> ss=x_snapshot table=x_table type=FLUSH } had an error.  Procedure x_snapshot 
> { waiting=[] done=[ip-10-0-9-31.ec2.internal,16020,1508372578254, 
> ip-10-0-0-32.ec2.internal,16020,1508372591059, 
> ip-10-0-14-221.ec2.internal,16020,1508372580873, 
> ip-10-0-15-185.ec2.internal,16020,1508372588507, 
> ip-10-0-9-43.ec2.internal,16020,1508372569107, 
> ip-10-0-10-62.ec2.internal,16020,1512885921693, 
> ip-10-0-8-216.ec2.internal,16020,1508372584133, 
> ip-10-0-1-207.ec2.internal,16020,1508372580144, 
> ip-10-0-0-173.ec2.internal,16020,1508372584969, 
> ip-10-0-4-79.ec2.internal,16020,1508372587161, 
> ip-10-0-3-165.ec2.internal,16020,1508372593566, 
> ip-10-0-14-137.ec2.internal,16020,1508372583225, 
> ip-10-0-6-33.ec2.internal,16020,1508372581587, 
> ip-10-0-15-199.ec2.internal,16020,1508372587478, 
> ip-10-0-5-253.ec2.internal,16020,1508372581243, 
> ip-10-0-1-99.ec2.internal,16020,1508372609684] }
>         at 
> org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:354)
>         at 
> org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1058)
>         at 
> org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:61089)
>         at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2328)
>         at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>         at 
> org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: 
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via 
> ip-10-0-3-13.ec2.internal,16020,1508372563772:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable:
>  java.io.FileNotFoundException: File does not exist: 
> hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
>         at 
> org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
>         at 
> org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:315)
>         at 
> org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:344)
>         ... 6 more
> Caused by: 
> org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: 
> java.io.FileNotFoundException: File does not exist: 
> hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
>         at 
> org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:347)
>         at 
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:140)
>         at 
> org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:160)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:187)
>         at 
> org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>         at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>         at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>         at java.lang.Thread.run(Thread.java:745)
>  
> Here is some help for this command:
> Take a snapshot of specified table. Examples:
>  
>   hbase> snapshot 'sourceTable', 'snapshotName'
>   hbase> snapshot 'namespace:sourceTable', 'snapshotName', {SKIP_FLUSH => 
> true}
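
To find the affected region and column family quickly, the missing store file path in the FileNotFoundException can be split into its components. The following is only a minimal sketch: the names parse_missing_store_file and StoreFileRef are hypothetical, and it assumes the path follows the .../data/<namespace>/<table>/<encoded region>/<family>/<hfile> layout seen in the trace above.

```python
import re
from typing import NamedTuple, Optional


class StoreFileRef(NamedTuple):
    """Components of a store file path (hypothetical helper type)."""
    namespace: str
    table: str
    region: str
    family: str
    hfile: str


# Assumed layout, taken from the path in the trace:
#   <root>/data/<namespace>/<table>/<encoded region>/<family>/<hfile>
# "[\s>]*" tolerates the line wrap and "> " quoting that mail clients add
# between the message text and the path.
_MISSING_FILE = re.compile(
    r"File does not exist:[\s>]*\S*?/data/"
    r"(?P<namespace>[^/\s]+)/(?P<table>[^/\s]+)/"
    r"(?P<region>[0-9a-f]{32})/(?P<family>[^/\s]+)/(?P<hfile>[^/\s]+)"
)


def parse_missing_store_file(message: str) -> Optional[StoreFileRef]:
    """Return the path components, or None if no such path is found."""
    match = _MISSING_FILE.search(message)
    if match is None:
        return None
    return StoreFileRef(**match.groupdict())


if __name__ == "__main__":
    line = (
        "java.io.FileNotFoundException: File does not exist: "
        "hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/"
        "x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/"
        "f76d8827c29244b99bf9344982956523"
    )
    print(parse_missing_store_file(line))
```

The encoded region name it returns is the one to look up when finding the hosting region server.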



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
