[
https://issues.apache.org/jira/browse/HBASE-19681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313972#comment-16313972
]
Anirban Roy commented on HBASE-19681:
-------------------------------------
I also see the following exception in the region server log during compaction:
2018-01-05 13:31:55,910 ERROR [regionserver/ip-10-0-1-237.ec2.internal/10.0.1.237:16020-longCompactions-1508372592608] regionserver.CompactSplitThread: Compaction selection failed Store = d, pri = 5
java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/396a31774fbb8b8ed1020850e6035973/d/4a46f33587ae43d2986cbf0e45379c83
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1309)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:431)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:342)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getFileStatus(StoreFileInfo.java:355)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getModificationTime(StoreFileInfo.java:360)
    at org.apache.hadoop.hbase.regionserver.StoreFile.getModificationTimeStamp(StoreFile.java:321)
    at org.apache.hadoop.hbase.regionserver.StoreUtils.getLowestTimestamp(StoreUtils.java:63)
    at org.apache.hadoop.hbase.regionserver.compactions.RatioBasedCompactionPolicy.shouldPerformMajorCompaction(RatioBasedCompactionPolicy.java:64)
    at org.apache.hadoop.hbase.regionserver.compactions.SortedCompactionPolicy.selectCompaction(SortedCompactionPolicy.java:82)
    at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$DefaultCompactionContext.select(DefaultStoreEngine.java:107)
    at org.apache.hadoop.hbase.regionserver.HStore.requestCompaction(HStore.java:1661)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread.selectCompaction(CompactSplitThread.java:369)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread.access$100(CompactSplitThread.java:59)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.doCompaction(CompactSplitThread.java:494)
    at org.apache.hadoop.hbase.regionserver.CompactSplitThread$CompactionRunner.run(CompactSplitThread.java:564)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
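
A rough way to tell whether a store file named in such a FileNotFoundException was moved aside by compaction rather than lost is to look for it under the HBase archive directory. The Python sketch below is not part of HBase; `parse_store_file` and `archive_path` are hypothetical helper names, and the sketch assumes the usual layout in which archived files live under `<hbase.rootdir>/archive/data/<namespace>/<table>/<region>/<family>/<file>`. It only parses the path out of a log line and builds the candidate archive location.

```python
import re
from typing import NamedTuple, Optional


class StoreFileRef(NamedTuple):
    """Components of a store file path as it appears in the log."""
    root: str       # HBase root dir, e.g. hdfs://host:8020/user/hbase
    namespace: str
    table: str
    region: str     # encoded region name (32 hex characters)
    family: str
    hfile: str


# Assumed layout: <root>/data/<namespace>/<table>/<region>/<family>/<hfile>
_PATH_RE = re.compile(
    r"(?P<root>hdfs://\S+?)/data/(?P<ns>[^/\s]+)/(?P<table>[^/\s]+)/"
    r"(?P<region>[0-9a-f]{32})/(?P<family>[^/\s]+)/(?P<hfile>[^/\s]+)"
)


def parse_store_file(text: str) -> Optional[StoreFileRef]:
    """Return the first store file path found in a log line, or None."""
    m = _PATH_RE.search(text)
    if m is None:
        return None
    return StoreFileRef(m["root"], m["ns"], m["table"],
                        m["region"], m["family"], m["hfile"])


def archive_path(ref: StoreFileRef) -> str:
    """Build the location the file would have under the archive directory
    (assumption: <root>/archive/data/... mirrors <root>/data/...)."""
    return (f"{ref.root}/archive/data/{ref.namespace}/{ref.table}/"
            f"{ref.region}/{ref.family}/{ref.hfile}")


if __name__ == "__main__":
    line = ("java.io.FileNotFoundException: File does not exist: "
            "hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/"
            "x_table/396a31774fbb8b8ed1020850e6035973/d/"
            "4a46f33587ae43d2986cbf0e45379c83")
    ref = parse_store_file(line)
    if ref is not None:
        print(archive_path(ref))
```

The printed path can then be checked with `hdfs dfs -ls`. If the file is present there, that would be consistent with a compaction having archived it while the region server kept a stale reference.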
> Online snapshot creation failing with missing store file
> --------------------------------------------------------
>
> Key: HBASE-19681
> URL: https://issues.apache.org/jira/browse/HBASE-19681
> Project: HBase
> Issue Type: Bug
> Components: backup&restore, Performance, scaling, snapshots
> Affects Versions: 1.3.0
> Environment: Hadoop - 2.7.3
> HBase 1.3.0
> OS - GNU/Linux x86_64
> Cluster - Amazon Elastic Mapreduce
> Reporter: Anirban Roy
> Attachments: region-server-missing file-log.doc,
> region-server-snapshot-exception-log.doc
>
>
> We are having trouble creating an online snapshot of our HBase table. The table
> contains 20 TB of data and receives ~10,000 writes per second. Snapshot
> creation fails intermittently with an error that an hfile is missing; see the
> detailed output below. Once we locate the region server hosting the region
> and restart that region server, snapshot creation succeeds. It seems the
> missing hfile was removed by a minor compaction, but the region server still
> holds a reference to the file.
> [hadoop@ip-10-0-12-164 ~]$ hbase shell
> HBase Shell; enter 'help<RETURN>' for list of supported commands.
> Type "exit<RETURN>" to leave the HBase Shell
> Version 1.3.0, rUnknown, Fri Feb 17 18:15:07 UTC 2017
>
> hbase(main):001:0> snapshot 'x_table', 'x_snapshot'
>
> ERROR: org.apache.hadoop.hbase.snapshot.HBaseSnapshotException: Snapshot {
> ss=x_snapshot table=x_table type=FLUSH } had an error. Procedure x_snapshot
> { waiting=[] done=[ip-10-0-9-31.ec2.internal,16020,1508372578254,
> ip-10-0-0-32.ec2.internal,16020,1508372591059,
> ip-10-0-14-221.ec2.internal,16020,1508372580873,
> ip-10-0-15-185.ec2.internal,16020,1508372588507,
> ip-10-0-9-43.ec2.internal,16020,1508372569107,
> ip-10-0-10-62.ec2.internal,16020,1512885921693,
> ip-10-0-8-216.ec2.internal,16020,1508372584133,
> ip-10-0-1-207.ec2.internal,16020,1508372580144,
> ip-10-0-0-173.ec2.internal,16020,1508372584969,
> ip-10-0-4-79.ec2.internal,16020,1508372587161,
> ip-10-0-3-165.ec2.internal,16020,1508372593566,
> ip-10-0-14-137.ec2.internal,16020,1508372583225,
> ip-10-0-6-33.ec2.internal,16020,1508372581587,
> ip-10-0-15-199.ec2.internal,16020,1508372587478,
> ip-10-0-5-253.ec2.internal,16020,1508372581243,
> ip-10-0-1-99.ec2.internal,16020,1508372609684] }
>     at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:354)
>     at org.apache.hadoop.hbase.master.MasterRpcServices.isSnapshotDone(MasterRpcServices.java:1058)
>     at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:61089)
>     at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2328)
>     at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
>     at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable via ip-10-0-3-13.ec2.internal,16020,1508372563772:org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
>     at org.apache.hadoop.hbase.errorhandling.ForeignExceptionDispatcher.rethrowException(ForeignExceptionDispatcher.java:83)
>     at org.apache.hadoop.hbase.master.snapshot.TakeSnapshotHandler.rethrowExceptionIfFailed(TakeSnapshotHandler.java:315)
>     at org.apache.hadoop.hbase.master.snapshot.SnapshotManager.isSnapshotDone(SnapshotManager.java:344)
>     ... 6 more
> Caused by: org.apache.hadoop.hbase.errorhandling.ForeignException$ProxyThrowable: java.io.FileNotFoundException: File does not exist: hdfs://ip-10-0-12-164.ec2.internal:8020/user/hbase/data/default/x_table/ecbb3aeaf7c5b1f65742deab5812362c/d/f76d8827c29244b99bf9344982956523
>     at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:347)
>     at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:140)
>     at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:160)
>     at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:187)
>     at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
>
> Here is some help for this command:
> Take a snapshot of specified table. Examples:
>
> hbase> snapshot 'sourceTable', 'snapshotName'
> hbase> snapshot 'namespace:sourceTable', 'snapshotName', {SKIP_FLUSH => true}