[
https://issues.apache.org/jira/browse/HBASE-27349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
wuchang updated HBASE-27349:
----------------------------
Description:
We have exactly the same issue as
https://issues.apache.org/jira/browse/HBASE-13651:
* A SCAN gets an FNFE after the RS hits a Full GC and the region is transitioned to and opened on another RS.
* While the region is in this state, taking a snapshot also reports an FNFE.
* The issue can be resolved by moving the problem region manually.
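The failure mode in the first bullet can be reproduced in miniature on a local filesystem (an illustrative stand-in for HDFS, not actual HBase code): a reader that remembers only the hfile path fails with a file-not-found error as soon as the file has been moved to the archive, which is exactly what the stream re-open during the scan runs into.
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Local-filesystem stand-in (illustrative only) for the race described above:
// a reader remembers an hfile path, the stale regionserver archives (moves)
// the file, and the next re-open of that path fails.
public class ArchivedHFileDemo {
    public static boolean reopenFailsAfterArchive() {
        try {
            Path dataDir = Files.createTempDirectory("data");
            Path archiveDir = Files.createTempDirectory("archive");
            Path hfile = dataDir.resolve("041e9aeb8cdb46f991459c92f8581e16");
            Files.write(hfile, new byte[] {1, 2, 3});

            Files.newInputStream(hfile).close();     // initial open succeeds

            // The stale regionserver archives the store file.
            Files.move(hfile, archiveDir.resolve(hfile.getFileName()));

            Files.newInputStream(hfile).close();     // re-open of the stale path
            return false;                            // unexpectedly still readable
        } catch (NoSuchFileException expected) {
            return true;                             // the "file does not exist" the scan reports
        } catch (IOException unexpected) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("re-open failed after archive: " + reopenFailsAfterArchive());
    }
}
{code}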
We found that HBASE-13651 was later reverted by
https://issues.apache.org/jira/browse/HBASE-18786, since it was believed to no
longer be a problem, per this comment in HBASE-18786:
!image-2022-08-31-11-39-58-549.png!
Basic timeline of the issue:
{code:java}
2022-08-27 05:26:35 Snapshot TestSnapshot is taken successfully
2022-08-27 15:21:51 The target hfile fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16 is generated by a compaction on regionserver-67
2022-08-27 17:26:36 041e9aeb8cdb46f991459c92f8581e16 is successfully compacted into fd53b8e6b4874eb38712ad2d04389fff
2022-08-27 17:34:53 A Full GC starts on regionserver-67
2022-08-27 17:35:50 Region fafb8f91bd20b1adfe15e2a64a39557e is re-opened on regionserver-11, as scheduled by HMaster
2022-08-27 17:35:56 regionserver-67 wakes up from the Full GC
2022-08-27 17:35:57 The files of region fafb8f91bd20b1adfe15e2a64a39557e are archived by lashadoop-regionserver-67; afterwards, regionserver-67 finds that it has been kicked out and exits
2022-08-27 18:00:00 The archived hfile is removed by HMaster's CleanerChore
2022-08-27 19:48:10 The user's job reports that the file is missing
2022-08-27 20:26:04 Re-taking snapshot TaggingSegmentationSnapshot fails because 041e9aeb8cdb46f991459c92f8581e16 is missing{code}
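Replaying the ordering above as a toy state machine (hypothetical code, not HBase internals) makes the window clear: the archive step from the stale regionserver lands after the region has already moved, so the CleanerChore later deletes a file the new owner still references.
{code:java}
import java.util.HashMap;
import java.util.Map;

// Hypothetical replay of the timeline above: a stale regionserver that slept
// through a Full GC archives a store file AFTER the region was already
// re-opened elsewhere, and the CleanerChore then deletes it for good.
public class TimelineReplay {
    enum State { DATA, ARCHIVED, DELETED }

    public static State replay() {
        Map<String, State> files = new HashMap<>();
        String hfile = "041e9aeb8cdb46f991459c92f8581e16";
        files.put(hfile, State.DATA);      // 15:21:51 compaction output on regionserver-67

        String regionOwner = "regionserver-67";
        regionOwner = "regionserver-11";   // 17:35:50 region re-opened, scheduled by HMaster

        // 17:35:57 regionserver-67, not yet aware it was kicked out, archives the file.
        files.put(hfile, State.ARCHIVED);

        // 18:00:00 CleanerChore removes archived files.
        files.replaceAll((f, s) -> s == State.ARCHIVED ? State.DELETED : s);

        // 19:48:10 a read via the current owner (regionserver-11) finds nothing.
        return files.get(hfile);
    }

    public static void main(String[] args) {
        System.out.println("file state at read time: " + replay());
    }
}
{code}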
The exception when scanning after the region is transitioned:
{code:java}
java.io.FileNotFoundException: File does not exist: /hbase/prod/hbase-prod/data/default/mdm/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
    at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
    at org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:735)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:415)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
    at org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:861)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:848)
    at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:837)
    at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1005)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:317)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:313)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:325)
    at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:163)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:898)
    at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:125)
    at org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:102)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:269)
    at org.apache.hadoop.hbase.regionserver.HStoreFile.createStreamReader(HStoreFile.java:491)
    at org.apache.hadoop.hbase.regionserver.HStoreFile.getStreamScanner(HStoreFile.java:516)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:149)
    at org.apache.hadoop.hbase.regionserver.HStore.getScanners(HStore.java:1309)
    at org.apache.hadoop.hbase.regionserver.HStore.recreateScanners(HStore.java:2042)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.trySwitchToStreamRead(StoreScanner.java:1064)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.shipped(StoreScanner.java:1198)
    at org.apache.hadoop.hbase.regionserver.KeyValueHeap.shipped(KeyValueHeap.java:437)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.shipped(HRegion.java:6959)
    at org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run(RSRpcServices.java:388)
    at org.apache.hadoop.hbase.ipc.ServerCall.setResponse(ServerCall.java:289)
    at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:161)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
    at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
{code}
The exception when taking a snapshot after the region is transitioned:
{code:java}
2022-08-27 20:26:03,794 ERROR org.apache.hadoop.hbase.procedure.Subprocedure: Subprocedure 'TaggingSegmentationSnapshot' aborting due to a ForeignException!
java.io.FileNotFoundException via regionserver-11.**,60020,1653373878295:java.io.FileNotFoundException: File does not exist: hdfs://test-hbase/hbase/prod/hbase-prod/data/default/mdm/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
    at org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:349)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:173)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:193)
    at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:189)
    at org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://beaconstore/hbase/prod/hbase-prod/data/ap/mdm_user_segments/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1500)
    at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1493)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1508)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
    at org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:368)
    at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:129)
    at org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:68)
    at org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:249)
    at org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:218)
    at org.apache.hadoop.hbase.regionserver.HRegion.addRegionToSnapshot(HRegion.java:4285)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:134)
    at org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:77)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    ... 4 more
{code}
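For reference, a defensive pattern along the lines of what HBASE-13651 appears to have attempted before the HBASE-18786 revert (hypothetical names below, not the real HBase API): when a store file read hits an FNFE, re-list the store files from the filesystem and retry once instead of failing the whole scan.
{code:java}
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Simplified sketch (hypothetical API) of the refresh-and-retry idea:
// on FileNotFoundException, refresh the store file listing and retry once.
public class RefreshOnFnfeSketch {
    interface Store {
        byte[] read(String file) throws IOException; // may throw FNFE for a stale name
        List<String> listCurrentFiles();             // authoritative re-listing
    }

    public static byte[] readWithOneRefresh(Store store, String file) throws IOException {
        try {
            return store.read(file);
        } catch (FileNotFoundException stale) {
            List<String> current = store.listCurrentFiles();
            if (current.isEmpty()) {
                throw stale;                         // nothing left to fall back to
            }
            return store.read(current.get(0));       // retry against a fresh file
        }
    }

    // Self-contained demo: "old" was compacted away, the refresh finds "new".
    public static int demo() {
        Store store = new Store() {
            public byte[] read(String f) throws IOException {
                if (f.equals("old")) throw new FileNotFoundException(f);
                return new byte[] {42};
            }
            public List<String> listCurrentFiles() { return Arrays.asList("new"); }
        };
        try {
            return readWithOneRefresh(store, "old")[0];
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("recovered value: " + demo());
    }
}
{code}
Note that this only papers over stale readers; it does not stop the stale regionserver from archiving files it no longer owns, which is the root cause in the timeline above.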
cc [~mbertozzi] [~apurtell]
was:
We have the exactly the same issue with
https://issues.apache.org/jira/browse/HBASE-13651:
* The SCAN will got FNFE after RS got Full GC and transmitted and opened in
another RS.
* During which ,taking snapshot will also report FNFE
* Issue could be resolved by move the problem region manually.
We find that the HBASE-13651 is reverted afterwards by
https://issues.apache.org/jira/browse/HBASE-18786 since they thought it is not
a problem anymore with the comment in HBASE-18786
!image-2022-08-31-11-39-58-549.png!
Basic Timeline of my issue:
{code:java}
2022-08-27 05:26:35 Snapshot TestingSnapshot is taken successfully
2022-08-27 15:21:51 The target hfile
fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16 is
generated by a compaction in regionserver-67
2022-08-27 17:26:36 041e9aeb8cdb46f991459c92f8581e16 is compacted to
fd53b8e6b4874eb38712ad2d04389fff
2022-08-27 17:35:56 A Full GC happened and the regionserver-67 is
forcefully shutdown
2022-08-27 17:35:50 Region fafb8f91bd20b1adfe15e2a64a39557e is re-opened in
regionserver-11
2022-08-27 17:35:57 File fafb8f91bd20b1adfe15e2a64a39557e is archived
2022-08-27 18:00:00 The hfile is removed by HMaster'S CleanerChore
2022-08-27 19:48:10 User's Spark job on HBase shows error that the file is
missing
2022-08-27 20:26:04 Re-taking snapshot TestingSnapshot also failed for
041e9aeb8cdb46f991459c92f8581e16 is missing{code}
The exception of Scanning after region is transmitted:
{code:java}
java.io.FileNotFoundException: File does not
exist:/hbase/prod/hbase-prod/data/default/mdm/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
at
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:85)
at
org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:75)
at
org.apache.hadoop.hdfs.server.namenode.FSDirStatAndListingOp.getBlockLocations(FSDirStatAndListingOp.java:152)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocations(FSNamesystem.java:1909)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getBlockLocations(NameNodeRpcServer.java:735)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getBlockLocations(ClientNamenodeProtocolServerSideTranslatorPB.java:415)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
at
org.apache.hadoop.hdfs.DFSClient.callGetBlockLocations(DFSClient.java:861)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:848)
at org.apache.hadoop.hdfs.DFSClient.getLocatedBlocks(DFSClient.java:837)
at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1005)
at
org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:317)
at
org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:313)
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at
org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:325)
at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:163)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:898)
at
org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:125)
at
org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.<init>(FSDataInputStreamWrapper.java:102)
at
org.apache.hadoop.hbase.regionserver.StoreFileInfo.open(StoreFileInfo.java:269)
at
org.apache.hadoop.hbase.regionserver.HStoreFile.createStreamReader(HStoreFile.java:491)
at
org.apache.hadoop.hbase.regionserver.HStoreFile.getStreamScanner(HStoreFile.java:516)
at
org.apache.hadoop.hbase.regionserver.StoreFileScanner.getScannersForStoreFiles(StoreFileScanner.java:149)
at
org.apache.hadoop.hbase.regionserver.HStore.getScanners(HStore.java:1309)
at
org.apache.hadoop.hbase.regionserver.HStore.recreateScanners(HStore.java:2042)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.trySwitchToStreamRead(StoreScanner.java:1064)
at
org.apache.hadoop.hbase.regionserver.StoreScanner.shipped(StoreScanner.java:1198)
at
org.apache.hadoop.hbase.regionserver.KeyValueHeap.shipped(KeyValueHeap.java:437)
at
org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.shipped(HRegion.java:6959)
at
org.apache.hadoop.hbase.regionserver.RSRpcServices$RegionScannerShippedCallBack.run(RSRpcServices.java:388)
at
org.apache.hadoop.hbase.ipc.ServerCall.setResponse(ServerCall.java:289)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:161)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324)
at
org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
{code}
The exception of taking snapshot after region is transmitted:
{code:java}
2022-08-27 20:26:03,794 ERROR org.apache.hadoop.hbase.procedure.Subprocedure:
Subprocedure 'TaggingSegmentationSnapshot' aborting due to a ForeignException!
java.io.FileNotFoundException via
regionserver-11.**,60020,1653373878295:java.io.FileNotFoundException: File does
not exist:
hdfs://test-hbase/hbase/prod/hbase-prod/data/default/mdm/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
at
org.apache.hadoop.hbase.regionserver.snapshot.RegionServerSnapshotManager$SnapshotSubprocedurePool.waitForOutstandingTasks(RegionServerSnapshotManager.java:349)
at
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.flushSnapshot(FlushSnapshotSubprocedure.java:173)
at
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure.insideBarrier(FlushSnapshotSubprocedure.java:193)
at
org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:189)
at
org.apache.hadoop.hbase.procedure.Subprocedure.call(Subprocedure.java:53)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: File does not exist:
hdfs://beaconstore/hbase/prod/hbase-prod/data/ap/mdm_user_segments/fafb8f91bd20b1adfe15e2a64a39557e/i/041e9aeb8cdb46f991459c92f8581e16
at
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1500)
at
org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1493)
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1508)
at
org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at
org.apache.hadoop.hbase.regionserver.StoreFileInfo.getReferencedFileStatus(StoreFileInfo.java:368)
at
org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:129)
at
org.apache.hadoop.hbase.snapshot.SnapshotManifestV2$ManifestBuilder.storeFile(SnapshotManifestV2.java:68)
at
org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:249)
at
org.apache.hadoop.hbase.snapshot.SnapshotManifest.addRegion(SnapshotManifest.java:218)
at
org.apache.hadoop.hbase.regionserver.HRegion.addRegionToSnapshot(HRegion.java:4285)
at
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:134)
at
org.apache.hadoop.hbase.regionserver.snapshot.FlushSnapshotSubprocedure$RegionSnapshotTask.call(FlushSnapshotSubprocedure.java:77)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
... 4 more
{code}
cc [~mbertozzi] [~apurtell]
> HBase FileNotFound Exception After Region Transitioned
> -------------------------------------------------------
>
> Key: HBASE-27349
> URL: https://issues.apache.org/jira/browse/HBASE-27349
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.1.0
> Reporter: wuchang
> Assignee: Duo Zhang
> Priority: Critical
> Attachments: image-2022-08-31-11-39-58-549.png
>
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)