[
https://issues.apache.org/jira/browse/HBASE-28951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17895283#comment-17895283
]
Umesh Kumar Kumawat edited comment on HBASE-28951 at 11/4/24 1:07 PM:
----------------------------------------------------------------------
>>In general the problem is the 'dead' RS is not really dead and can still do
>>something.
Yeah, I agree with the above statement, but in this particular case the RS was
not dead yet; it was still aborting itself. The RS didn't get any response from
HDFS and aborted itself. As the RS aborted itself, HMaster assigned this
procedure to a different worker.
One more thing to note: the aborting RS starts splitting a little late. I am
adding one such case below. rs-219 is the aborting RS and rs-216 is the second
worker. The aborting RS started splitting about 50 seconds after the second
worker, although the aborting RS was the one that got the RPC request first.
{code:java}
2024-10-01 23:02:15,499 INFO
[G_REPLAY_OPS-regionserver/regionserver-216:XXXXX-3] wal.WALSplitter -
Splitting
hdfs://hbase1a/hbase/WALs/regionserver-150.regionserver.hbase.<<cluster>>,XXXXX,1727347097348-splitting/regionserver-150.regionserver.hbase.<<cluster>>%2CXXXXX%2C1727347097348.1727823118024,
size=92 (92bytes){code}
{code:java}
2024-10-01 23:03:05,793 INFO
[G_REPLAY_OPS-regionserver/regionserver-219:XXXXX-1] wal.WALSplitter -
Splitting
hdfs://hbase1a/hbase/WALs/regionserver-150.regionserver.hbase.<<cluster>>,XXXXX,1727347097348-splitting/regionserver-150.regionserver.hbase.<<cluster>>%2CXXXXX%2C1727347097348.1727823118024,
size=93.4 M (97950842bytes){code}
{code:java}
2024-10-01 23:03:15,405 INFO
[G_REPLAY_OPS-regionserver/regionserver-216:XXXXX-3] wal.WALSplitter -
Splitting
hdfs://hbase1a/hbase/WALs/regionserver-150.regionserver.hbase.<<cluster>>,XXXXX,1727347097348-splitting/regionserver-150.regionserver.hbase.<<cluster>>%2CXXXXX%2C1727347097348.1727823118024,
size=93.4 M (97950842bytes){code}
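The gap between the first rs-216 line and the rs-219 line above can be checked with a few lines of Java. This is only a minimal sketch: the class name SplitStartGap and the method gap are hypothetical helpers, and the timestamp pattern is assumed from the log lines shown here.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class SplitStartGap {
    // Pattern assumed from the log lines above, e.g. "2024-10-01 23:02:15,499".
    private static final DateTimeFormatter LOG_TS =
        DateTimeFormatter.ofPattern("uuuu-MM-dd HH:mm:ss,SSS");

    /** Returns the time elapsed from the first log timestamp to the second. */
    public static Duration gap(String first, String second) {
        return Duration.between(LocalDateTime.parse(first, LOG_TS),
                                LocalDateTime.parse(second, LOG_TS));
    }

    public static void main(String[] args) {
        // rs-216 (second worker) start vs. rs-219 (aborting RS) start.
        Duration d = gap("2024-10-01 23:02:15,499", "2024-10-01 23:03:05,793");
        System.out.println(d.toMillis() + " ms"); // prints "50294 ms"
    }
}
```

So the aborting RS logged its "Splitting" line roughly 50.3 seconds after the second worker logged its first one.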
> WAL Split Delays Due to Concurrent WAL Splitting During worker RS Abort
> -----------------------------------------------------------------------
>
> Key: HBASE-28951
> URL: https://issues.apache.org/jira/browse/HBASE-28951
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.5.8
> Reporter: Umesh Kumar Kumawat
> Priority: Major
>
> When a worker RS gets aborted after the SplitWALRemoteProcedure got
> dispatched, RegionServerTracker takes care of it and [aborts the pending
> Operation|https://github.com/apache/hbase/blob/rel/2.5.8/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java#L160]
> on the aborting region server as part of
> [expireServer|https://github.com/apache/hbase/blob/rel/2.5.8/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionServerTracker.java#L172].
>
> This did help the parent procedure, SplitWALProcedure, choose another worker
> RS, but the aborting RS is also still splitting the WAL. While creating the
> recovered edits, both will try to write the same file. The RS that starts
> later deletes the file created by the other one, which causes failures.
> h4. Logs -
> RegionServerTracker marking the remote procedure as failed
> {code:java}
> 2024-10-01 23:02:32,274 WARN [RegionServerTracker-0]
> procedure.SplitWALRemoteProcedure - Sent
> hdfs://hbase1a/hbase/WALs/regionserver-33.regionserver.hbase.<cluster>,XXXXX,1727362162836-splitting/regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172
> to wrong server
> regionserver-283.regionserver.hbase.<cluster>,XXXXX,1727420096936, try another
> org.apache.hadoop.hbase.DoNotRetryIOException: server not online
> regionserver-283.regionserver.hbase.<cluster>,XXXXX,1727420096936
> at
> org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.abortPendingOperations(RSProcedureDispatcher.java:163)
> at
> org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.abortPendingOperations(RSProcedureDispatcher.java:61)
> at
> org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher$BufferNode.abortOperationsInQueue(RemoteProcedureDispatcher.java:417)
> at
> org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.removeNode(RemoteProcedureDispatcher.java:201)
> at
> org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.serverRemoved(RSProcedureDispatcher.java:176)
> at
> org.apache.hadoop.hbase.master.ServerManager.lambda$expireServer$2(ServerManager.java:576)
> at
> java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
> at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
> at
> org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:576)
> at
> org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:530)
> at
> org.apache.hadoop.hbase.master.RegionServerTracker.processAsActiveMaster(RegionServerTracker.java:172)
> at
> org.apache.hadoop.hbase.master.RegionServerTracker.refresh(RegionServerTracker.java:206)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750){code}
> {code:java}
> 2024-10-01 23:02:32,340 INFO [PEWorker-21] procedure2.ProcedureExecutor -
> Finished pid=122448609, ppid=122448595, state=SUCCESS;
> SplitWALRemoteProcedure
> regionserver-33.regionserver.hbase.<cluster>,XXXXX%2C1727362162836.1727822221172,
> worker=regionserver-283.regionserver.hbase.<cluster>,XXXXX,1727420096936 in
> 54.0500 sec{code}
> The parent SplitWALProcedure then creates another SplitWALRemoteProcedure for this WAL
> {code:java}
> 2024-10-01 23:02:32,726 WARN [PEWorker-17] procedure.SplitWALProcedure -
> Failed to split wal
> hdfs://hbase1a/hbase/WALs/regionserver-33.regionserver.hbase.<cluster>,XXXXX,1727362162836-splitting/regionserver-33.regionserver.hbase.<cluster>,XXXXX%2C1727362162836.1727822221172
> by server regionserver-283.regionserver.hbase.<cluster>,XXXXX,1727420096936,
> retry...{code}
> {code:java}
> 2024-10-01 23:02:39,414 INFO [PEWorker-28] procedure2.ProcedureExecutor -
> Initialized subprocedures=[{pid=122452821, ppid=122448595, state=RUNNABLE;
> SplitWALRemoteProcedure
> regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172,
>
> worker=regionserver-323.regionserver.hbase.<cluster>,XXXXX,1727308912906}]{code}
> Splitting is still in progress on the dying RS (rs-283)
> {code:java}
> 2024-10-01 23:02:45,652 INFO
> [G_REPLAY_OPS-regionserver/regionserver-283:XXXXX-0] wal.WALSplitter -
> Splitting
> hdfs://hbase1a/hbase/WALs/regionserver-33.regionserver.hbase.<cluster>,XXXXX,1727362162836-splitting/regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172,
> size=128.1 M (134313407bytes){code}
> rs-323 creating recovered edits
> {code:java}
> 2024-10-01 23:02:42,876 INFO
> [OPS-regionserver/regionserver-323:XXXXX-5-Writer-2]
> monitor.StreamSlowMonitor - New stream slow monitor
> 0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp{code}
> {code:java}
> 2024-10-01 23:02:43,171 INFO
> [OPS-regionserver/regionserver-323:XXXXX-5-Writer-2]
> wal.RecoveredEditsOutputSink - Creating recovered edits writer
> path=hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp{code}
> rs-283 deletes the above file and creates it again
> {code:java}
> 2024-10-01 23:02:50,520 WARN
> [OPS-regionserver/regionserver-283:XXXXX-0-Writer-2]
> wal.RecoveredEditsOutputSink - Found old edits file. It could be the result
> of a previous failed split attempt. Deleting
> hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp,
> length=0{code}
> {code:java}
> 2024-10-01 23:02:50,794 INFO
> [OPS-regionserver/regionserver-283:XXXXX-0-Writer-2]
> monitor.StreamSlowMonitor - New stream slow monitor
> 0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp{code}
> {code:java}
> 2024-10-01 23:02:51,135 INFO
> [OPS-regionserver/regionserver-283:XXXXX-0-Writer-2]
> wal.RecoveredEditsOutputSink - Creating recovered edits writer
> path=hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp{code}
> Now rs-323 starts failing
> {code:java}
> 2024-10-01 23:03:02,137 WARN [Thread-1081409] hdfs.DataStreamer -
> DataStreamer Exception
> java.io.FileNotFoundException: File does not exist:
> /hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.hbase1a.hbase.core2.aws-prod5-uswest2.aws.sfdc.is%2C60020%2C1727362162836.1727822221172.temp
> (inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
> pending creates: 21]
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
> at
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
> at
> org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1091)
> at
> org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1939)
> at
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForCreate(DataStreamer.java:1734)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
> Caused by:
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File
> does not exist:
> /hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.hbase1a.hbase.core2.aws-prod5-uswest2.aws.sfdc.is%2C60020%2C1727362162836.1727822221172.temp
> (inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
> pending creates: 21]
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
> at org.apache.hadoop.ipc.Client.call(Client.java:1513)
> at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
> at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.lambda$addBlock$11(ClientNamenodeProtocolTranslatorPB.java:495)
> at
> org.apache.hadoop.ipc.internal.ShadedProtobufHelper.ipc(ShadedProtobufHelper.java:160)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:495)
> at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
> at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
> at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
> at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088)
> ... 3 more
> {code}
> {code:java}
> 2024-10-01 23:03:02,143 ERROR [split-log-closeStream-pool-1]
> wal.RecoveredEditsOutputSink - Could not close recovered edits at
> hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
> java.io.FileNotFoundException: File does not exist:
> /hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
> (inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
> pending creates: 21]
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
> at
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
> at
> org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1091)
> at
> org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1939)
> at
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForCreate(DataStreamer.java:1734)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
> Caused by:
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File
> does not exist:
> /hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
> (inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
> pending creates: 21]
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
> at org.apache.hadoop.ipc.Client.call(Client.java:1513)
> at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
> at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.lambda$addBlock$11(ClientNamenodeProtocolTranslatorPB.java:495)
> at
> org.apache.hadoop.ipc.internal.ShadedProtobufHelper.ipc(ShadedProtobufHelper.java:160)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:495)
> at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
> at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
> at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
> at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088)
> {code}
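The order of events in the quoted logs (rs-323 creates the .temp file, rs-283 deletes and recreates it, rs-323 then fails with FileNotFoundException) can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: ToyNamespace, deleteAndCreate, owns and addBlock are hypothetical names, not HDFS or HBase APIs, and comparing inode ids merely stands in for the check that fails in FSNamesystem.checkLease in the stack trace above.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

public class RecoveredEditsRace {
    /** Toy namespace mapping path -> inode id. NOT the HDFS NameNode API. */
    static final class ToyNamespace {
        private final Map<String, Long> inodes = new HashMap<>();
        private long nextInode = 1;

        /** Deletes any existing file at path, creates a fresh one, returns its inode id. */
        synchronized long deleteAndCreate(String path) {
            inodes.remove(path);
            long id = nextInode++;
            inodes.put(path, id);
            return id;
        }

        /** True if the writer's inode is still the one behind path. */
        synchronized boolean owns(String path, long writerInode) {
            Long current = inodes.get(path);
            return current != null && current == writerInode;
        }

        /** Fails like the logged addBlock call when the writer's inode is gone. */
        synchronized void addBlock(String path, long writerInode) throws FileNotFoundException {
            if (!owns(path, writerInode)) {
                throw new FileNotFoundException(
                    "File does not exist: " + path + " (inode " + writerInode + ")");
            }
        }
    }

    /** Replays the order of events from the logs; true if the first writer fails. */
    public static boolean firstWriterFails() {
        ToyNamespace ns = new ToyNamespace();
        String temp = "recovered.edits/example.temp"; // hypothetical short path
        long replacementWorker = ns.deleteAndCreate(temp); // rs-323 creates its writer first
        long abortingWorker = ns.deleteAndCreate(temp);    // rs-283 deletes the "old" file, recreates it
        if (!ns.owns(temp, abortingWorker)) {
            return false; // not expected: the later writer should own the path
        }
        try {
            ns.addBlock(temp, replacementWorker); // the earlier writer's inode no longer exists
            return false;
        } catch (FileNotFoundException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("first writer fails: " + firstWriterFails());
    }
}
```

In this model the writer that created the file first is the one that fails, which matches rs-323 failing after rs-283 recreated the same path.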