[
https://issues.apache.org/jira/browse/HBASE-28951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Umesh Kumar Kumawat updated HBASE-28951:
----------------------------------------
Description:
When a worker RS is aborted after a SplitWALRemoteProcedure has been dispatched to it,
RegionServerTracker handles this and [aborts the pending
operation|https://github.com/apache/hbase/blob/rel/2.5.8/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java#L160]
on the aborting region server as part of
[expireServer|https://github.com/apache/hbase/blob/rel/2.5.8/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionServerTracker.java#L172].
This lets the parent procedure, SplitWALProcedure, choose another worker RS, but
the aborting RS may still be splitting the WAL. While creating the recovered
edits, both region servers then try to write the same file. The RS that starts
writing the file later deletes the file written first, which causes failures.
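The race can be illustrated with a small sketch (not HBase code; the class and method names here are hypothetical). It models the HDFS behavior seen in the logs below: delete-and-recreate of a path assigns a new inode, so the first splitter's still-open stream fails its next block allocation with a FileNotFoundException referencing the old inode.

```python
# Illustrative model of the recovered-edits race between two WAL splitters.
class FakeNameNode:
    """Toy namenode: tracks only path -> current inode."""
    def __init__(self):
        self.next_inode = 0
        self.files = {}  # path -> inode

    def create(self, path):
        # Recreating a deleted path assigns a NEW inode, as HDFS does.
        self.next_inode += 1
        self.files[path] = self.next_inode
        return self.next_inode

    def delete(self, path):
        self.files.pop(path, None)

    def add_block(self, path, inode):
        # Analogous to FSNamesystem.checkLease: the open stream must still
        # own the inode currently at this path.
        if self.files.get(path) != inode:
            raise FileNotFoundError(f"File does not exist: {path} (inode {inode})")

nn = FakeNameNode()
path = "recovered.edits/0000000000007468971-example.temp"

inode_a = nn.create(path)  # rs-323 creates the recovered edits file
nn.delete(path)            # rs-283 finds an "old edits file" and deletes it
inode_b = nn.create(path)  # rs-283 recreates the same path (new inode)

nn.add_block(path, inode_b)      # rs-283 writes fine
try:
    nn.add_block(path, inode_a)  # rs-323's next block allocation fails
except FileNotFoundError as e:
    print("rs-323 fails:", e)
```

This is why rs-323 sees "File does not exist ... (inode 1440741238)" in the DataStreamer exception even though the path exists: the path now points at the inode created by rs-283.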
h4. Logs -
RegionServerTracker marks the remote procedure as failed:
{code:java}
2024-10-01 23:02:32,274 WARN [RegionServerTracker-0]
procedure.SplitWALRemoteProcedure - Sent
hdfs://hbase1a/hbase/WALs/regionserver-33.regionserver.hbase.<cluster>,XXXXX,1727362162836-splitting/regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172
to wrong server
regionserver-283.regionserver.hbase.<cluster>,XXXXX,1727420096936, try another
org.apache.hadoop.hbase.DoNotRetryIOException: server not online
regionserver-283.regionserver.hbase.<cluster>,XXXXX,1727420096936
at
org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.abortPendingOperations(RSProcedureDispatcher.java:163)
at
org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.abortPendingOperations(RSProcedureDispatcher.java:61)
at
org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher$BufferNode.abortOperationsInQueue(RemoteProcedureDispatcher.java:417)
at
org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.removeNode(RemoteProcedureDispatcher.java:201)
at
org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.serverRemoved(RSProcedureDispatcher.java:176)
at
org.apache.hadoop.hbase.master.ServerManager.lambda$expireServer$2(ServerManager.java:576)
at
java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
at
org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:576)
at
org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:530)
at
org.apache.hadoop.hbase.master.RegionServerTracker.processAsActiveMaster(RegionServerTracker.java:172)
at
org.apache.hadoop.hbase.master.RegionServerTracker.refresh(RegionServerTracker.java:206)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750){code}
{code:java}
2024-10-01 23:02:32,340 INFO [PEWorker-21] procedure2.ProcedureExecutor -
Finished pid=122448609, ppid=122448595, state=SUCCESS; SplitWALRemoteProcedure
regionserver-33.regionserver.hbase.<cluster>,XXXXX%2C1727362162836.1727822221172,
worker=regionserver-283.regionserver.hbase.<cluster>,XXXXX,1727420096936 in
54.0500 sec{code}
The parent SplitWALProcedure then creates another SplitWALRemoteProcedure for the same WAL:
{code:java}
2024-10-01 23:02:32,726 WARN [PEWorker-17] procedure.SplitWALProcedure - Failed
to split wal
hdfs://hbase1a/hbase/WALs/regionserver-33.regionserver.hbase.<cluster>,XXXXX,1727362162836-splitting/regionserver-33.regionserver.hbase.<cluster>,XXXXX%2C1727362162836.1727822221172
by server regionserver-283.regionserver.hbase.<cluster>,XXXXX,1727420096936,
retry...{code}
{code:java}
2024-10-01 23:02:39,414 INFO [PEWorker-28] procedure2.ProcedureExecutor -
Initialized subprocedures=[{pid=122452821, ppid=122448595, state=RUNNABLE;
SplitWALRemoteProcedure
regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172,
worker=regionserver-323.regionserver.hbase.<cluster>,XXXXX,1727308912906}]{code}
Splitting is still in progress on the dying RS (rs-283):
{code:java}
2024-10-01 23:02:45,652 INFO
[G_REPLAY_OPS-regionserver/regionserver-283:XXXXX-0] wal.WALSplitter -
Splitting
hdfs://hbase1a/hbase/WALs/regionserver-33.regionserver.hbase.<cluster>,XXXXX,1727362162836-splitting/regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172,
size=128.1 M (134313407bytes){code}
rs-323 creates the recovered edits file:
{code:java}
2024-10-01 23:02:42,876 INFO
[OPS-regionserver/regionserver-323:XXXXX-5-Writer-2] monitor.StreamSlowMonitor
- New stream slow monitor
0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp{code}
{code:java}
2024-10-01 23:02:43,171 INFO
[OPS-regionserver/regionserver-323:XXXXX-5-Writer-2]
wal.RecoveredEditsOutputSink - Creating recovered edits writer
path=hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp{code}
rs-283 deletes the file above and creates it again:
{code:java}
2024-10-01 23:02:50,520 WARN
[OPS-regionserver/regionserver-283:XXXXX-0-Writer-2]
wal.RecoveredEditsOutputSink - Found old edits file. It could be the result of
a previous failed split attempt. Deleting
hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp,
length=0{code}
{code:java}
2024-10-01 23:02:50,794 INFO
[OPS-regionserver/regionserver-283:XXXXX-0-Writer-2] monitor.StreamSlowMonitor
- New stream slow monitor
0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp{code}
{code:java}
2024-10-01 23:02:51,135 INFO
[OPS-regionserver/regionserver-283:XXXXX-0-Writer-2]
wal.RecoveredEditsOutputSink - Creating recovered edits writer
path=hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp{code}
Now rs-323 starts failing:
{code:java}
2024-10-01 23:03:02,137 WARN [Thread-1081409] hdfs.DataStreamer - DataStreamer
Exception
java.io.FileNotFoundException: File does not exist:
/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
(inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
pending creates: 21]
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
at
org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1091)
at
org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1939)
at
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForCreate(DataStreamer.java:1734)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does
not exist:
/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
(inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
pending creates: 21]
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
at org.apache.hadoop.ipc.Client.call(Client.java:1513)
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.lambda$addBlock$11(ClientNamenodeProtocolTranslatorPB.java:495)
at
org.apache.hadoop.ipc.internal.ShadedProtobufHelper.ipc(ShadedProtobufHelper.java:160)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:495)
at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088)
... 3 more
{code}
{code:java}
2024-10-01 23:03:02,143 ERROR [split-log-closeStream-pool-1]
wal.RecoveredEditsOutputSink - Could not close recovered edits at
hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
java.io.FileNotFoundException: File does not exist:
/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
(inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
pending creates: 21]
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060) at
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
at
org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1091)
at
org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1939)
at
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForCreate(DataStreamer.java:1734)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does
not exist:
/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
(inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
pending creates: 21]
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060) at
org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
at org.apache.hadoop.ipc.Client.call(Client.java:1513)
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.lambda$addBlock$11(ClientNamenodeProtocolTranslatorPB.java:495)
at
org.apache.hadoop.ipc.internal.ShadedProtobufHelper.ipc(ShadedProtobufHelper.java:160)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:495)
at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088)
{code}
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088)
... 3 more
{code}
{code:java}
2024-10-01 23:03:02,143 ERROR [split-log-closeStream-pool-1]
wal.RecoveredEditsOutputSink - Could not close recovered edits at
hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
java.io.FileNotFoundException: File does not exist:
/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
(inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
pending creates: 21]
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at
org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
at
org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
at
org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1091)
at
org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1939)
at
org.apache.hadoop.hdfs.DataStreamer.setupPipelineForCreate(DataStreamer.java:1734)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
Caused by:
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does
not exist:
/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
(inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
pending creates: 21]
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
at
org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
at
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
at
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
at org.apache.hadoop.ipc.Client.call(Client.java:1513)
at org.apache.hadoop.ipc.Client.call(Client.java:1410)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
at
org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.lambda$addBlock$11(ClientNamenodeProtocolTranslatorPB.java:495)
at
org.apache.hadoop.ipc.internal.ShadedProtobufHelper.ipc(ShadedProtobufHelper.java:160)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:495)
at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
at
org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at
org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088)
{code}
> WAL split delay because two worker RSes are splitting the same WAL while one
> RS is aborting
> -------------------------------------------------------------------------------------
>
> Key: HBASE-28951
> URL: https://issues.apache.org/jira/browse/HBASE-28951
> Project: HBase
> Issue Type: Bug
> Affects Versions: 2.5.8
> Reporter: Umesh Kumar Kumawat
> Priority: Major
>
> When a worker RS aborts after a SplitWALRemoteProcedure has been dispatched
> to it, RegionServerTracker [aborts the pending
> operation|https://github.com/apache/hbase/blob/rel/2.5.8/hbase-server/src/main/java/org/apache/hadoop/hbase/master/procedure/RSProcedureDispatcher.java#L160]
> on the aborting server as part of
> [expireServer|https://github.com/apache/hbase/blob/rel/2.5.8/hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionServerTracker.java#L172].
>
> This lets the parent procedure, SplitWALProcedure, choose another worker RS,
> but the aborting RS is still splitting the WAL. While creating the recovered
> edits, both servers then try to write the same file. The RS that opens the
> file later deletes the file written earlier, which causes failures.
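The race described above can be compressed into a toy model of the NameNode's path-to-inode bookkeeping. The class, method names, and path below are purely illustrative, not HBase or HDFS code; the model only mimics how `checkLease` rejects a block allocation once the path no longer maps to the inode the writer opened:

```python
import itertools


class ToyNameNode:
    """Illustrative stand-in for the NameNode's file tracking (not HDFS code)."""

    def __init__(self):
        self._next_inode = itertools.count(1)
        self._live = {}  # path -> inode id of the current file at that path

    def create(self, path):
        """Create (or recreate) a file; the writer holds the returned inode."""
        inode = next(self._next_inode)
        self._live[path] = inode
        return inode

    def delete(self, path):
        """Delete the file; a writer still holding the old inode is orphaned."""
        self._live.pop(path, None)

    def add_block(self, path, inode):
        """Allocate a block: the path must still map to the writer's inode."""
        if self._live.get(path) != inode:
            raise FileNotFoundError(f"File does not exist: {path} (inode {inode})")


# The interleaving from the logs below, compressed:
nn = ToyNameNode()
path = ".../recovered.edits/0000000000007468971-....temp"  # placeholder path

inode_323 = nn.create(path)   # rs-323 creates the recovered edits writer
nn.delete(path)               # rs-283 finds an "old edits file" and deletes it
inode_283 = nn.create(path)   # rs-283 recreates the file under a new inode

nn.add_block(path, inode_283)     # rs-283 keeps writing successfully
try:
    nn.add_block(path, inode_323)  # rs-323's next block allocation fails
except FileNotFoundError as e:
    print(e)  # same shape as the "File does not exist ... (inode N)" error
```

Under this model, only the second writer's inode stays live, so the first writer fails on its next `addBlock`, matching the DataStreamer exceptions on rs-323 in the logs.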
> h4. Logs
> RegionServerTracker marks the remote procedure as failed:
> {code:java}
> 2024-10-01 23:02:32,274 WARN [RegionServerTracker-0]
> procedure.SplitWALRemoteProcedure - Sent
> hdfs://hbase1a/hbase/WALs/regionserver-33.regionserver.hbase.<cluster>,XXXXX,1727362162836-splitting/regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172
> to wrong server
> regionserver-283.regionserver.hbase.<cluster>,XXXXX,1727420096936, try another
> org.apache.hadoop.hbase.DoNotRetryIOException: server not online
> regionserver-283.regionserver.hbase.<cluster>,XXXXX,1727420096936
> at
> org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.abortPendingOperations(RSProcedureDispatcher.java:163)
> at
> org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.abortPendingOperations(RSProcedureDispatcher.java:61)
> at
> org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher$BufferNode.abortOperationsInQueue(RemoteProcedureDispatcher.java:417)
> at
> org.apache.hadoop.hbase.procedure2.RemoteProcedureDispatcher.removeNode(RemoteProcedureDispatcher.java:201)
> at
> org.apache.hadoop.hbase.master.procedure.RSProcedureDispatcher.serverRemoved(RSProcedureDispatcher.java:176)
> at
> org.apache.hadoop.hbase.master.ServerManager.lambda$expireServer$2(ServerManager.java:576)
> at
> java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
> at java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:647)
> at
> org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:576)
> at
> org.apache.hadoop.hbase.master.ServerManager.expireServer(ServerManager.java:530)
> at
> org.apache.hadoop.hbase.master.RegionServerTracker.processAsActiveMaster(RegionServerTracker.java:172)
> at
> org.apache.hadoop.hbase.master.RegionServerTracker.refresh(RegionServerTracker.java:206)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:750){code}
> {code:java}
> 2024-10-01 23:02:32,340 INFO [PEWorker-21] procedure2.ProcedureExecutor -
> Finished pid=122448609, ppid=122448595, state=SUCCESS;
> SplitWALRemoteProcedure
> regionserver-33.regionserver.hbase.<cluster>,XXXXX%2C1727362162836.1727822221172,
> worker=regionserver-283.regionserver.hbase.<cluster>,XXXXX,1727420096936 in
> 54.0500 sec{code}
> The parent SplitWALProcedure then creates another SplitWALRemoteProcedure:
> {code:java}
> 2024-10-01 23:02:32,726 WARN [PEWorker-17] procedure.SplitWALProcedure -
> Failed to split wal
> hdfs://hbase1a/hbase/WALs/regionserver-33.regionserver.hbase.<cluster>,XXXXX,1727362162836-splitting/regionserver-33.regionserver.hbase.<cluster>,XXXXX%2C1727362162836.1727822221172
> by server regionserver-283.regionserver.hbase.<cluster>,XXXXX,1727420096936,
> retry...{code}
> {code:java}
> 2024-10-01 23:02:39,414 INFO [PEWorker-28] procedure2.ProcedureExecutor -
> Initialized subprocedures=[{pid=122452821, ppid=122448595, state=RUNNABLE;
> SplitWALRemoteProcedure
> regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172,
>
> worker=regionserver-323.regionserver.hbase.<cluster>,XXXXX,1727308912906}]{code}
> Splitting is still in progress on the dying rs-283:
> {code:java}
> 2024-10-01 23:02:45,652 INFO
> [G_REPLAY_OPS-regionserver/regionserver-283:XXXXX-0] wal.WALSplitter -
> Splitting
> hdfs://hbase1a/hbase/WALs/regionserver-33.regionserver.hbase.<cluster>,XXXXX,1727362162836-splitting/regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172,
> size=128.1 M (134313407bytes){code}
> rs-323 creates the recovered edits file:
> {code:java}
> 2024-10-01 23:02:42,876 INFO
> [OPS-regionserver/regionserver-323:XXXXX-5-Writer-2]
> monitor.StreamSlowMonitor - New stream slow monitor
> 0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp{code}
> {code:java}
> 2024-10-01 23:02:43,171 INFO
> [OPS-regionserver/regionserver-323:XXXXX-5-Writer-2]
> wal.RecoveredEditsOutputSink - Creating recovered edits writer
> path=hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp{code}
> rs-283 deletes the file above and creates it again:
> {code:java}
> 2024-10-01 23:02:50,520 WARN
> [OPS-regionserver/regionserver-283:XXXXX-0-Writer-2]
> wal.RecoveredEditsOutputSink - Found old edits file. It could be the result
> of a previous failed split attempt. Deleting
> hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp,
> length=0{code}
> {code:java}
> 2024-10-01 23:02:50,794 INFO
> [OPS-regionserver/regionserver-283:XXXXX-0-Writer-2]
> monitor.StreamSlowMonitor - New stream slow monitor
> 0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp{code}
> {code:java}
> 2024-10-01 23:02:51,135 INFO
> [OPS-regionserver/regionserver-283:XXXXX-0-Writer-2]
> wal.RecoveredEditsOutputSink - Creating recovered edits writer
> path=hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp{code}
> Now rs-323 starts failing:
> {code:java}
> 2024-10-01 23:03:02,137 WARN [Thread-1081409] hdfs.DataStreamer -
> DataStreamer Exception
> java.io.FileNotFoundException: File does not exist:
> /hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
> (inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
> pending creates: 21]
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
> at
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
> at
> org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1091)
> at
> org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1939)
> at
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForCreate(DataStreamer.java:1734)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
> Caused by:
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File
> does not exist:
> /hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
> (inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
> pending creates: 21]
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
> at org.apache.hadoop.ipc.Client.call(Client.java:1513)
> at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
> at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.lambda$addBlock$11(ClientNamenodeProtocolTranslatorPB.java:495)
> at
> org.apache.hadoop.ipc.internal.ShadedProtobufHelper.ipc(ShadedProtobufHelper.java:160)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:495)
> at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
> at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
> at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
> at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088)
> ... 3 more
> {code}
> {code:java}
> 2024-10-01 23:03:02,143 ERROR [split-log-closeStream-pool-1]
> wal.RecoveredEditsOutputSink - Could not close recovered edits at
> hdfs://hbase1a/hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
> java.io.FileNotFoundException: File does not exist:
> /hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
> (inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
> pending creates: 21]
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at
> org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:121)
> at
> org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:88)
> at
> org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1091)
> at
> org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1939)
> at
> org.apache.hadoop.hdfs.DataStreamer.setupPipelineForCreate(DataStreamer.java:1734)
> at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
> Caused by:
> org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File
> does not exist:
> /hbase/data/default/SEARCH.REPLAY_ID_BATCH_INDEX_START_INDEX/d3be13a8187ff35746fff1def4f4dba4/recovered.edits/0000000000007468971-regionserver-33.regionserver.hbase.<cluster>%2CXXXXX%2C1727362162836.1727822221172.temp
> (inode 1440741238) [Lease. Holder: DFSClient_NONMAPREDUCE_-2039838105_1,
> pending creates: 21]
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3103)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:610)
> at
> org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
> at
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2977)
> at
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
> at
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:618)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1105)
> at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1028)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3060)
> at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567)
> at org.apache.hadoop.ipc.Client.call(Client.java:1513)
> at org.apache.hadoop.ipc.Client.call(Client.java:1410)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258)
> at
> org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139)
> at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.lambda$addBlock$11(ClientNamenodeProtocolTranslatorPB.java:495)
> at
> org.apache.hadoop.ipc.internal.ShadedProtobufHelper.ipc(ShadedProtobufHelper.java:160)
> at
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:495)
> at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96)
> at
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362)
> at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
> at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
> at sun.reflect.GeneratedMethodAccessor247.invoke(Unknown Source)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:361)
> at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
> at
> org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)