[ https://issues.apache.org/jira/browse/HDFS-17157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Haoze Wu updated HDFS-17157:
----------------------------
    Description: 
This case is related to HDFS-12070.

In HDFS-12070, we saw how a faulty drive at a certain datanode could lead to 
permanent block recovery failure and leave the file open indefinitely. With 
that patch, instead of failing the whole lease recovery process when the 
second stage of block recovery fails at one datanode, the whole lease recovery 
process fails only if that stage fails at all of the datanodes.

Below is the code snippet for the second stage of block recovery, in 
BlockRecoveryWorker#syncBlock:
{code:java}
...
final List<BlockRecord> successList = new ArrayList<>();
for (BlockRecord r : participatingList) {
  try {
    r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
        newBlock.getNumBytes());
    successList.add(r);
  } catch (IOException e) {
...{code}
However, because of a transient network failure, the updateReplicaUnderRecovery 
RPC initiated from the primary datanode to another datanode could return an 
EOFException, while the other side either does not process the RPC at all or 
throws an IOException when reading from the socket.
{code:java}
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:788)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
        at org.apache.hadoop.ipc.Client.call(Client.java:1437)
        at org.apache.hadoop.ipc.Client.call(Client.java:1347)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy29.updateReplicaUnderRecovery(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.updateReplicaUnderRecovery(InterDatanodeProtocolTranslatorPB.java:112)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.updateReplicaUnderRecovery(BlockRecoveryWorker.java:88)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.access$700(BlockRecoveryWorker.java:71)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:300)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:188)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:606)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1796)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1165)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1061) {code}
Then, if the second stage of block recovery succeeds at any other datanode, the 
lease recovery succeeds and the file is closed. However, the last block was 
never synced on the failed datanode, and this inconsistency could potentially 
last for a very long time.

To fix the issue, I propose adding a configurable retry of the 
updateReplicaUnderRecovery RPC so that transient network failures can be 
mitigated.
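
As a rough sketch of the idea (not a patch), the per-datanode RPC could be 
wrapped in a small retry helper. The helper name, attempt count, backoff, and 
the configuration key mentioned in the comments are hypothetical and do not 
exist in HDFS today:
{code:java}
// Hypothetical retry helper around the existing inter-datanode RPC used in
// BlockRecoveryWorker#syncBlock. maxAttempts/retryIntervalMs would come from
// a new, configurable setting (e.g. something like
// "dfs.datanode.block-recovery.update-replica.retries"); the key name is
// illustrative only. LOG is assumed to be the worker's existing logger.
private boolean updateReplicaWithRetry(BlockRecord r, String bpid,
    long recoveryId, long blockId, long newLength, int maxAttempts,
    long retryIntervalMs) throws InterruptedException {
  for (int attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      r.updateReplicaUnderRecovery(bpid, recoveryId, blockId, newLength);
      return true;  // replica synced on this datanode
    } catch (IOException e) {
      LOG.warn("updateReplicaUnderRecovery attempt " + attempt + "/"
          + maxAttempts + " failed for block " + blockId, e);
      if (attempt < maxAttempts) {
        Thread.sleep(retryIntervalMs);  // simple fixed delay between attempts
      }
    }
  }
  return false;  // give up; caller keeps the existing per-datanode failure handling
}

// In syncBlock, the direct call would then become something like:
//   if (updateReplicaWithRetry(r, bpid, recoveryId, blockId,
//       newBlock.getNumBytes(), maxAttempts, retryIntervalMs)) {
//     successList.add(r);
//   } else {
//     ... existing failure handling for this datanode ...
//   }
{code}
Ideally only failures that look transient (like the EOFException above) would 
be retried, while deterministic errors could still fail fast after the first 
attempt.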

Any comments and suggestions would be appreciated.

 


> Transient network failure in lease recovery could lead to the block in a 
> datanode in an inconsistent state for a long time
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17157
>                 URL: https://issues.apache.org/jira/browse/HDFS-17157
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Haoze Wu
>            Priority: Major
>


