[ https://issues.apache.org/jira/browse/HDFS-17157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Haoze Wu updated HDFS-17157:
----------------------------
    Summary: Transient network failure in lease recovery could be mitigated to ensure better consistency  (was: Transient network failure in lease recovery could lead to the block in a datanode in an inconsistent state for a long time)

> Transient network failure in lease recovery could be mitigated to ensure
> better consistency
> -------------------------------------------------------------------------------------------
>
>                 Key: HDFS-17157
>                 URL: https://issues.apache.org/jira/browse/HDFS-17157
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Haoze Wu
>            Priority: Major
>
> This case is related to HDFS-12070.
> In HDFS-12070, we saw how a faulty drive at a certain datanode could lead to
> permanent block recovery failure and leave the file open indefinitely. In
> the patch, instead of failing the whole lease recovery process when the
> second stage of block recovery fails at one datanode, the whole lease
> recovery process fails only if the second stage fails at all of the
> datanodes.
> Below is the code snippet for the second stage of the block recovery, in
> BlockRecoveryWorker#syncBlock:
> {code:java}
>   ...
>   final List<BlockRecord> successList = new ArrayList<>();
>   for (BlockRecord r : participatingList) {
>     try {
>       r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
>           newBlock.getNumBytes());
>       successList.add(r);
>     } catch (IOException e) {
>       ...
> {code}
> However, because of a transient network failure, the
> updateReplicaUnderRecovery RPC initiated from the primary datanode to
> another datanode could return an EOFException while the other side never
> processes the RPC at all, or could throw an IOException when reading from
> the socket. For example:
> {code:java}
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:788)
>         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1437)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1347)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
>         at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
>         at com.sun.proxy.$Proxy29.updateReplicaUnderRecovery(Unknown Source)
>         at org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.updateReplicaUnderRecovery(InterDatanodeProtocolTranslatorPB.java:112)
>         at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.updateReplicaUnderRecovery(BlockRecoveryWorker.java:88)
>         at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.access$700(BlockRecoveryWorker.java:71)
>         at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:300)
>         at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:188)
>         at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:606)
>         at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.EOFException
>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>         at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1796)
>         at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1165)
>         at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1061)
> {code}
> Then, if the second stage of block recovery succeeds at any other datanode,
> the lease recovery succeeds and closes the file. However, the last block is
> never synced to the failed datanode, and this inconsistency could
> potentially last for a very long time.
> To fix the issue, I propose adding a configurable retry of the
> updateReplicaUnderRecovery RPC so that transient network failures can be
> mitigated; a rough sketch is given below.
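> For illustration, here is a sketch of how a retry loop could wrap the
> existing code in BlockRecoveryWorker#syncBlock. The configuration key name,
> its default of 3 attempts, and the log message are hypothetical
> placeholders, not an actual patch:
> {code:java}
>   // Hypothetical retry budget; the key name and default are placeholders,
>   // and conf is the datanode configuration assumed available in this scope.
>   final int maxAttempts = conf.getInt(
>       "dfs.datanode.block-recovery.update-replica.attempts", 3);
>
>   final List<BlockRecord> successList = new ArrayList<>();
>   for (BlockRecord r : participatingList) {
>     IOException lastException = null;
>     for (int attempt = 1; attempt <= maxAttempts; attempt++) {
>       try {
>         r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
>             newBlock.getNumBytes());
>         successList.add(r);
>         lastException = null;
>         break; // this datanode succeeded; move on to the next one
>       } catch (IOException e) {
>         // A transient failure (e.g. the EOFException above from a dropped
>         // connection) may succeed on a later attempt.
>         lastException = e;
>         LOG.warn("updateReplicaUnderRecovery attempt " + attempt + " of "
>             + maxAttempts + " failed for " + r, e);
>       }
>     }
>     if (lastException != null) {
>       // Out of attempts: fall back to the HDFS-12070 behavior, i.e. record
>       // this datanode as failed; the whole lease recovery still fails only
>       // when every datanode fails.
>       // ... existing failure handling ...
>     }
>   }
> {code}
> This sketch assumes that updateReplicaUnderRecovery is safe to reissue after
> an ambiguous failure (every attempt carries the same recoveryId); a real
> patch might retry only on connectivity-related exceptions and back off
> between attempts.
> Any comments and suggestions would be appreciated.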