Haoze Wu created HDFS-17157:
-------------------------------
Summary: Transient network failure in lease recovery could leave a
datanode in an inconsistent state for a long time
Key: HDFS-17157
URL: https://issues.apache.org/jira/browse/HDFS-17157
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 2.0.0-alpha
Reporter: Haoze Wu
This case is related to HDFS-12070.
In HDFS-12070, we saw how a faulty drive at a datanode could lead to
permanent block recovery failure and leave the file open indefinitely. With that
patch, instead of failing the whole lease recovery when the second stage of
block recovery fails at one datanode, the lease recovery fails only if the
second stage fails at all of the datanodes.
Below is the code snippet for the second stage of block recovery, in
BlockRecoveryWorker#syncBlock:
{code:java}
    ...
    final List<BlockRecord> successList = new ArrayList<>();
    for (BlockRecord r : participatingList) {
      try {
        r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
            newBlock.getNumBytes());
        successList.add(r);
      } catch (IOException e) {
        ...
{code}
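For context, the error handling after this loop roughly looks like the following
(a paraphrased sketch of the post-HDFS-12070 behavior, not the exact Hadoop
source): the IOException is logged, the datanode is skipped, and the whole
recovery fails only when no datanode succeeded.
{code:java}
      // Paraphrased continuation of the elided catch block above:
      } catch (IOException e) {
        // Log and skip this datanode instead of failing the whole recovery.
        LOG.warn("Failed to updateBlock (newblock=" + newBlock
            + ", datanode=" + r.id + ")", e);
      }
    }

    // The lease recovery fails (and must be retried later) only when the
    // second stage failed on every participating datanode.
    if (successList.isEmpty()) {
      throw new IOException("Cannot recover " + block
          + ": none of the " + participatingList.size()
          + " datanodes succeeded in updateReplicaUnderRecovery");
    }
{code}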
However, because of a transient network failure, the updateReplicaUnderRecovery
RPC issued from the primary datanode to another datanode may return an
EOFException even though the other side never processed the RPC at all, or it
may throw an IOException while reading from the socket:
{code:java}
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:788)
        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1495)
        at org.apache.hadoop.ipc.Client.call(Client.java:1437)
        at org.apache.hadoop.ipc.Client.call(Client.java:1347)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
        at com.sun.proxy.$Proxy29.updateReplicaUnderRecovery(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.InterDatanodeProtocolTranslatorPB.updateReplicaUnderRecovery(InterDatanodeProtocolTranslatorPB.java:112)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.updateReplicaUnderRecovery(BlockRecoveryWorker.java:88)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$BlockRecord.access$700(BlockRecoveryWorker.java:71)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.syncBlock(BlockRecoveryWorker.java:300)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$RecoveryTaskContiguous.recover(BlockRecoveryWorker.java:188)
        at org.apache.hadoop.hdfs.server.datanode.BlockRecoveryWorker$1.run(BlockRecoveryWorker.java:606)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.hadoop.ipc.Client$IpcStreams.readResponse(Client.java:1796)
        at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1165)
        at org.apache.hadoop.ipc.Client$Connection.run(Client.java:1061)
{code}
Then, if the second stage of block recovery succeeds on any other datanode, the
lease recovery as a whole succeeds and the file is closed. However, the last
block was never synced to the failed datanode, and this inconsistency could
persist for a very long time.
To fix the issue, I propose adding a configurable number of retries for the
updateReplicaUnderRecovery RPC so that transient network failures can be
tolerated.
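A minimal sketch of the proposed change in BlockRecoveryWorker#syncBlock (the
retry-count plumbing and the configuration key name below are hypothetical and
only illustrate the idea):
{code:java}
    // Sketch only: maxRetries would come from a new, hypothetical configuration
    // key, e.g. dfs.datanode.block.recovery.update-replica-retries.
    final List<BlockRecord> successList = new ArrayList<>();
    for (BlockRecord r : participatingList) {
      boolean updated = false;
      for (int attempt = 1; attempt <= 1 + maxRetries && !updated; attempt++) {
        try {
          r.updateReplicaUnderRecovery(bpid, recoveryId, blockId,
              newBlock.getNumBytes());
          updated = true;
        } catch (IOException e) {
          LOG.warn("updateReplicaUnderRecovery to datanode " + r.id
              + " failed on attempt " + attempt + " of " + (1 + maxRetries), e);
        }
      }
      if (updated) {
        successList.add(r);
      }
      // A datanode that still fails after all retries is skipped, exactly as
      // today; the lease recovery fails only if successList ends up empty.
    }
{code}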