[ https://issues.apache.org/jira/browse/HDFS-6569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14043915#comment-14043915 ]
Brandon Li commented on HDFS-6569:
----------------------------------
The current code looks logically correct: it tries not to close the streams
before the OOB is sent.
I think the problem is triggered by the NIO implementation. When the DataNode
is shut down for restart, it interrupts all the DataXceiver threads. The NIO
channels in NioInetPeer are bound to these threads, which do the block
receiving. If these threads are interrupted, NIO closes the stream / channel
automatically for I/O safety (interruptible channels are closed when a thread
blocked on them is interrupted).
So once a DataXceiver thread is interrupted, the OOB can rarely be sent
before the NIO channel is closed automatically.
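
The close-on-interrupt behavior can be reproduced outside HDFS with a small
standalone sketch. This is not DataNode code: the class and method names are
made up for illustration, and the only thing it relies on is the documented
java.nio rule that interrupting a thread blocked on an InterruptibleChannel
closes that channel and raises ClosedByInterruptException in the thread.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedByInterruptException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class, not part of Hadoop.
public class InterruptClosesChannel {

    // Returns { channel still open?, reader saw ClosedByInterruptException? }.
    static boolean[] interruptBlockedReader() throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel accepted = server.accept()) {
                AtomicBoolean sawInterruptClose = new AtomicBoolean(false);
                // Stand-in for a thread that is receiving a block.
                Thread reader = new Thread(() -> {
                    try {
                        // Blocks: the client never sends any data.
                        accepted.read(ByteBuffer.allocate(16));
                    } catch (ClosedByInterruptException e) {
                        sawInterruptClose.set(true);
                    } catch (IOException e) {
                        // Not expected in this sketch.
                    }
                });
                reader.start();
                Thread.sleep(200); // crude wait so the reader is blocked in read()
                reader.interrupt();
                reader.join();
                return new boolean[] { accepted.isOpen(), sawInterruptClose.get() };
            }
        }
    }

    public static void main(String[] args) throws Exception {
        boolean[] result = interruptBlockedReader();
        System.out.println("channel open after interrupt: " + result[0]);
        System.out.println("ClosedByInterruptException seen: " + result[1]);
    }
}
```

After the interrupt the channel should report itself as closed, so any later
write on it (such as an OOB ack) fails.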
One possible fix is to send the OOB message before interrupting the DataXceiver
threads.
Thoughts?
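
A rough sketch of the proposed ordering, under heavy simplification: it is
not the DataNode shutdown path, all names (OobBeforeInterrupt, sendOob,
OOB_RESTART) are hypothetical, the one-byte payload is only a placeholder for
the real OOB ack, and here the shutdown thread writes the message itself.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical sketch, not part of Hadoop.
public class OobBeforeInterrupt {

    static final byte OOB_RESTART = 1; // placeholder, not the real ack format

    // Returns what the client's read() reports: byte count, or -1 for EOF.
    static int shutdown(boolean sendOobFirst) throws Exception {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress());
                 SocketChannel peer = server.accept()) {
                // Stand-in for a DataXceiver thread blocked receiving a block.
                Thread xceiver = new Thread(() -> {
                    try {
                        peer.read(ByteBuffer.allocate(16));
                    } catch (IOException e) {
                        // Interrupted: NIO closed the channel under this thread.
                    }
                });
                xceiver.start();
                Thread.sleep(200); // crude wait so the thread is blocked in read()

                if (sendOobFirst) {
                    sendOob(peer);       // proposed order: notify the client first
                    xceiver.interrupt();
                    xceiver.join();
                } else {
                    xceiver.interrupt(); // problematic order: channel closes here
                    xceiver.join();
                    sendOob(peer);       // too late, the write fails
                }
                return client.read(ByteBuffer.allocate(16));
            }
        }
    }

    // Tries to write the placeholder OOB byte; returns false if the channel is closed.
    static boolean sendOob(SocketChannel peer) {
        try {
            peer.write(ByteBuffer.wrap(new byte[] { OOB_RESTART }));
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("OOB first:       client read = " + shutdown(true));
        System.out.println("interrupt first: client read = " + shutdown(false));
    }
}
```

With the OOB sent first, the client should receive the byte before seeing the
connection close; with the interrupt first, it should only see end-of-stream.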
> OOB message can't be sent to the client when DataNode shuts down for upgrade
> ----------------------------------------------------------------------------
>
> Key: HDFS-6569
> URL: https://issues.apache.org/jira/browse/HDFS-6569
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 3.0.0, 2.4.0
> Reporter: Brandon Li
>
> The socket is closed too early, before the OOB message can be sent to the
> client, which causes a write pipeline failure.