Daryn Sharp created HDFS-8491:
---------------------------------
Summary: DN shutdown race conditions with open xceivers
Key: HDFS-8491
URL: https://issues.apache.org/jira/browse/HDFS-8491
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 2.6.0
Reporter: Daryn Sharp

DN shutdowns, at least for restarts, have many race conditions, and shutdown is
very noisy with exceptions. The DN notifies writers of the restart, waits 1s,
and then interrupts the xceiver threads but does not join them. The IPC server
is then stopped, followed by the bpos services.
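
The interrupt-without-join step described above can be illustrated with a
minimal Java sketch. This is not the DataNode's code or a proposed patch: the
class XceiverShutdownSketch, the methods interruptAndJoin and startWorkers, and
the 5s timeout are all hypothetical names and values chosen for illustration.
It only shows how joining the interrupted threads, with a bounded wait, would
tell the caller whether any worker is still running before later services are
stopped.

```java
import java.util.ArrayList;
import java.util.List;

public class XceiverShutdownSketch {

    /**
     * Interrupts every worker, then waits up to timeoutMs in total for them
     * to exit. Returns the number of workers still alive at the deadline.
     * (Hypothetical helper, not an HDFS API.)
     */
    static int interruptAndJoin(List<Thread> workers, long timeoutMs)
            throws InterruptedException {
        for (Thread t : workers) {
            t.interrupt();
        }
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        int alive = 0;
        for (Thread t : workers) {
            long remainingMs = (deadline - System.nanoTime()) / 1_000_000L;
            // join(0) would wait forever, so only join with a positive budget.
            if (remainingMs > 0) {
                t.join(remainingMs);
            }
            if (t.isAlive()) {
                alive++;
            }
        }
        return alive;
    }

    /** Starts n stand-in workers that block until they are interrupted. */
    static List<Thread> startWorkers(int n) {
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(60_000);
                } catch (InterruptedException e) {
                    // Interrupted: fall through and let the thread exit.
                }
            }, "xceiver-sketch-" + i);
            t.setDaemon(true);
            t.start();
            workers.add(t);
        }
        return workers;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Thread> workers = startWorkers(4);
        int stillAlive = interruptAndJoin(workers, 5_000);
        System.out.println("workers still alive after join: " + stillAlive);
    }
}
```

A non-zero return value would mean some workers ignored the interrupt within
the budget, which is the situation in which stopping the remaining services
could still race with them.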

Xceivers then encounter NPEs in closeBlock because the block no longer exists
in the volume map when transient storage is checked. Just before that, the DN
notifies the NN that the block was received. That does not always appear to be
true; rather, the thread was merely interrupted. The xceivers race with bpos
shutdown to send the block received and, luckily, appear to lose.
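
The two conditions described above (a thread that was only interrupted, and a
block that has already left the volume map) can be made concrete with a small
Java sketch. Everything here is an assumption for illustration: CloseBlockSketch,
Replica, Outcome, and the volumeMap field are hypothetical stand-ins, not HDFS
classes, and the ordering of checks is not taken from the DataNode source.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CloseBlockSketch {

    /** Hypothetical stand-in for a replica entry; not an HDFS class. */
    static final class Replica {
        final boolean onTransientStorage;

        Replica(boolean onTransientStorage) {
            this.onTransientStorage = onTransientStorage;
        }
    }

    enum Outcome { REPORTED, SKIPPED_INTERRUPTED, SKIPPED_MISSING }

    /** Hypothetical stand-in for the volume map, keyed by block id. */
    final Map<Long, Replica> volumeMap = new ConcurrentHashMap<>();

    Outcome closeBlock(long blockId) {
        // isInterrupted() reads the interrupt flag without clearing it.
        if (Thread.currentThread().isInterrupted()) {
            return Outcome.SKIPPED_INTERRUPTED;
        }
        // Look the replica up once and null-check it, instead of
        // dereferencing a lookup that shutdown may already have emptied.
        Replica replica = volumeMap.get(blockId);
        if (replica == null) {
            return Outcome.SKIPPED_MISSING;
        }
        // A real implementation would report the block here and then
        // consult replica.onTransientStorage.
        return Outcome.REPORTED;
    }

    public static void main(String[] args) {
        CloseBlockSketch dn = new CloseBlockSketch();
        dn.volumeMap.put(1L, new Replica(false));
        System.out.println(dn.closeBlock(1L));

        dn.volumeMap.clear();               // simulate shutdown emptying the map
        System.out.println(dn.closeBlock(1L));

        Thread.currentThread().interrupt(); // simulate the shutdown interrupt
        System.out.println(dn.closeBlock(1L));
        Thread.interrupted();               // clear the flag again
    }
}
```

Checks like these only narrow the window; an interrupt or a map removal can
still arrive between the check and the report, so they do not by themselves
remove the race with bpos shutdown.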