Daryn Sharp created HDFS-8491:
---------------------------------

             Summary: DN shutdown race conditions with open xceivers
                 Key: HDFS-8491
                 URL: https://issues.apache.org/jira/browse/HDFS-8491
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.6.0
            Reporter: Daryn Sharp

DN shutdown, at least for restarts, has many race conditions, and it is very noisy with exceptions. The DN notifies writers of the restart, waits 1s, and then interrupts the xceiver threads but does not join them. The ipc server is stopped, and then the bpos services are stopped. Xceivers then encounter NPEs in closeBlock because the block no longer exists in the volume map when transient storage is checked. Just before that, the DN notifies the NN that the block was received. The block does not appear to always have actually been received; rather, the xceiver thread was merely interrupted. The xceivers race with bpos shutdown to send the block received notification and, luckily, appear to lose.
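
The interrupt-without-join ordering described above can be reproduced in isolation. The following is only a minimal sketch, not DataNode code: ShutdownSketch, runShutdown, and the volumeMap field are hypothetical names, and a ConcurrentHashMap stands in for the volume map. Each worker plays an xceiver that, once interrupted, looks up its block the way closeBlock would. When teardown does not wait for the workers, the lookup may find nothing, which is the condition that would surface as an NPE.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the shutdown sequence; none of these names are Hadoop APIs. */
final class ShutdownSketch {

    /**
     * Starts worker threads, shuts them down, and returns how many workers
     * found their block missing from the map during their cleanup step.
     */
    static int runShutdown(int workers, boolean joinBeforeTeardown) {
        Map<Long, String> volumeMap = new ConcurrentHashMap<>();
        AtomicInteger missing = new AtomicInteger();
        CountDownLatch started = new CountDownLatch(workers);
        List<Thread> threads = new ArrayList<>();

        for (long id = 0; id < workers; id++) {
            final long blockId = id;
            volumeMap.put(blockId, "replica-" + blockId);
            Thread t = new Thread(() -> {
                started.countDown();
                try {
                    Thread.sleep(Long.MAX_VALUE); // "receiving" until interrupted
                } catch (InterruptedException e) {
                    // Interrupted by shutdown; fall through to cleanup.
                }
                // closeBlock analogue: the block may already be gone.
                if (volumeMap.get(blockId) == null) {
                    missing.incrementAndGet();
                }
            }, "xceiver-" + id);
            threads.add(t);
            t.start();
        }

        try {
            started.await();
            for (Thread t : threads) {
                t.interrupt();
            }
            if (joinBeforeTeardown) {
                // Wait for every worker to finish its cleanup first.
                for (Thread t : threads) {
                    t.join();
                }
            }
            volumeMap.clear(); // teardown analogue: blocks disappear
            // Join here only so the returned count is final.
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return missing.get();
    }

    public static void main(String[] args) {
        // Without the join the count varies from run to run.
        System.out.println("missing without join: " + runShutdown(8, false));
        System.out.println("missing with join:    " + runShutdown(8, true));
    }
}
```

In this sketch, joining the workers before teardown guarantees that every lookup completes before the map is cleared, so the count with the join is always zero. Whether a plain join is the appropriate fix for the DN itself is not established here.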