Daryn Sharp created HDFS-8491:
---------------------------------

             Summary: DN shutdown race conditions with open xceivers
                 Key: HDFS-8491
                 URL: https://issues.apache.org/jira/browse/HDFS-8491
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: datanode
    Affects Versions: 2.6.0
            Reporter: Daryn Sharp


DN shutdowns, at least for restarts, have many race conditions, and shutdown is
very noisy with exceptions.  The DN notifies writers of the restart, waits 1s,
and then interrupts the xceiver threads but does not join them.  The ipc server
is then stopped, followed by the bpos services.
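
A minimal sketch of the ordering the description implies is missing: interrupt
the xceivers, join them against a bounded deadline, and only then stop the
downstream services.  This is illustrative Python, not the DataNode code; the
names Xceiver and shutdown are hypothetical, and because Python threads cannot
be interrupted, a per-thread threading.Event stands in for Thread.interrupt().

```python
import threading
import time


class Xceiver(threading.Thread):
    """Hypothetical stand-in for a DataNode xceiver thread."""

    def __init__(self, name, log):
        super().__init__(name=name, daemon=True)
        self.stop_event = threading.Event()  # plays the role of interrupt()
        self.log = log

    def run(self):
        # Simulate streaming a block until asked to stop.
        while not self.stop_event.wait(0.01):
            pass
        self.log.append(f"{self.name} closed")


def shutdown(xceivers, log, join_timeout=1.0):
    # 1. "Interrupt" every xceiver.
    for x in xceivers:
        x.stop_event.set()
    # 2. Join them against one shared deadline, so no xceiver is still
    #    running when the services it depends on are torn down.
    deadline = time.monotonic() + join_timeout
    for x in xceivers:
        x.join(max(0.0, deadline - time.monotonic()))
    stragglers = [x.name for x in xceivers if x.is_alive()]
    # 3. Only then stop the downstream services.
    log.append("ipc server stopped")
    log.append("bpos stopped")
    return stragglers


log = []
xceivers = [Xceiver(f"xceiver-{i}", log) for i in range(3)]
for x in xceivers:
    x.start()
stragglers = shutdown(xceivers, log)
print(log, stragglers)
```

Returning the stragglers, rather than blocking forever, keeps a wedged xceiver
from hanging the restart while still making it visible.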

Xceivers then encounter NPEs in closeBlock because the block no longer exists
in the volume map when transient storage is checked.  Just before that, the DN
notifies the NN that the block was received.  This does not always appear to be
true; rather, the thread was merely interrupted.  The xceivers race with bpos
shutdown to send the block-received notification and, luckily, appear to lose.
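
One possible defensive ordering for the close path, sketched under stated
assumptions: an interrupted write is not reported as received, a replica that
has vanished from the map is handled explicitly instead of being dereferenced,
and the notifier refuses work once shutdown has begun instead of racing it.
VolumeMap, BlockReceivedNotifier and close_block are hypothetical names for
illustration only, not Hadoop classes or methods.

```python
import threading


class VolumeMap:
    """Hypothetical, simplified replica map guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._replicas = {}

    def add(self, block_id, replica):
        with self._lock:
            self._replicas[block_id] = replica

    def get(self, block_id):
        with self._lock:
            return self._replicas.get(block_id)


class BlockReceivedNotifier:
    """Hypothetical stand-in for the path that reports received blocks."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = True
        self.sent = []

    def stop(self):
        with self._lock:
            self._running = False

    def notify(self, block_id):
        # Checked under the same lock as stop(), so a notification is
        # either sent before shutdown or cleanly rejected after it.
        with self._lock:
            if not self._running:
                return False
            self.sent.append(block_id)
            return True


def close_block(block_id, volume_map, notifier, interrupted):
    # An interrupted write is not a completed write: do not report it.
    if interrupted:
        return "aborted"
    # Look the replica up first (where a transient-storage check would go),
    # and treat a missing entry as a normal outcome rather than a crash.
    if volume_map.get(block_id) is None:
        return "missing"
    if not notifier.notify(block_id):
        return "not-reported"
    return "reported"


vm = VolumeMap()
vm.add(1, {"transient": False})
vm.add(2, {"transient": False})
notifier = BlockReceivedNotifier()
r_ok = close_block(1, vm, notifier, interrupted=False)
r_interrupted = close_block(2, vm, notifier, interrupted=True)
r_missing = close_block(3, vm, notifier, interrupted=False)
notifier.stop()
r_after_stop = close_block(2, vm, notifier, interrupted=False)
print(r_ok, r_interrupted, r_missing, r_after_stop, notifier.sent)
```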



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
