[ 
https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145835#comment-14145835
 ] 

Lei (Eddy) Xu commented on HDFS-6877:
-------------------------------------

[Sorry for reposting. Mistakenly clicked "Add" during editing comments]

[~cmccabe] Thank you very much for your quick reviews.

I anticipated a case where the application writes the block very slowly. For 
instance, a light-traffic service opens a log file and appends only 10KB of 
logs each time it wakes up, every 30 seconds; or an application dumps 
intermediate results (e.g., 1MB) every 10 minutes. In such cases, the 
BlockReceiver thread can live for a long time, until the block is full and 
closed, even though its {{ReplicaInfo}} has been removed from 
{{FsDatasetImpl}}. From the user's point of view, however, once 
{{hdfs dfsadmin -reconfig status}} indicates that the reconfiguration task has 
finished, the user might expect to be able to physically remove the disks at 
any time. Since the BlockReceiver thread stays alive for an uncertain period 
and holds a file descriptor open on the disk, it would be difficult to explain 
to the user why the disk cannot be unmounted and how long they should wait.

{code}
    for (ReplicaInPipeline pip : activeWriters) {
      try {
        LOG.info("Stopping active writer.");
        pip.stopWriter(datanode.getDnConf().getXceiverStopTimeout());
      } catch (IOException e) {
        LOG.warn("IOException when stopping active writer.", e);
      }
    }
{code}

The reasons I moved the above out of the synchronized section are:
# {{stopWriter()}} calls {{interrupt()}} and {{join()}}, which are slow. I 
wanted to keep the synchronized section in {{removeVolumes()}}, which blocks 
access to {{FsDatasetImpl}}, as short as possible.
# After this synchronized critical section, the {{ReplicaInfo}} objects of all 
active BlockReceivers have been removed, so these BlockReceiver threads are 
orphaned. Since {{stopWriter()}} is slow, a thread might either be stopped by 
{{stopWriter()}} or reach {{finalizeBlock()}} before there is a chance to 
interrupt it. Either way, the thread can exit and tell the upstream sender 
that the block being written is bad.
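To illustrate the locking pattern I mean (a simplified sketch with hypothetical names, not the actual {{FsDatasetImpl}} code): snapshot the active writers while holding the dataset lock, then do the slow {{interrupt()}}/{{join()}} work after releasing it.

{code}
import java.util.ArrayList;
import java.util.List;

public class StopWritersOutsideLock {
  private final Object datasetLock = new Object();
  private final List<Thread> activeWriters = new ArrayList<>();

  public void addWriter(Thread t) {
    synchronized (datasetLock) {
      activeWriters.add(t);
    }
  }

  public List<Thread> removeVolumes() {
    List<Thread> snapshot;
    synchronized (datasetLock) {
      // Short critical section: only update bookkeeping state and copy
      // the writer list; no blocking calls while holding the lock.
      snapshot = new ArrayList<>(activeWriters);
      activeWriters.clear();
    }
    // Slow part outside the lock: interrupt() + join() per writer, so
    // other dataset operations are not blocked in the meantime.
    for (Thread writer : snapshot) {
      writer.interrupt();
      try {
        writer.join(2000);  // bounded wait, like the xceiver stop timeout
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    return snapshot;
  }
}
{code}

This keeps the window during which {{FsDatasetImpl}} is blocked proportional to the bookkeeping work, not to the number of writers times the join timeout.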

It might be better to make these two paths (i.e., interrupting the writers, 
and catching ReplicaNotFoundException in {{finalizeBlock()}}) return the same 
result. What do you think?
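What I have in mind is something like the following sketch (hypothetical names and error string, not the actual HDFS classes): whether the writer is interrupted by {{stopWriter()}} or reaches {{finalizeBlock()}} after its {{ReplicaInfo}} was removed, the upstream sender sees the same error.

{code}
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UnifiedWriterFailure {
  static final String REPLICA_REMOVED =
      "Replica removed: volume deleted during write";

  private final Map<String, Object> replicaMap = new ConcurrentHashMap<>();

  public void addReplica(String blockId) {
    replicaMap.put(blockId, new Object());
  }

  public void removeReplica(String blockId) {
    replicaMap.remove(blockId);
  }

  // Path 1: the writer thread observes the interrupt from stopWriter().
  public IOException onWriterInterrupted() {
    return new IOException(REPLICA_REMOVED);
  }

  // Path 2: the writer reaches finalizeBlock() after its ReplicaInfo was
  // removed; translate the lookup failure into the same error.
  public void finalizeBlock(String blockId) throws IOException {
    if (!replicaMap.containsKey(blockId)) {
      throw new IOException(REPLICA_REMOVED);
    }
    // ... normal finalization would happen here ...
  }
}
{code}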

> Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS 
> volume is removed during a write.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-6877
>                 URL: https://issues.apache.org/jira/browse/HDFS-6877
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: Lei (Eddy) Xu
>            Assignee: Lei (Eddy) Xu
>         Attachments: HDFS-6877.000.consolidate.txt, 
> HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt, 
> HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch, 
> HDFS-6877.004.patch, HDFS-6877.005.patch
>
>
> Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS 
> volume is removed during a write.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
