[
https://issues.apache.org/jira/browse/HDFS-6877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14145835#comment-14145835
]
Lei (Eddy) Xu commented on HDFS-6877:
-------------------------------------
[Sorry for reposting. I mistakenly clicked "Add" while editing the comment.]
[~cmccabe] Thank you very much for your quick reviews.
I anticipated the case where an application writes to a block very slowly.
For instance, a light-traffic service opens a log file and appends only 10KB
of logs every 30 seconds, or an application dumps intermediate results
(e.g., 1MB) every 10 minutes. In such cases, the BlockReceiver thread can live
for a long time, until the block is full and closed, even though its
{{ReplicaInfo}} has already been removed from {{FsDatasetImpl}}.
But from the user's point of view, once {{hdfs dfsadmin -reconfig status}}
indicates that the reconfiguration task has finished, the user might expect to
be able to physically remove the disks at any time. Since the BlockReceiver
thread may stay alive for an indeterminate time and holds an open file
descriptor on the disk, it might be difficult to explain to the user why the
disk cannot be unmounted and how long they should wait.
{code}
for (ReplicaInPipeline pip : activeWriters) {
  try {
    LOG.info("Stopping active writer.");
    pip.stopWriter(datanode.getDnConf().getXceiverStopTimeout());
  } catch (IOException e) {
    LOG.warn("IOException when stopping active writer.", e);
  }
}
{code}
The reasons I moved the above out of the synchronized section are:
# {{stopWriter()}} calls interrupt() and join(), which are slow. So I tried to
shorten the synchronized section in removeVolumes(), which blocks access to
{{FsDatasetImpl}}.
# After this synchronized critical section, the ReplicaInfo of every active
BlockReceiver has been removed, so these BlockReceiver threads are left
outstanding. Since {{stopWriter()}} is slow, a thread might either be stopped
by {{stopWriter()}} or reach {{finalizeBlock()}} before there is a chance to
interrupt it. Either way, the thread exits and tells the upstream sender that
the block being written is bad.
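To illustrate the locking pattern described in the two points above, here is a
minimal, self-contained sketch. It is not the patch itself: the class
VolumeRemovalSketch, its Writer type, and the methods add(), contains(),
removeVolume() and slowWriter() are hypothetical stand-ins for
{{FsDatasetImpl}}, the replica being written, and a slow BlockReceiver. Only
the idea is taken from the discussion: remove the map entries under the lock,
then do the slow interrupt() and join() after the lock is released.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for FsDatasetImpl; none of these names are Hadoop APIs.
class VolumeRemovalSketch {
    /** Hypothetical stand-in for a replica with an active writer thread. */
    static final class Writer {
        final String volume;
        final Thread thread;

        Writer(String volume, Thread thread) {
            this.volume = volume;
            this.thread = thread;
        }

        /** Interrupts the writer and waits up to timeoutMs; true if it exited. */
        boolean stopWriter(long timeoutMs) {
            thread.interrupt();
            try {
                thread.join(timeoutMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt
            }
            return !thread.isAlive();
        }
    }

    private final Map<Long, Writer> replicas = new HashMap<>();

    synchronized void add(long blockId, Writer writer) {
        replicas.put(blockId, writer);
    }

    synchronized boolean contains(long blockId) {
        return replicas.containsKey(blockId);
    }

    /** Returns how many writers on the volume exited within the timeout. */
    int removeVolume(String volume, long timeoutMs) {
        List<Writer> activeWriters = new ArrayList<>();
        synchronized (this) {
            // Critical section: only drop map entries and remember their writers.
            replicas.values().removeIf(w -> {
                if (!w.volume.equals(volume)) {
                    return false;
                }
                activeWriters.add(w);
                return true;
            });
        }
        // The slow interrupt() + join() calls run without holding the lock.
        int stopped = 0;
        for (Writer w : activeWriters) {
            if (w.stopWriter(timeoutMs)) {
                stopped++;
            }
        }
        return stopped;
    }

    /** Starts a daemon thread that mimics a writer blocked on slow input. */
    static Thread slowWriter() {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // Interrupted by stopWriter(): fall through and exit.
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) {
        VolumeRemovalSketch dataset = new VolumeRemovalSketch();
        dataset.add(1L, new Writer("/data/1", slowWriter()));
        dataset.add(2L, new Writer("/data/2", slowWriter()));
        int stopped = dataset.removeVolume("/data/1", 5_000);
        System.out.println("stopped writers: " + stopped);
    }
}
```

In this sketch other threads can call add() or contains() while the joins are
in progress, which is the benefit argued for in point 1. The cost, as in point
2, is a window in which a writer is still running although its entry is
already gone.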
It might be better to let these two behaviors (i.e., interrupting the writers
and catching ReplicaNotFoundException from {{finalizeBlock()}}) return the same
result. What do you think?
> Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS
> volume is removed during a write.
> -----------------------------------------------------------------------------------------------------------
>
> Key: HDFS-6877
> URL: https://issues.apache.org/jira/browse/HDFS-6877
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: datanode
> Affects Versions: 2.5.0
> Reporter: Lei (Eddy) Xu
> Assignee: Lei (Eddy) Xu
> Attachments: HDFS-6877.000.consolidate.txt,
> HDFS-6877.000.delta-HDFS-6727.txt, HDFS-6877.001.combo.txt,
> HDFS-6877.001.patch, HDFS-6877.002.patch, HDFS-6877.003.patch,
> HDFS-6877.004.patch, HDFS-6877.005.patch
>
>
> Avoid calling checkDisk and stop active BlockReceiver thread when an HDFS
> volume is removed during a write.