[
https://issues.apache.org/jira/browse/HDFS-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14238621#comment-14238621
]
Colin Patrick McCabe commented on HDFS-7496:
--------------------------------------------
FsVolume removal can happen because of DataNode reconfiguration (HDFS-6727), or
because a failure was detected in {{FsVolumeList#checkDirs}} (see HDFS-7489 for
more discussion). While we can prevent certain race conditions by locking the
{{FsVolumeList}} object itself, other race conditions are more fundamental.
For example, if someone calls {{FsVolumeList#getNextVolume}}, the volume
instance they get back may be removed before they use it, or even while they
are using it.
We can't fix this with a "big lock" unless we lock all operations which use
volumes, which seems unreasonable. We could fix this in a few different ways.
One option is explicit reference counting. This is a bit tricky because someone
might forget to unreference the volume after using it, which is much like a
file descriptor leak. Another option would be to use Java's
{{PhantomReference}} mechanism to determine when the {{FsVolumeImpl}} objects
are no longer being referenced.
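To make the reference counting idea concrete, here is a rough sketch. It is
not the proposed patch: {{RefCountedVolume}}, {{Ref}}, {{tryRef}} and
{{remove}} are hypothetical names standing in for {{FsVolumeImpl}} and
whatever API we would actually add. The handle implements {{AutoCloseable}},
so callers that use try-with-resources cannot easily forget to unreference.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for FsVolumeImpl; none of these names are HDFS APIs.
class RefCountedVolume {
  // Starts at 1: the volume list itself holds one reference.
  private final AtomicInteger refCount = new AtomicInteger(1);
  private final AtomicBoolean removed = new AtomicBoolean(false);
  private volatile boolean released = false;

  // Handle given to callers; closing it drops exactly one reference.
  final class Ref implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    RefCountedVolume volume() {
      return RefCountedVolume.this;
    }

    @Override
    public void close() {
      if (closed.compareAndSet(false, true)) {
        unref();
      }
    }
  }

  // Returns a handle, or null if the volume was removed or torn down.
  Ref tryRef() {
    while (true) {
      int n = refCount.get();
      if (n == 0 || removed.get()) {
        return null;
      }
      if (refCount.compareAndSet(n, n + 1)) {
        return new Ref();
      }
    }
  }

  // Called on removal: drops the list's own reference, at most once.
  void remove() {
    if (removed.compareAndSet(false, true)) {
      unref();
    }
  }

  private void unref() {
    if (refCount.decrementAndGet() == 0) {
      released = true; // last user is gone; resources could be freed here
    }
  }

  boolean isReleased() {
    return released;
  }
}
```

A caller would write something like
{{try (RefCountedVolume.Ref ref = volume.tryRef()) { if (ref != null) ... }}}.
Removal then only marks the volume and drops the list's reference; the actual
teardown is deferred until the last in-flight user closes its handle.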
A related point is that we often refer to volumes by their base path. However,
a volume can be destroyed and another volume re-created with the same base
path, so a base path does not identify one particular volume instance. This
leads to a lot of subtle races. To solve this, we could start using storageIDs
more heavily, because they are globally unique. I'm not sure whether there is
any other good solution to this.
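As an illustration of why keying on storage ID helps, here is a small sketch.
{{VolumeEntry}} and {{VolumeRegistry}} are hypothetical classes, and the IDs
and path in the usage note below are made up; the real {{FsVolumeList}} is
structured differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical volume record; the real FsVolumeImpl carries far more state.
final class VolumeEntry {
  final String storageId;
  final String basePath;

  VolumeEntry(String storageId, String basePath) {
    this.storageId = storageId;
    this.basePath = basePath;
  }
}

// Hypothetical registry keyed by storage ID instead of base path.
final class VolumeRegistry {
  private final Map<String, VolumeEntry> byStorageId =
      new ConcurrentHashMap<>();

  void add(VolumeEntry v) {
    byStorageId.put(v.storageId, v);
  }

  void remove(String storageId) {
    byStorageId.remove(storageId);
  }

  // Returns null if that specific volume is gone, even when a newer volume
  // has since been created on the same base path.
  VolumeEntry lookup(String storageId) {
    return byStorageId.get(storageId);
  }
}
```

If a volume with ID {{DS-old}} on {{/data/1}} is removed and a new volume
{{DS-new}} is added on the same path, a stale caller looking up {{DS-old}}
gets null rather than silently operating on the new volume, which is what a
lookup by base path would do.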
> Fix FsVolume removal race conditions on the DataNode
> -----------------------------------------------------
>
> Key: HDFS-7496
> URL: https://issues.apache.org/jira/browse/HDFS-7496
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
>
> We discussed a few FsVolume removal race conditions on the DataNode in
> HDFS-7489. We should figure out a way to make removing an FsVolume safe.