[ https://issues.apache.org/jira/browse/HDFS-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274434#comment-14274434 ]
Colin Patrick McCabe commented on HDFS-7496:
--------------------------------------------
Hi Eddy,
I like the fact that BlockReceiver is now holding on to an
{{FsVolumeReference}} object, and closing that reference in
{{BlockReceiver#close}}. I don't understand how this works, though:
{code}
if (replicaInfo instanceof ReplicaInPipeline) {
  // Hold a reference to protect IOs on the streams.
  volumeRef = ((ReplicaInPipeline) replicaInfo).getVolumeReference();
}
{code}
It looks like {{ReplicaInPipeline#getVolumeReference}} just returns the
reference the replica already holds, rather than a new one. So closing the
{{BlockReceiver}} could close the {{FsVolumeReference}} that the
{{ReplicaInPipeline}} object is holding on to, leaving it invalid. That
doesn't seem right.
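A minimal sketch of the concern, assuming a simplified reference-counted volume. {{Volume}}, {{VolumeRef}}, {{obtainReference}} and {{referenceCount}} are hypothetical stand-ins, not the real {{FsVolumeImpl}}/{{FsVolumeReference}} API. If the receiver borrows the single handle the replica owns, closing it invalidates the replica's handle too; if the receiver obtains its own reference, the two lifetimes stay independent.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicInteger;

public class SharedRefDemo {
  // Hypothetical, simplified reference-counted volume (not Hadoop's FsVolumeImpl).
  static class Volume {
    private final AtomicInteger refCount = new AtomicInteger();

    // Every caller gets its own handle, and each handle owns exactly one count.
    VolumeRef obtainReference() {
      refCount.incrementAndGet();
      return new VolumeRef(this);
    }

    void release() {
      refCount.decrementAndGet();
    }

    int referenceCount() {
      return refCount.get();
    }
  }

  // Hypothetical handle, loosely modeled on the idea of an FsVolumeReference.
  static class VolumeRef implements Closeable {
    private final Volume volume;
    private boolean closed;

    VolumeRef(Volume volume) {
      this.volume = volume;
    }

    Volume getVolume() {
      if (closed) {
        throw new IllegalStateException("reference already closed");
      }
      return volume;
    }

    // Idempotent: closing twice releases the count only once.
    @Override
    public void close() {
      if (!closed) {
        closed = true;
        volume.release();
      }
    }
  }

  public static void main(String[] args) {
    Volume vol = new Volume();

    // Problematic pattern: the receiver borrows the replica's own handle.
    VolumeRef replicaRef = vol.obtainReference();
    VolumeRef borrowed = replicaRef;
    borrowed.close(); // the receiver finishes and closes "its" reference...
    try {
      replicaRef.getVolume(); // ...but the replica's handle is now closed as well
    } catch (IllegalStateException e) {
      System.out.println("replica reference invalidated: " + e.getMessage());
    }

    // Safer pattern: the receiver takes its own reference and closes only that.
    VolumeRef replicaRef2 = vol.obtainReference();
    try (VolumeRef receiverRef = vol.obtainReference()) {
      receiverRef.getVolume(); // stream IO would happen while this is open
    }
    replicaRef2.getVolume(); // still valid
    System.out.println("references held: " + vol.referenceCount());
  }
}
```

Giving every holder its own handle is what makes the receiver's {{close}} safe: each handle releases exactly the count it acquired, and nothing else.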
Does it make sense for {{ReplicaInfo}} objects to hold on to
{{FsVolumeReference}} objects at all? I would argue that it does not. We
don't want to keep volumes from being removed just because a {{ReplicaInfo}}
exists somewhere in memory. Plus, since there are potentially hundreds of
thousands of {{ReplicaInfo}} objects, that would mean a lot of reference counting. I
think {{ReplicaInfo}} objects should just contain the unique {{storageID}} of a
volume. Then, when we need to create an {{FsVolumeReference}} for a given
{{storageID}}, we can ask the {{FsDatasetSpi}} to do that. (This operation can
also fail, of course, if the storage ID is no longer present.)
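A rough sketch of the storage-ID approach, under stated assumptions: {{Dataset}} stands in for {{FsDatasetSpi}}, and {{Volume}}, {{VolumeRef}}, {{Replica}}, {{obtainReference}}, {{removeVolume}} and the {{removed}} flag are all hypothetical names. The replica records only its storage ID, a reference is created on demand, and the lookup fails with an {{IOException}} once the volume has been removed. The check-and-increment runs under the volume's lock, so a lookup racing with removal cannot take a new reference after the volume is marked removed.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StorageIdLookupDemo {
  // Hypothetical volume with a reference count and a "removed" flag.
  static class Volume {
    final String storageId;
    private int refCount;
    private boolean removed;

    Volume(String storageId) { this.storageId = storageId; }

    // Check-and-increment under one lock, so no new reference is handed out
    // once the volume has been marked removed.
    synchronized boolean tryReference() {
      if (removed) {
        return false;
      }
      refCount++;
      return true;
    }

    synchronized void release() { refCount--; }
    synchronized void markRemoved() { removed = true; }
    synchronized int referenceCount() { return refCount; }
  }

  // Hypothetical handle; closing it releases the one count it holds.
  static class VolumeRef implements Closeable {
    final Volume volume;
    private boolean closed;

    VolumeRef(Volume volume) { this.volume = volume; }

    @Override
    public void close() {
      if (!closed) {
        closed = true;
        volume.release();
      }
    }
  }

  // The replica records only which volume it lives on, not a live reference.
  static class Replica {
    final long blockId;
    final String storageId;

    Replica(long blockId, String storageId) { this.blockId = blockId; this.storageId = storageId; }
  }

  // Stand-in for FsDatasetSpi: resolves a storage ID to a fresh reference on demand.
  static class Dataset {
    private final Map<String, Volume> volumes = new ConcurrentHashMap<>();

    void addVolume(Volume v) { volumes.put(v.storageId, v); }

    void removeVolume(String storageId) {
      Volume v = volumes.remove(storageId);
      if (v != null) {
        v.markRemoved();
      }
    }

    VolumeRef obtainReference(String storageId) throws IOException {
      Volume v = volumes.get(storageId);
      if (v == null || !v.tryReference()) {
        throw new IOException("volume " + storageId + " is no longer present");
      }
      return new VolumeRef(v);
    }
  }

  public static void main(String[] args) throws IOException {
    Dataset dataset = new Dataset();
    dataset.addVolume(new Volume("DS-1"));
    Replica replica = new Replica(1001L, "DS-1");

    // Take a reference only for the duration of the IO.
    try (VolumeRef ref = dataset.obtainReference(replica.storageId)) {
      System.out.println("writing block " + replica.blockId + " on " + ref.volume.storageId);
    }

    dataset.removeVolume("DS-1");
    try {
      dataset.obtainReference(replica.storageId);
    } catch (IOException e) {
      System.out.println("lookup failed: " + e.getMessage());
    }
  }
}
```

References taken before removal stay valid until closed; waiting for them to drain before actually tearing the volume down is left out of this sketch.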
> Fix FsVolume removal race conditions on the DataNode
> -----------------------------------------------------
>
> Key: HDFS-7496
> URL: https://issues.apache.org/jira/browse/HDFS-7496
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Colin Patrick McCabe
> Assignee: Lei (Eddy) Xu
> Attachments: HDFS-7496.000.patch, HDFS-7496.001.patch,
> HDFS-7496.002.patch
>
>
> We discussed a few FsVolume removal race conditions on the DataNode in
> HDFS-7489. We should figure out a way to make removing an FsVolume safe.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)