[ 
https://issues.apache.org/jira/browse/HDFS-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274434#comment-14274434
 ] 

Colin Patrick McCabe commented on HDFS-7496:
--------------------------------------------

Hi Eddy,

I like the fact that BlockReceiver is now holding on to an 
{{FsVolumeReference}} object, and closing that reference in 
{{BlockReceiver#close}}.  I don't understand how this works, though:

{code}
if (replicaInfo instanceof ReplicaInPipeline) {
  // Hold a reference to protect IOs on the streams.
  volumeRef = ((ReplicaInPipeline) replicaInfo).getVolumeReference();
}
{code}

It looks like {{ReplicaInPipeline#getVolumeReference}} just returns the 
reference that the replica already holds.  So closing the {{BlockReceiver}} 
could close the {{FsVolumeReference}} that the {{ReplicaInPipeline}} object is 
holding on to, making it no longer valid.  That doesn't seem right.
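
To make the concern concrete, here is a minimal sketch.  It is not the patch's 
code: {{Volume}}, {{VolumeReference}}, and {{Replica}} are hypothetical 
stand-ins, and the getter is assumed to return the replica's own reference 
rather than acquiring a new one.

```java
import java.io.Closeable;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

class SharedReferenceSketch {
  /** Hypothetical stand-in for a volume with a reference count. */
  static class Volume {
    private final AtomicInteger refCount = new AtomicInteger();

    /** Each call takes a new count and returns a handle that releases it once. */
    VolumeReference obtainReference() {
      refCount.incrementAndGet();
      return new VolumeReference(this);
    }

    int getReferenceCount() {
      return refCount.get();
    }
  }

  /** Hypothetical stand-in for FsVolumeReference. */
  static class VolumeReference implements Closeable {
    private final Volume volume;
    private final AtomicBoolean closed = new AtomicBoolean();

    VolumeReference(Volume volume) {
      this.volume = volume;
    }

    boolean isValid() {
      return !closed.get();
    }

    @Override
    public void close() {
      // Release the count only on the first close.
      if (closed.compareAndSet(false, true)) {
        volume.refCount.decrementAndGet();
      }
    }
  }

  /** Hypothetical replica whose getter hands out the reference it owns. */
  static class Replica {
    private final VolumeReference ref;

    Replica(Volume volume) {
      this.ref = volume.obtainReference();
    }

    VolumeReference getVolumeReference() {
      return ref; // Assumption: no new reference is taken here.
    }
  }

  public static void main(String[] args) {
    Volume volume = new Volume();
    Replica replica = new Replica(volume);

    // The receiver borrows the replica's reference and closes it when done.
    VolumeReference borrowed = replica.getVolumeReference();
    borrowed.close();

    System.out.println("replica reference valid: "
        + replica.getVolumeReference().isValid());
    System.out.println("volume reference count: "
        + volume.getReferenceCount());
  }
}
```

Under that assumption, the replica is left holding a closed reference and the 
volume's count drops to zero while the replica still exists.  If each holder 
took its own reference from the volume instead, closing one would not affect 
the other.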

Does it make sense for {{ReplicaInfo}} objects to hold on to 
{{FsVolumeReference}} objects at all?  I would argue that it does not.  We 
don't want to keep volumes from being removed just because a {{ReplicaInfo}} 
exists somewhere in memory.  Plus, since there are potentially hundreds of 
thousands of {{ReplicaInfo}} objects, that is a lot of reference counting.  I 
think {{ReplicaInfo}} objects should just contain the unique {{storageID}} of a 
volume.  Then, when we need to create an {{FsVolumeReference}} for a given 
{{storageID}}, we can ask the {{FsDatasetSpi}} to do that.  (This operation can 
also fail, of course, if the storage ID is no longer present.)
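
A rough sketch of that alternative follows.  All of the names here 
({{Dataset}}, {{obtainReference(String)}}, {{Replica}}) are hypothetical and 
are not the existing {{FsDatasetSpi}} API.  The sketch also leaves out how 
volume removal would wait for outstanding references to drain.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class StorageIdLookupSketch {
  /** Hypothetical volume that only tracks how many references are open. */
  static class Volume {
    final AtomicInteger refCount = new AtomicInteger();
  }

  /** Hypothetical replica: it stores the storage ID, not a reference. */
  static class Replica {
    final long blockId;
    final String storageID;

    Replica(long blockId, String storageID) {
      this.blockId = blockId;
      this.storageID = storageID;
    }
  }

  /** Hypothetical dataset that owns the storage ID to volume mapping. */
  static class Dataset {
    private final Map<String, Volume> volumes = new ConcurrentHashMap<>();

    void addVolume(String storageID) {
      volumes.put(storageID, new Volume());
    }

    void removeVolume(String storageID) {
      volumes.remove(storageID);
    }

    /** Returns the open reference count, or -1 if the storage is unknown. */
    int referenceCount(String storageID) {
      Volume v = volumes.get(storageID);
      return v == null ? -1 : v.refCount.get();
    }

    /** Takes a reference on demand; fails if the storage ID is gone. */
    Closeable obtainReference(String storageID) throws IOException {
      Volume v = volumes.get(storageID);
      if (v == null) {
        throw new IOException("Storage " + storageID + " is no longer present");
      }
      v.refCount.incrementAndGet();
      // Simplification: a real handle would guard against double close.
      return () -> v.refCount.decrementAndGet();
    }
  }

  public static void main(String[] args) throws IOException {
    Dataset dataset = new Dataset();
    dataset.addVolume("DS-1");
    Replica replica = new Replica(1L, "DS-1");

    // A reference exists only for the duration of the IO.
    try (Closeable ref = dataset.obtainReference(replica.storageID)) {
      System.out.println("open references: "
          + dataset.referenceCount(replica.storageID));
    }

    // The replica object still exists, but it does not pin the volume.
    dataset.removeVolume("DS-1");
    try {
      dataset.obtainReference(replica.storageID);
    } catch (IOException e) {
      System.out.println("lookup failed: " + e.getMessage());
    }
  }
}
```

With this shape, the number of live references tracks the number of in-flight 
IOs rather than the number of {{ReplicaInfo}} objects in memory.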

> Fix FsVolume removal race conditions on the DataNode 
> -----------------------------------------------------
>
>                 Key: HDFS-7496
>                 URL: https://issues.apache.org/jira/browse/HDFS-7496
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Lei (Eddy) Xu
>         Attachments: HDFS-7496.000.patch, HDFS-7496.001.patch, 
> HDFS-7496.002.patch
>
>
> We discussed a few FsVolume removal race conditions on the DataNode in 
> HDFS-7489.  We should figure out a way to make removing an FsVolume safe.



