[ https://issues.apache.org/jira/browse/HDFS-7496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274434#comment-14274434 ]
Colin Patrick McCabe commented on HDFS-7496:
--------------------------------------------

Hi Eddy,

I like the fact that BlockReceiver is now holding on to an {{FsVolumeReference}} object, and closing that reference in {{BlockReceiver#close}}. I don't understand how this works, though:

{code}
if (replicaInfo instanceof ReplicaInPipeline) {
  // Hold a reference to protect IOs on the streams.
  volumeRef = ((ReplicaInPipeline) replicaInfo).getVolumeReference();
}
{code}

It looks like {{ReplicaInPipeline#getVolumeReference}} just returns the reference. So closing the {{BlockReceiver}} could close the {{VolumeReference}} that the {{ReplicaInPipeline}} object is holding on to, making it no longer valid. That doesn't seem right.

Does it make sense for {{ReplicaInfo}} objects to hold on to {{FsVolumeReference}} objects at all? I would argue that it does not. We don't want to keep volumes from being removed just because a {{ReplicaInfo}} exists somewhere in memory. Plus, since there are potentially hundreds of thousands of {{ReplicaInfo}} objects, that is a lot of reference counting.

I think {{ReplicaInfo}} objects should just contain the unique {{storageID}} of a volume. Then, when we need to create an {{FsVolumeReference}} for a given {{storageID}}, we can ask the {{FsDatasetSpi}} to do that. (This operation can also fail, of course, if the storage ID is no longer present.)

> Fix FsVolume removal race conditions on the DataNode
> -----------------------------------------------------
>
>                 Key: HDFS-7496
>                 URL: https://issues.apache.org/jira/browse/HDFS-7496
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Colin Patrick McCabe
>            Assignee: Lei (Eddy) Xu
>         Attachments: HDFS-7496.000.patch, HDFS-7496.001.patch, HDFS-7496.002.patch
>
>
> We discussed a few FsVolume removal race conditions on the DataNode in
> HDFS-7489. We should figure out a way to make removing an FsVolume safe.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
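The storageID-based design proposed in the comment above can be sketched as follows. This is a minimal, self-contained illustration, not Hadoop code: the classes `Volume`, `VolumeReference`, `Replica`, `Dataset` and `BlockReceiver`, and the methods `obtainReference` and `tryRemoveVolume`, are hypothetical simplified stand-ins for `FsVolumeImpl`, `FsVolumeReference`, `ReplicaInfo`, `FsDatasetSpi` and the real `BlockReceiver`. It assumes volume removal works by "mark the volume as closing, then remove it once its reference count drains to zero". The replica stores only a storage ID, and the receiver obtains (and later closes) a reference it owns, so closing the receiver cannot invalidate anyone else's reference.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for a DataNode volume (not Hadoop's FsVolumeImpl).
class Volume {
  final AtomicInteger refCount = new AtomicInteger();
  volatile boolean closing = false; // set once removal of this volume has started
}

// Hypothetical reference that pins a volume until close() is called.
class VolumeReference implements Closeable {
  private final Volume volume;
  private boolean released = false;

  VolumeReference(Volume volume) { this.volume = volume; }

  @Override
  public synchronized void close() {
    if (!released) { // idempotent: a second close() must not decrement again
      released = true;
      volume.refCount.decrementAndGet();
    }
  }
}

// A replica records only which storage it lives on; it holds no reference.
class Replica {
  final long blockId;
  final String storageId;
  Replica(long blockId, String storageId) { this.blockId = blockId; this.storageId = storageId; }
}

// Hypothetical stand-in for FsDatasetSpi: owns the volumes and hands out references.
class Dataset {
  private final Map<String, Volume> volumes = new HashMap<>();

  synchronized void addVolume(String storageId) { volumes.put(storageId, new Volume()); }

  // Pins the volume for the caller; fails if the storage ID is gone or being removed.
  synchronized VolumeReference obtainReference(String storageId) throws IOException {
    Volume v = volumes.get(storageId);
    if (v == null || v.closing) {
      throw new IOException("Volume " + storageId + " is not available");
    }
    v.refCount.incrementAndGet();
    return new VolumeReference(v);
  }

  // Stops handing out new references, then removes the volume once none are held.
  synchronized boolean tryRemoveVolume(String storageId) {
    Volume v = volumes.get(storageId);
    if (v == null) {
      return false;
    }
    v.closing = true; // obtainReference() rejects this volume from now on
    if (v.refCount.get() > 0) {
      return false; // still in use; the caller can retry later
    }
    volumes.remove(storageId);
    return true;
  }
}

// The receiver obtains and owns its reference instead of borrowing the replica's.
class BlockReceiver implements Closeable {
  private final VolumeReference volumeRef;

  BlockReceiver(Dataset dataset, Replica replica) throws IOException {
    this.volumeRef = dataset.obtainReference(replica.storageId);
  }

  @Override
  public void close() { volumeRef.close(); } // releases only this receiver's reference
}

public class VolumeRefDemo {
  public static void main(String[] args) throws IOException {
    Dataset dataset = new Dataset();
    dataset.addVolume("DS-1");
    Replica replica = new Replica(1001L, "DS-1");
    try (BlockReceiver receiver = new BlockReceiver(dataset, replica)) {
      // prints "remove while writing: false": the open receiver pins the volume
      System.out.println("remove while writing: " + dataset.tryRemoveVolume("DS-1"));
    }
    // prints "remove after close: true"
    System.out.println("remove after close: " + dataset.tryRemoveVolume("DS-1"));
    try {
      new BlockReceiver(dataset, replica);
    } catch (IOException e) {
      System.out.println("expected: " + e.getMessage()); // Volume DS-1 is not available
    }
  }
}
```

Because `tryRemoveVolume` sets `closing` under the same lock that `obtainReference` uses, no new reference can be granted once removal has begun, so the count can only fall; a removal that returns `false` can simply be retried until in-flight writers close their receivers. Only active I/O paths pay for reference counting here, not every replica kept in memory.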