[
https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827984#comment-16827984
]
Stephen O'Donnell commented on HDFS-13677:
------------------------------------------
Hi [~xuzq_zander] - The new patch looks better. A couple of comments:
1. I don't think we need the lock in the mergeAll method, as the add method it
calls already takes a lock that protects the structures. With a lock held in
mergeAll, if the disk being added has a lot of blocks, it could block the DN
from adding anything else to the volumeMap (e.g. new blocks being created) for
some time while all of the volume's blocks are loaded.
2. Do you think we could add a test in TestDataNodeHotSwapVolumes that
reproduces the issue? E.g. have a DN with 1 volume and 5 blocks, then add
another volume with 2 blocks and ensure it reports 7 blocks rather than 2.
3. One test has failed in TestDataNodeHotSwapVolumes. I'm not sure whether it's
related to this change or not, so we will need to dig into it and see.
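To illustrate point 1, here is a minimal, hypothetical sketch. The class below is not the real HDFS ReplicaMap; the names, the String-valued replicas, and the inner map structure are simplified assumptions. It shows the idea that the lock inside add() is sufficient: mergeAll() holds no lock of its own, so between entries the map remains available to concurrent writers such as new block creations.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for ReplicaMap (bpid -> blockId -> replica).
class SimpleReplicaMap {
    private final Object mutex = new Object();
    private final Map<String, Map<Long, String>> map = new HashMap<>();

    // add() locks only for a single insertion, so other threads wait
    // at most one entry's worth of time, not a whole volume load.
    String add(String bpid, long blockId, String replica) {
        synchronized (mutex) {
            return map.computeIfAbsent(bpid, k -> new HashMap<>())
                      .put(blockId, replica);
        }
    }

    // mergeAll() takes no lock itself; the lock is acquired and released
    // per replica inside add(), so a newly added volume with many blocks
    // cannot block the whole volumeMap for the duration of the merge.
    void mergeAll(SimpleReplicaMap other) {
        for (Map.Entry<String, Map<Long, String>> bp : other.map.entrySet()) {
            for (Map.Entry<Long, String> r : bp.getValue().entrySet()) {
                add(bp.getKey(), r.getKey(), r.getValue());
            }
        }
    }

    int size(String bpid) {
        synchronized (mutex) {
            Map<Long, String> m = map.get(bpid);
            return m == null ? 0 : m.size();
        }
    }
}
```

This also mirrors the scenario in point 2: merging a 2-block volume into a 5-block map should leave 7 blocks, not 2.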
> Dynamic refresh Disk configuration results in overwriting VolumeMap
> -------------------------------------------------------------------
>
> Key: HDFS-13677
> URL: https://issues.apache.org/jira/browse/HDFS-13677
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: xuzq
> Priority: Major
> Attachments: HDFS-13677-001.patch, image-2018-06-14-13-05-54-354.png,
> image-2018-06-14-13-10-24-032.png
>
>
> When I added a new disk by dynamically refreshing the configuration, a
> "FileNotFound while finding block" exception occurred.
>
> The steps are as follows:
> 1. Change the hdfs-site.xml of the DataNode to add a new disk.
> 2. Refresh the configuration with "./bin/hdfs dfsadmin -reconfig datanode
> ****:50020 start"
>
> The error is like:
> ```
> VolumeScannerThread(/media/disk5/hdfs/dn): FileNotFound while finding block BP-233501496-*.*.*.*-1514185698256:blk_1620868560_547245090 on volume /media/disk5/hdfs/dn
> org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1997955181-*.*.*.*-1514186468560:blk_1090885868_17145082
> at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:471)
> at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:240)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:553)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:254)
> at java.lang.Thread.run(Thread.java:748)
> ```
> I added some logs for confirmation, as follows:
> Log Code like:
> !image-2018-06-14-13-05-54-354.png!
> And the result is like:
> !image-2018-06-14-13-10-24-032.png!
> The size of the volumeMap shrank, and we found that the volumeMap was
> overwritten with only the new disk's blocks by the method
> 'ReplicaMap.addAll(ReplicaMap other)'.
>
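The overwrite described in the quoted report can be sketched as follows. This is a hypothetical simplification, not the actual ReplicaMap code: the buggy variant replaces the whole per-block-pool inner map with the new volume's map, discarding existing replicas, while the merging variant folds the new replicas into the existing inner map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo of the failure mode; the map shape
// (bpid -> blockId -> replica path) is a simplified assumption.
class OverwriteDemo {
    static Map<String, Map<Long, String>> volumeMap = new HashMap<>();

    // Buggy merge: put() swaps in the new volume's inner map wholesale,
    // so every replica the DataNode already knew about for that block
    // pool is lost and the volumeMap shrinks.
    static void addAllBuggy(Map<String, Map<Long, String>> other) {
        for (Map.Entry<String, Map<Long, String>> e : other.entrySet()) {
            volumeMap.put(e.getKey(), e.getValue());
        }
    }

    // Correct merge: fold the new volume's replicas into the existing
    // inner map, preserving replicas from the other volumes.
    static void mergeAll(Map<String, Map<Long, String>> other) {
        for (Map.Entry<String, Map<Long, String>> e : other.entrySet()) {
            volumeMap.computeIfAbsent(e.getKey(), k -> new HashMap<>())
                     .putAll(e.getValue());
        }
    }
}
```

With 5 existing blocks and a new 2-block volume, the buggy path leaves 2 blocks in the map (old replicas gone, hence the later ReplicaNotFoundException), while the merging path leaves 7.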
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]