[
https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827648#comment-16827648
]
Stephen O'Donnell commented on HDFS-13677:
------------------------------------------
Ah, I thought the variable VolumeMap was a per-volume set, but it's actually a
map of BPID -> ReplicaInfo.
So when we build the new tempVolumeMap, as you said, it only contains blocks
for the new storage. Then, when it gets merged into the master VolumeMap, it
wipes out all the other blocks in the map, because it replaces the original
BPID key in the map.
If I am reading this correctly, does that mean the dynamic disk-add feature
basically doesn't work at all, and that this is easily reproducible even with
a single block pool?
If I get some time I will see if I can reproduce this on a test cluster.
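In the meantime, here is a minimal sketch of the failure mode as I understand
it, using hypothetical simplified types rather than the real
ReplicaMap/ReplicaInfo classes: a plain putAll on the BPID -> replicas map
replaces the whole inner map for a block pool, whereas the merge needs to
union the per-BPID inner maps.
```
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the DataNode replica map:
// block-pool ID -> (block ID -> replica info). These are not the
// actual Hadoop classes, just an illustration of the merge behaviour.
class SimpleReplicaMap {
    final Map<String, Map<Long, String>> map = new HashMap<>();

    void add(String bpid, long blockId, String replica) {
        map.computeIfAbsent(bpid, k -> new HashMap<>()).put(blockId, replica);
    }

    // Suspected buggy merge: putAll replaces the entire inner map for
    // each BPID, so replicas on the pre-existing volumes disappear.
    void addAllReplacing(SimpleReplicaMap other) {
        map.putAll(other.map);
    }

    // What the merge presumably needs to do instead: union the
    // per-BPID inner maps rather than replacing them wholesale.
    void addAllMerging(SimpleReplicaMap other) {
        other.map.forEach((bpid, replicas) ->
            map.computeIfAbsent(bpid, k -> new HashMap<>()).putAll(replicas));
    }
}

public class VolumeMapMergeDemo {
    public static void main(String[] args) {
        SimpleReplicaMap volumeMap = new SimpleReplicaMap();
        volumeMap.add("BP-1", 1001L, "replica on disk1");
        volumeMap.add("BP-1", 1002L, "replica on disk2");

        // tempVolumeMap built while hot-adding a disk: it only holds
        // the blocks found on the new volume.
        SimpleReplicaMap tempVolumeMap = new SimpleReplicaMap();
        tempVolumeMap.add("BP-1", 2001L, "replica on new disk5");

        volumeMap.addAllReplacing(tempVolumeMap);
        // Prints {2001=replica on new disk5}: blocks 1001 and 1002 are
        // gone, which would explain the ReplicaNotFoundException.
        System.out.println(volumeMap.map.get("BP-1"));
    }
}
```
If that matches the real code path, the ReplicaNotFoundException in the report
is just readers tripping over the entries dropped in the merge.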
> Dynamic refresh Disk configuration results in overwriting VolumeMap
> -------------------------------------------------------------------
>
> Key: HDFS-13677
> URL: https://issues.apache.org/jira/browse/HDFS-13677
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0, 3.0.0
> Reporter: xuzq
> Priority: Major
> Attachments:
> 0001-fix-the-bug-of-the-refresh-disk-configuration.patch,
> image-2018-06-14-13-05-54-354.png, image-2018-06-14-13-10-24-032.png
>
>
> When I added a new disk by dynamically refreshing the configuration, it
> caused a "FileNotFound while finding block" exception.
>
> The steps are as follows:
> 1. Change the hdfs-site.xml of the DataNode to add a new disk.
> 2. Refresh the configuration with "./bin/hdfs dfsadmin -reconfig datanode
> ****:50020 start".
>
> The error looks like:
> ```
> VolumeScannerThread(/media/disk5/hdfs/dn): FileNotFound while finding block BP-233501496-*.*.*.*-1514185698256:blk_1620868560_547245090 on volume /media/disk5/hdfs/dn
> org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1997955181-*.*.*.*-1514186468560:blk_1090885868_17145082
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:471)
>     at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:240)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:553)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
>     at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:254)
>     at java.lang.Thread.run(Thread.java:748)
> ```
> I added some logs for confirmation, as follows:
> The logging code:
> !image-2018-06-14-13-05-54-354.png!
> And the result:
> !image-2018-06-14-13-10-24-032.png!
> The size of 'VolumeMap' has been reduced, and we found that 'VolumeMap' is
> overwritten with only the new disk's blocks by the method
> 'ReplicaMap.addAll(ReplicaMap other)'.
>