[
https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827730#comment-16827730
]
Stephen O'Donnell commented on HDFS-13677:
------------------------------------------
Looking at the patch [~xuzq_zander] uploaded some time ago:
1. I wonder whether we should change the addAll method at all, as other parts
of the code may be using it and expecting it to simply replace the block pool.
It would be worth checking whether it is used anywhere else. Perhaps we should
add a new method, "mergeAll", which does what we need here and better
describes its purpose?
2. Rather than adding the new method addAndNotReplace, we should just call the
existing method add:
{code}
ReplicaInfo add(String bpid, ReplicaInfo replicaInfo) {
  checkBlockPool(bpid);
  checkBlock(replicaInfo);
  try (AutoCloseableLock l = lock.acquire()) {
    FoldedTreeSet<ReplicaInfo> set = map.get(bpid);
    if (set == null) {
      // Add an entry for block pool if it does not exist already
      set = new FoldedTreeSet<>();
      map.put(bpid, set);
    }
    return set.addOrReplace(replicaInfo);
  }
}
{code}
It handles adding the block pool entry if needed, and it also takes the
required lock around the calls to make them thread-safe.
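A minimal sketch of both suggestions together: a hypothetical mergeAll that delegates to an add-style method once per replica, so an existing block pool entry is extended rather than replaced. This is a self-contained stand-in, not Hadoop code; the class name ReplicaMapSketch, the use of plain long block IDs, TreeSet (instead of FoldedTreeSet) and ReentrantLock (instead of AutoCloseableLock) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantLock;

/**
 * Hypothetical, simplified stand-in for ReplicaMap: replicas are plain block IDs,
 * TreeSet replaces FoldedTreeSet, and ReentrantLock replaces AutoCloseableLock.
 */
public class ReplicaMapSketch {
  private final Map<String, TreeSet<Long>> map = new HashMap<>();
  private final ReentrantLock lock = new ReentrantLock();

  /** Like add(): creates the block pool entry on demand and adds the replica under the lock. */
  public void add(String bpid, long blockId) {
    lock.lock();
    try {
      map.computeIfAbsent(bpid, k -> new TreeSet<>()).add(blockId);
    } finally {
      lock.unlock();
    }
  }

  /**
   * Hypothetical mergeAll: unions other's replicas into this map, one add() call per
   * replica, so existing block pool entries are kept. Assumes other is not modified
   * concurrently (e.g. a map freshly built from a newly added volume) and other != this.
   */
  public void mergeAll(ReplicaMapSketch other) {
    other.map.forEach((bpid, blocks) -> blocks.forEach(blockId -> add(bpid, blockId)));
  }

  /** Number of replicas held for one block pool. */
  public int size(String bpid) {
    lock.lock();
    try {
      TreeSet<Long> set = map.get(bpid);
      return set == null ? 0 : set.size();
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) {
    ReplicaMapSketch volumeMap = new ReplicaMapSketch(); // replicas from existing disks
    volumeMap.add("BP-1", 1L);
    volumeMap.add("BP-1", 2L);

    ReplicaMapSketch newDisk = new ReplicaMapSketch(); // replicas from the added disk
    newDisk.add("BP-1", 3L);

    volumeMap.mergeAll(newDisk);
    System.out.println(volumeMap.size("BP-1")); // prints 3
  }
}
```

Taking the lock inside each add call keeps mergeAll short and reuses the existing locking; a real implementation might instead hold the lock once for the whole merge to avoid re-acquiring it per replica.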
> Dynamic refresh Disk configuration results in overwriting VolumeMap
> -------------------------------------------------------------------
>
> Key: HDFS-13677
> URL: https://issues.apache.org/jira/browse/HDFS-13677
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0, 3.0.0
> Reporter: xuzq
> Priority: Major
> Attachments:
> 0001-fix-the-bug-of-the-refresh-disk-configuration.patch,
> image-2018-06-14-13-05-54-354.png, image-2018-06-14-13-10-24-032.png
>
>
> When I added a new disk by dynamically refreshing the configuration, the
> exception "FileNotFound while finding block" was raised.
>
> The steps are as follows:
> 1. Change the hdfs-site.xml of the DataNode to add a new disk.
> 2. Refresh the configuration with "./bin/hdfs dfsadmin -reconfig datanode
> ****:50020 start"
>
> The error looks like this:
> ```
> VolumeScannerThread(/media/disk5/hdfs/dn): FileNotFound while finding block
> BP-233501496-*.*.*.*-1514185698256:blk_1620868560_547245090 on volume
> /media/disk5/hdfs/dn
> org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not
> found for BP-1997955181-*.*.*.*-1514186468560:blk_1090885868_17145082
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:471)
> at
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:240)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:553)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
> at
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:254)
> at java.lang.Thread.run(Thread.java:748)
> ```
> I added some logging to confirm this.
> The logging code:
> !image-2018-06-14-13-05-54-354.png!
> The resulting output:
> !image-2018-06-14-13-10-24-032.png!
> The size of 'VolumeMap' was reduced: we found that 'VolumeMap' is
> overwritten with the new disk's blocks by the method
> 'ReplicaMap.addAll(ReplicaMap other)'.
>
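The shrinking 'VolumeMap' in the report is consistent with addAll delegating to Map.putAll on the per-block-pool map (that delegation is an assumption here). Map.putAll replaces the whole value for a key that already exists, so the replica set of a block pool shared by the old and new disks is swapped out instead of unioned. A JDK-only illustration, where the class PutAllOverwriteDemo and its helpers are hypothetical:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical demo: putAll on a map of sets replaces, while a per-key union merges. */
public class PutAllOverwriteDemo {

  /** Builds a map holding one block pool with the given block IDs. */
  public static Map<String, Set<Long>> pool(String bpid, Long... blockIds) {
    Map<String, Set<Long>> m = new HashMap<>();
    m.put(bpid, new HashSet<>(Arrays.asList(blockIds)));
    return m;
  }

  /** Counts block IDs across all block pools. */
  public static int totalReplicas(Map<String, Set<Long>> volumeMap) {
    int total = 0;
    for (Set<Long> blocks : volumeMap.values()) {
      total += blocks.size();
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Set<Long>> newDisk = pool("BP-1", 4L); // same block pool, new disk

    Map<String, Set<Long>> replaced = pool("BP-1", 1L, 2L, 3L);
    replaced.putAll(newDisk); // "BP-1" now maps to newDisk's set: {4}
    System.out.println(totalReplicas(replaced)); // prints 1

    Map<String, Set<Long>> merged = pool("BP-1", 1L, 2L, 3L);
    newDisk.forEach((bpid, blocks) ->
        merged.computeIfAbsent(bpid, k -> new HashSet<>()).addAll(blocks));
    System.out.println(totalReplicas(merged)); // prints 4
  }
}
```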
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)