[ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827705#comment-16827705 ]
Virajith Jalaparti commented on HDFS-13677: ------------------------------------------- The issue seems to happen when the new volume being added has some blocks in a block pool that exists in other volumes in the Datanode. As mentioned by [~xuzq_zander], the bug is due to the way {{ReplicaMap#addAll}} is implemented. I was able to reproduce this bug on trunk by modifying one of the unit tests in {{TestDataNodeHotSwapVolumes}}. [~arpitagarwal] - did you see this problem? > Dynamic refresh Disk configuration results in overwriting VolumeMap > ------------------------------------------------------------------- > > Key: HDFS-13677 > URL: https://issues.apache.org/jira/browse/HDFS-13677 > Project: Hadoop HDFS > Issue Type: Bug > Affects Versions: 2.6.0, 3.0.0 > Reporter: xuzq > Priority: Major > Attachments: > 0001-fix-the-bug-of-the-refresh-disk-configuration.patch, > image-2018-06-14-13-05-54-354.png, image-2018-06-14-13-10-24-032.png > > > When I added a new disk by dynamically refreshing the configuration, an > exception "FileNotFound while finding block" was caused. > > The steps are as follows: > 1.Change the hdfs-site.xml of DataNode to add a new disk. > 2.Refresh the configuration by "./bin/hdfs dfsadmin -reconfig datanode > ****:50020 start" > > The error is like: > ``` > VolumeScannerThread(/media/disk5/hdfs/dn): FileNotFound while finding block > BP-233501496-*.*.*.*-1514185698256:blk_1620868560_547245090 on volume > /media/disk5/hdfs/dn > org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not > found for BP-1997955181-*.*.*.*-1514186468560:blk_1090885868_17145082 > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:471) > at > org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:240) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:553) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148) > at > org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103) > at > org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:254) > at java.lang.Thread.run(Thread.java:748) > ``` > I added some logs for confirmation, as follows: > Log Code like: > !image-2018-06-14-13-05-54-354.png! > And the result is like: > !image-2018-06-14-13-10-24-032.png! > The Size of 'VolumeMap' has been reduced, and We found the 'VolumeMap' to be > overridden by the new Disk Block by the method 'ReplicaMap.addAll(ReplicaMap > other)'. > -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org