[
https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827656#comment-16827656
]
Stephen O'Donnell commented on HDFS-13677:
------------------------------------------
I tried to reproduce this on current trunk (NameNode and one DataNode running
locally), since I already had it built, and the problem does not occur. The steps:
# Format the NN
# Add 10 files, giving me 10 blocks
# Add a disk and reconfigure - what I expected to happen due to this bug is that
the DN would 'forget' the 10 blocks when it issues a FBR, but it does not
{code}
<startup DN>
...
2019-04-27 17:06:54,561 INFO datanode.DataNode: Successfully sent block report
0x6850d9b5dfeab332, containing 1 storage report(s), of which we sent 1. The
reports had 10 total blocks and used 1 RPC(s). This took 3 msec to generate and
20 msecs for RPC and NN processing. Got back one command: FinalizeCommand/5.
<We can see it has only 10 blocks>
...
<Reconfigure here>
2019-04-27 17:13:54,111 INFO datanode.DataNode: Reconfiguring
dfs.datanode.data.dir to
/tmp/hadoop-sodonnell/dfs/data,/tmp/hadoop-sodonnell/dfs/data2
2019-04-27 17:13:54,122 INFO datanode.DataNode: Adding new volumes:
[DISK]file:/tmp/hadoop-sodonnell/dfs/data2
2019-04-27 17:13:54,123 INFO common.Storage:
/private/tmp/hadoop-sodonnell/dfs/data2 does not exist. Creating ...
2019-04-27 17:13:54,281 INFO common.Storage: Lock on
/tmp/hadoop-sodonnell/dfs/data2/in_use.lock acquired by nodename
[email protected]
2019-04-27 17:13:54,281 INFO common.Storage: Storage directory with location
[DISK]file:/tmp/hadoop-sodonnell/dfs/data2 is not formatted for namespace
79966483. Formatting...
2019-04-27 17:13:54,282 INFO common.Storage: Generated new storageID
DS-9449564c-f688-46b8-b88d-2aacd53810f4 for directory
/tmp/hadoop-sodonnell/dfs/data2
2019-04-27 17:13:54,304 INFO common.Storage: Locking is disabled for
/tmp/hadoop-sodonnell/dfs/data2/current/BP-615868015-192.168.0.24-1556381051528
2019-04-27 17:13:54,304 INFO common.Storage: Block pool storage directory for
location [DISK]file:/tmp/hadoop-sodonnell/dfs/data2 and block pool id
BP-615868015-192.168.0.24-1556381051528 is not formatted. Formatting ...
2019-04-27 17:13:54,304 INFO common.Storage: Formatting block pool
BP-615868015-192.168.0.24-1556381051528 directory
/tmp/hadoop-sodonnell/dfs/data2/current/BP-615868015-192.168.0.24-1556381051528/current
2019-04-27 17:13:54,324 INFO impl.BlockPoolSlice: Replica Cache file:
/tmp/hadoop-sodonnell/dfs/data2/current/BP-615868015-192.168.0.24-1556381051528/current/replicas
doesn't exist
2019-04-27 17:13:54,325 INFO impl.FsDatasetImpl: Added new volume:
DS-9449564c-f688-46b8-b88d-2aacd53810f4
2019-04-27 17:13:54,325 INFO impl.FsDatasetImpl: Added volume -
[DISK]file:/tmp/hadoop-sodonnell/dfs/data2, StorageType: DISK
2019-04-27 17:13:54,325 INFO datanode.DataNode: Successfully added volume:
[DISK]file:/tmp/hadoop-sodonnell/dfs/data2
2019-04-27 17:13:54,326 INFO datanode.DataNode: Block pool
BP-615868015-192.168.0.24-1556381051528 (Datanode Uuid
93ca3bd5-ee44-4506-b952-dc243eac4d18): scheduling a full block report.
2019-04-27 17:13:54,326 INFO datanode.DataNode: Forcing a full block report to
localhost/127.0.0.1:8020
2019-04-27 17:13:54,326 INFO conf.ReconfigurableBase: Property
rpc.engine.org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolPB is not
configurable: old value: org.apache.hadoop.ipc.ProtobufRpcEngine, new value:
null
2019-04-27 17:13:54,329 INFO datanode.DataNode: Successfully sent block report
0x6850d9b5dfeab333, containing 2 storage report(s), of which we sent 2. The
reports had 10 total blocks and used 1 RPC(s). This took 0 msec to generate and
3 msecs for RPC and NN processing. Got back no commands.
< ^^^ Note still 10 blocks reported as expected >
{code}
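For context on what I was checking for above: the report below attributes the lost
replicas to 'ReplicaMap.addAll(ReplicaMap other)' replacing the existing
per-block-pool replica set wholesale when the new volume is added, rather than
merging into it. The following is only a minimal, self-contained sketch of that
failure mode using plain Java collections - the class here is an illustrative
stand-in, not the real HDFS ReplicaMap:
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative stand-in for the DN volume map: block pool id -> replica block ids.
// This is NOT the real HDFS ReplicaMap; it only models the merge semantics in question.
public class VolumeMapOverwriteSketch {
  public static void main(String[] args) {
    String bpid = "BP-615868015-192.168.0.24-1556381051528";

    // Existing in-memory map: three replicas already registered for the block pool.
    Map<String, Set<Long>> volumeMap = new HashMap<>();
    volumeMap.put(bpid, new HashSet<>(Set.of(1L, 2L, 3L)));

    // Map built while scanning the newly added (still empty) volume.
    Map<String, Set<Long>> newVolumeMap = new HashMap<>();
    newVolumeMap.put(bpid, new HashSet<>());

    // An addAll implemented as a map-level putAll replaces the whole set for the
    // block pool with the new (empty) one, so the earlier replicas vanish from
    // the in-memory map even though they are still on disk.
    volumeMap.putAll(newVolumeMap);

    System.out.println(volumeMap.get(bpid)); // prints [] - replicas "forgotten"
  }
}
{code}
On trunk the reconfig clearly does not hit this - the second FBR above still reports
10 blocks - so whatever the older code did here appears to have been reworked.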
> Dynamic refresh Disk configuration results in overwriting VolumeMap
> -------------------------------------------------------------------
>
> Key: HDFS-13677
> URL: https://issues.apache.org/jira/browse/HDFS-13677
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.6.0, 3.0.0
> Reporter: xuzq
> Priority: Major
> Attachments:
> 0001-fix-the-bug-of-the-refresh-disk-configuration.patch,
> image-2018-06-14-13-05-54-354.png, image-2018-06-14-13-10-24-032.png
>
>
> When I added a new disk by dynamically refreshing the configuration, a
> "FileNotFound while finding block" exception was thrown.
>
> The steps are as follows:
> 1. Change the DataNode's hdfs-site.xml to add a new disk.
> 2. Refresh the configuration with "./bin/hdfs dfsadmin -reconfig datanode
> ****:50020 start"
>
> The error is like:
> ```
> VolumeScannerThread(/media/disk5/hdfs/dn): FileNotFound while finding block
> BP-233501496-*.*.*.*-1514185698256:blk_1620868560_547245090 on volume
> /media/disk5/hdfs/dn
> org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not
> found for BP-1997955181-*.*.*.*-1514186468560:blk_1090885868_17145082
> at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:471)
> at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:240)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:553)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
> at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
> at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:254)
> at java.lang.Thread.run(Thread.java:748)
> ```
> I added some logging to confirm this, as follows:
> The logging code:
> !image-2018-06-14-13-05-54-354.png!
> And the result:
> !image-2018-06-14-13-10-24-032.png!
> The size of the 'VolumeMap' has been reduced, and we found that the 'VolumeMap'
> is overwritten with only the new disk's blocks by the method
> 'ReplicaMap.addAll(ReplicaMap other)'.
>
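For contrast with the overwrite sketched in the comment above, a merge that adds the
new volume's entries into the existing per-block-pool set, rather than replacing that
set, keeps the earlier replicas. Again a minimal sketch over plain Java collections,
not the code from the attached patch:
{code}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Same simplified model: block pool id -> replica block ids (not the real HDFS types).
public class VolumeMapMergeSketch {
  public static void main(String[] args) {
    String bpid = "BP-example";

    // Existing replicas for the block pool.
    Map<String, Set<Long>> volumeMap = new HashMap<>();
    volumeMap.put(bpid, new HashSet<>(Set.of(1L, 2L, 3L)));

    // Freshly added, empty disk.
    Map<String, Set<Long>> newVolumeMap = new HashMap<>();
    newVolumeMap.put(bpid, new HashSet<>());

    // Merge entry by entry instead of a map-level putAll, so existing replicas survive.
    newVolumeMap.forEach((bp, blocks) ->
        volumeMap.computeIfAbsent(bp, k -> new HashSet<>()).addAll(blocks));

    System.out.println(volumeMap.get(bpid)); // prints [1, 2, 3]
  }
}
{code}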