[ 
https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827614#comment-16827614
 ] 

xuzq commented on HDFS-13677:
-----------------------------

[~sodonnell] Thanks for your reply.

The reproduction steps are as below:
 # A machine has many disks, like /media/disk1, /media/disk2, /media/disk3
 # Each disk holds several block pools, like /media/disk*/dn/current/BP-1, /media/disk*/dn/current/BP-2, /media/disk*/dn/current/BP-3
 # There is a lot of data in every BP on every disk.
 # Then we drop /media/disk1 through "reconfig datanode" (see the command sketch after this list).
 # Wait for some time.
 # Then we add /media/disk1 back into production through "reconfig datanode".
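
For reference, dropping and re-adding the disk was done roughly like this (the data dir paths and the IPC port are taken from the steps above and the issue description; the hostname is a placeholder):
{code}
# remove /media/disk1 from dfs.datanode.data.dir in hdfs-site.xml, then:
./bin/hdfs dfsadmin -reconfig datanode <dn-host>:50020 start
./bin/hdfs dfsadmin -reconfig datanode <dn-host>:50020 status   # repeat until it reports finished

# later, add /media/disk1 back to dfs.datanode.data.dir and run the same reconfig again
{code}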

The relevant part of the code in 'addVolume()' from current trunk looks like:
{code:java}
final ReplicaMap tempVolumeMap = new ReplicaMap(new AutoCloseableLock());
ArrayList<IOException> exceptions = Lists.newArrayList();

for (final NamespaceInfo nsInfo : nsInfos) {
  String bpid = nsInfo.getBlockPoolID();
  try {
    fsVolume.addBlockPool(bpid, this.conf, this.timer);
    fsVolume.getVolumeMap(bpid, tempVolumeMap, ramDiskReplicaTracker);
  } catch (IOException e) {
    LOG.warn("Caught exception when adding " + fsVolume +
        ". Will throw later.", e);
    exceptions.add(e);
  }
}
if (!exceptions.isEmpty()) {
  try {
    sd.unlock();
  } catch (IOException e) {
    exceptions.add(e);
  }
  throw MultipleIOException.createIOException(exceptions);
}

final FsVolumeReference ref = fsVolume.obtainReference();
setupAsyncLazyPersistThread(fsVolume);

builder.build();
activateVolume(tempVolumeMap, sd, storageType, ref);
LOG.info("Added volume - " + location + ", StorageType: " + storageType);
{code}
As we all know, tempVolumeMap.map contains only the blocks of each BP on the 
newly added storage.

activateVolume() then adds tempVolumeMap.map into the global volumeMap as below.
{code:java}
/**
 * Add all entries from the given replica map into the local replica map.
 */
void addAll(ReplicaMap other) {
  map.putAll(other.map);
}
{code}
But map.putAll(other.map) will use the new values in other.map to replace the old 
values in map for the same keys. Since the map is keyed by block pool ID, the 
per-BP entry built from only the newly added volume replaces the per-BP entry 
that held the replicas of all the existing volumes.
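
To illustrate the effect, here is a minimal, self-contained sketch (a plain HashMap keyed by block pool ID, standing in for the real ReplicaMap internals) of what happens when the re-added volume's map is merged with putAll():
{code:java}
import java.util.*;

public class PutAllOverwriteDemo {
  public static void main(String[] args) {
    // Global volumeMap: bpid -> block ids currently known on the remaining disks.
    Map<String, Set<Long>> volumeMap = new HashMap<>();
    volumeMap.put("BP-1", new HashSet<>(Arrays.asList(2L, 3L))); // blocks on disk2/disk3

    // tempVolumeMap: bpid -> block ids found on the newly re-added disk1 only.
    Map<String, Set<Long>> tempVolumeMap = new HashMap<>();
    tempVolumeMap.put("BP-1", new HashSet<>(Collections.singletonList(1L)));

    // What ReplicaMap.addAll() effectively does today:
    volumeMap.putAll(tempVolumeMap);

    // Prints [1]: the replicas on disk2 and disk3 are gone from the map,
    // which later shows up as ReplicaNotFoundException on reads.
    System.out.println(volumeMap.get("BP-1"));
  }
}
{code}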

The documentation of putAll() is as below: the new mappings will replace any 
mappings that the map had for any of the keys currently in the specified map.
{code:java}
/**
 * Copies all of the mappings from the specified map to this hashtable.
 * These mappings will replace any mappings that this hashtable had for any
 * of the keys currently in the specified map.
 *
 * @param t mappings to be stored in this map
 * @throws NullPointerException if the specified map is null
 * @since 1.2
 */
public synchronized void putAll(Map<? extends K, ? extends V> t) {
    for (Map.Entry<? extends K, ? extends V> e : t.entrySet())
        put(e.getKey(), e.getValue());
}{code}
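
One possible direction for a fix, just as a sketch (this is not the attached patch, and it simplifies ReplicaMap's real per-block-pool structure to a Map<String, Set<ReplicaInfo>> keyed by bpid), is to merge the per-BP collections instead of replacing them:
{code:java}
// Sketch only: assumes the internal map is Map<String, Set<ReplicaInfo>> keyed by bpid,
// which is a simplification of ReplicaMap's real per-block-pool structure.
void mergeAll(ReplicaMap other) {
  for (Map.Entry<String, Set<ReplicaInfo>> entry : other.map.entrySet()) {
    Set<ReplicaInfo> existing = map.get(entry.getKey());
    if (existing == null) {
      map.put(entry.getKey(), entry.getValue());
    } else {
      existing.addAll(entry.getValue()); // merge the new volume's replicas into the existing BP entry
    }
  }
}
{code}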
 

 

> Dynamic refresh Disk configuration results in overwriting VolumeMap
> -------------------------------------------------------------------
>
>                 Key: HDFS-13677
>                 URL: https://issues.apache.org/jira/browse/HDFS-13677
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 3.0.0
>            Reporter: xuzq
>            Priority: Major
>         Attachments: 
> 0001-fix-the-bug-of-the-refresh-disk-configuration.patch, 
> image-2018-06-14-13-05-54-354.png, image-2018-06-14-13-10-24-032.png
>
>
> When I added a new disk by dynamically refreshing the configuration, an 
> exception "FileNotFound while finding block" occurred.
>  
> The steps are as follows:
> 1. Change the hdfs-site.xml of the DataNode to add a new disk.
> 2. Refresh the configuration with "./bin/hdfs dfsadmin -reconfig datanode 
> ****:50020 start"
>  
> The error is like:
> ```
> VolumeScannerThread(/media/disk5/hdfs/dn): FileNotFound while finding block 
> BP-233501496-*.*.*.*-1514185698256:blk_1620868560_547245090 on volume 
> /media/disk5/hdfs/dn
> org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not 
> found for BP-1997955181-*.*.*.*-1514186468560:blk_1090885868_17145082
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:471)
>  at 
> org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:240)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:553)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
>  at 
> org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
>  at 
> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:254)
>  at java.lang.Thread.run(Thread.java:748)
> ```
> I added some logs for confirmation, as follows:
> Log Code like:
> !image-2018-06-14-13-05-54-354.png!
> And the result is like:
> !image-2018-06-14-13-10-24-032.png!  
> The size of the 'VolumeMap' has been reduced, and we found that the 
> 'VolumeMap' is overwritten with the new disk's blocks by the method 
> 'ReplicaMap.addAll(ReplicaMap other)'.
>  



