[jira] [Commented] (HDFS-13677) Dynamic refresh Disk configuration results in overwriting VolumeMap

Stephen O'Donnell (JIRA) Sat, 27 Apr 2019 12:30:36 -0700


    [ https://issues.apache.org/jira/browse/HDFS-13677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16827730#comment-16827730 ]


Stephen O'Donnell commented on HDFS-13677:
------------------------------------------

Looking at the patch [~xuzq_zander] uploaded some time ago:

1. I am not sure we should change the addAll method, as other parts of the 
code may be using it and expecting it to simply replace the blockpool. It would 
be worth a look around to see if it's being used in any other places. Perhaps 
we should add a new method "mergeAll" which does what we need here and better 
describes its purpose?

2. Rather than the new method addAndNotReplace, we should just call the 
existing method add:

{code}
  ReplicaInfo add(String bpid, ReplicaInfo replicaInfo) {
    checkBlockPool(bpid);
    checkBlock(replicaInfo);
    try (AutoCloseableLock l = lock.acquire()) {
      FoldedTreeSet<ReplicaInfo> set = map.get(bpid);
      if (set == null) {
        // Add an entry for block pool if it does not exist already
        set = new FoldedTreeSet<>();
        map.put(bpid, set);
      }
      return set.addOrReplace(replicaInfo);
    }
  }
{code}

It adds the blockpool entry if one is needed and takes the required lock 
around the calls, which makes it thread-safe.
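
To make the difference between the two points concrete, here is a minimal sketch in plain Java. It is not the actual Hadoop code: ReplicaMapSketch, its String/Long/TreeMap types and the mergeAll body are all hypothetical, and I am assuming the current addAll behaves like Map.putAll, which matches the overwrite described in the issue but is not something I have verified against the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.locks.ReentrantLock;

/** Simplified stand-in for a replica map; NOT the Hadoop ReplicaMap class. */
class ReplicaMapSketch {
  private final ReentrantLock lock = new ReentrantLock();
  // Block pool id -> (block id -> volume holding the replica).
  private final Map<String, TreeMap<Long, String>> map = new HashMap<>();

  /** Same shape as the quoted add: creates the block pool entry on demand, under the lock. */
  String add(String bpid, long blockId, String volume) {
    lock.lock();
    try {
      // Returns the volume previously recorded for this block, or null.
      return map.computeIfAbsent(bpid, k -> new TreeMap<>()).put(blockId, volume);
    } finally {
      lock.unlock();
    }
  }

  /** Replace semantics (ASSUMED to be what addAll does today). */
  void addAll(ReplicaMapSketch other) {
    Map<String, TreeMap<Long, String>> copy = other.snapshot();
    lock.lock();
    try {
      // Every block pool present in 'other' overwrites our entry for that pool.
      map.putAll(copy);
    } finally {
      lock.unlock();
    }
  }

  /** Merge semantics: a hypothetical mergeAll that reuses add for each replica. */
  void mergeAll(ReplicaMapSketch other) {
    for (Map.Entry<String, TreeMap<Long, String>> pool : other.snapshot().entrySet()) {
      for (Map.Entry<Long, String> replica : pool.getValue().entrySet()) {
        add(pool.getKey(), replica.getKey(), replica.getValue());
      }
    }
  }

  /** Number of replicas recorded for a block pool (0 if the pool is unknown). */
  int size(String bpid) {
    lock.lock();
    try {
      TreeMap<Long, String> pool = map.get(bpid);
      return pool == null ? 0 : pool.size();
    } finally {
      lock.unlock();
    }
  }

  /** Deep copy taken under this map's own lock, so callers never hold two locks at once. */
  private Map<String, TreeMap<Long, String>> snapshot() {
    lock.lock();
    try {
      Map<String, TreeMap<Long, String>> copy = new HashMap<>();
      for (Map.Entry<String, TreeMap<Long, String>> e : map.entrySet()) {
        copy.put(e.getKey(), new TreeMap<>(e.getValue()));
      }
      return copy;
    } finally {
      lock.unlock();
    }
  }
}
```

With two replicas already recorded for a block pool and one replica arriving from the new disk, addAll in this sketch leaves one replica for that pool, while mergeAll leaves three. Note that the sketch locks per replica, so the merge as a whole is not atomic; whether that matters for the real volume map would need checking.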

> Dynamic refresh Disk configuration results in overwriting VolumeMap
> -------------------------------------------------------------------
>
>                 Key: HDFS-13677
>                 URL: https://issues.apache.org/jira/browse/HDFS-13677
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 3.0.0
>            Reporter: xuzq
>            Priority: Major
>         Attachments: 
> 0001-fix-the-bug-of-the-refresh-disk-configuration.patch, 
> image-2018-06-14-13-05-54-354.png, image-2018-06-14-13-10-24-032.png
>
>
> When I added a new disk by dynamically refreshing the configuration, a 
> "FileNotFound while finding block" error occurred.
>  
> The steps are as follows:
> 1. Change the hdfs-site.xml of the DataNode to add a new disk.
> 2. Refresh the configuration with "./bin/hdfs dfsadmin -reconfig datanode 
> ****:50020 start"
>  
> The error is like:
> ```
> VolumeScannerThread(/media/disk5/hdfs/dn): FileNotFound while finding block BP-233501496-*.*.*.*-1514185698256:blk_1620868560_547245090 on volume /media/disk5/hdfs/dn
> org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-1997955181-*.*.*.*-1514186468560:blk_1090885868_17145082
>  at org.apache.hadoop.hdfs.server.datanode.BlockSender.getReplica(BlockSender.java:471)
>  at org.apache.hadoop.hdfs.server.datanode.BlockSender.<init>(BlockSender.java:240)
>  at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:553)
>  at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:148)
>  at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:103)
>  at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:254)
>  at java.lang.Thread.run(Thread.java:748)
> ```
> I added some logging to confirm this. The logging code:
> !image-2018-06-14-13-05-54-354.png!
> And the result:
> !image-2018-06-14-13-10-24-032.png!  
> The size of 'VolumeMap' has been reduced. We found that 'VolumeMap' is 
> overwritten with the new disk's blocks by the method 
> 'ReplicaMap.addAll(ReplicaMap other)'.
>  



