[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889667#comment-16889667 ]
Lisheng Sun commented on HDFS-14313: ------------------------------------ Thanks [~linyiqun] for your careful review. I have a few questions to discuss with you.

1. I don't use the component-specific impl class in the common module. The change to GetSpaceUsed in the common module is only to make it inheritable by subclasses. The change to CommonConfigurationKeys in the common module is for logging the threshold time; that key should be moved to DFSConfigKeys, which is more appropriate.

2. There is already a switch that controls this, GetSpaceUsed#Builder#CLASSNAME_KEY:
{code:java}
static final String CLASSNAME_KEY = "fs.getspaceused.classname";
{code}
If we add enableFSCachingGetSpaceUsed as you suggest, there would be two switches, which would give the user more confusion.

3.
{quote}Even though deepCopyReplica is only used by another thread, I still prefer to let it be an atomic operation in case this will be used in other places in the future. Can you add datasetLock here?{quote}
FsDatasetImpl#addBlockPool holds datasetLock and calls FsVolumeList#addBlockPool:
{code:java}
@Override
public void addBlockPool(String bpid, Configuration conf)
    throws IOException {
  LOG.info("Adding block pool " + bpid);
  try (AutoCloseableLock lock = datasetLock.acquire()) {
    volumes.addBlockPool(bpid, conf);
    volumeMap.initBlockPool(bpid);
  }
  volumes.getAllVolumesMap(bpid, volumeMap, ramDiskReplicaTracker);
}
{code}
The call chain is FsVolumeList#addBlockPool -> FsVolumeImpl#addBlockPool -> new BlockPoolSlice -> FsDatasetImpl#deepCopyReplica. If deepCopyReplica also acquired datasetLock, it would deadlock.
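The deadlock shape above can be illustrated with a minimal, self-contained sketch using plain java.util.concurrent rather than the Hadoop classes (all names here are illustrative, not the actual HDFS code): a parent thread holds a lock, starts a worker that needs the same lock, and then waits on the worker, which is the situation addBlockPool would create if the per-volume threads called a deepCopyReplica that also took datasetLock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock datasetLock = new ReentrantLock();
        // Parent acquires the lock, as FsDatasetImpl#addBlockPool does.
        datasetLock.lock();
        try {
            Thread worker = new Thread(() -> {
                // Models a per-volume block-pool thread: if deepCopyReplica
                // also took datasetLock, it would block here forever while
                // the parent waits for this thread to finish.
                boolean acquired = false;
                try {
                    // tryLock with a timeout instead of lock(), so the
                    // sketch terminates and prints the outcome.
                    acquired = datasetLock.tryLock(200, TimeUnit.MILLISECONDS);
                    if (acquired) {
                        datasetLock.unlock();
                    }
                } catch (InterruptedException ignored) {
                }
                System.out.println("worker acquired datasetLock: " + acquired);
            });
            worker.start();
            // Parent waits for the worker while still holding the lock:
            // with a blocking lock() in the worker, this join would never return.
            worker.join();
        } finally {
            datasetLock.unlock();
        }
    }
}
```

Returning an unmodifiable snapshot instead of locking avoids this circular wait entirely.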
So I use Collections.unmodifiableSet instead, so that the replica info cannot be modified outside. Note that FsVolumeList#addBlockPool spawns one thread per volume and waits for them while the caller still holds datasetLock:
{code:java}
void addBlockPool(final String bpid, final Configuration conf)
    throws IOException {
  long totalStartTime = Time.monotonicNow();
  final Map<FsVolumeSpi, IOException> unhealthyDataDirs =
      new ConcurrentHashMap<FsVolumeSpi, IOException>();
  List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
  for (final FsVolumeImpl v : volumes) {
    Thread t = new Thread() {
      public void run() {
        try (FsVolumeReference ref = v.obtainReference()) {
          FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
              " on volume " + v + "...");
          long startTime = Time.monotonicNow();
          v.addBlockPool(bpid, conf);
          long timeTaken = Time.monotonicNow() - startTime;
          FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
              " on " + v + ": " + timeTaken + "ms");
        } catch (ClosedChannelException e) {
          // ignore.
        } catch (IOException ioe) {
          FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
              ". Will throw later.", ioe);
          unhealthyDataDirs.put(v, ioe);
        }
      }
    };
    blockPoolAddingThreads.add(t);
    t.start();
  }
{code}
4. Following your suggestion, I will modify the UT to use a real MiniCluster and add a comparison test that runs both the FSCachingGetSpaceUsed way and the default DU way. Please correct me if I am wrong. Thanks [~linyiqun] again.
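The unmodifiable-snapshot idea can be sketched as follows. This is a minimal illustration, not the actual patch: the Replica class is a hypothetical stand-in for ReplicaInfo, and deepCopyReplica here just copies an in-memory set.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for ReplicaInfo, for illustration only.
class Replica {
    final long blockId;
    final long numBytes;

    Replica(long blockId, long numBytes) {
        this.blockId = blockId;
        this.numBytes = numBytes;
    }
}

public class DeepCopySketch {
    private final Set<Replica> replicas = new HashSet<>();

    public DeepCopySketch() {
        replicas.add(new Replica(1L, 1024L));
        replicas.add(new Replica(2L, 2048L));
    }

    // Return a copied, read-only view: callers can iterate without holding
    // any lock, but cannot mutate the dataset's replica map through it.
    public Set<Replica> deepCopyReplica() {
        return Collections.unmodifiableSet(new HashSet<>(replicas));
    }

    public static void main(String[] args) {
        Set<Replica> copy = new DeepCopySketch().deepCopyReplica();
        try {
            copy.add(new Replica(3L, 4096L));
            System.out.println("mutation allowed");
        } catch (UnsupportedOperationException e) {
            System.out.println("mutation rejected");
        }
    }
}
```

Because the returned set is a fresh copy wrapped in an unmodifiable view, later changes to the internal map are invisible to the caller, and the caller cannot corrupt the map, without taking datasetLock.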
> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory
> instead of df/du
> ----------------------------------------------------------------------------------------
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, performance
> Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch,
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch,
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch
>
> The two existing ways of getting used space, DU and DF, are both insufficient:
> # Running DU across lots of disks is very expensive, and running all of the
> processes at the same time creates a noticeable IO spike.
> # Running DF is inaccurate when the disk is shared by multiple DataNodes or
> other servers.
> Getting hdfs used space from the FsDatasetImpl#volumeMap ReplicaInfos in
> memory has very low cost and is accurate.

-- This message was sent by Atlassian JIRA (v7.6.14#76016)