[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16900563#comment-16900563 ]

Lisheng Sun commented on HDFS-14313:
------------------------------------

Thanks [~jojochuang] for your suggestion.
  
{quote}One thing that's missing out in the v11 is the update to 
hdfs-default.xml which was missing since v8.
{quote}

According to [~linyiqun]'s suggestion, I defined hard-coded threshold time values 
(such as 1000ms) in ReplicaCachingGetSpaceUsed, so the config has been removed from 
hdfs-default.xml.
{code:java}
private static final long DEEP_COPY_REPLICA_THRESHOLD_MS = 50;
private static final long REPLICA_CACHING_GET_SPACE_USED_THRESHOLD_MS = 1000;
{code}
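
For illustration, here is a minimal sketch (not the actual patch) of how these thresholds might be applied, assuming they gate warn logging around the deep copy of the in-memory replica info and the overall refresh. deepCopyReplicaBytes is a hypothetical stand-in for FsDatasetSpi#deepCopyReplica.
{code:java}
import java.util.ArrayList;
import java.util.Collection;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ReplicaCachingSketch {
  private static final Logger LOG =
      LoggerFactory.getLogger(ReplicaCachingSketch.class);

  private static final long DEEP_COPY_REPLICA_THRESHOLD_MS = 50;
  private static final long REPLICA_CACHING_GET_SPACE_USED_THRESHOLD_MS = 1000;

  /** Hypothetical stand-in for FsDatasetSpi#deepCopyReplica(bpid). */
  private Collection<Long> deepCopyReplicaBytes(String bpid) {
    return new ArrayList<>();
  }

  /** Recompute used space from the in-memory replica map, warning when slow. */
  long refresh(String bpid) {
    long start = System.currentTimeMillis();
    Collection<Long> replicaBytes = deepCopyReplicaBytes(bpid);
    long copyMs = System.currentTimeMillis() - start;
    if (copyMs > DEEP_COPY_REPLICA_THRESHOLD_MS) {
      LOG.warn("Deep copy of replicas for {} took {} ms (threshold {} ms)",
          bpid, copyMs, DEEP_COPY_REPLICA_THRESHOLD_MS);
    }
    long used = 0;
    for (long bytes : replicaBytes) {
      used += bytes;
    }
    long totalMs = System.currentTimeMillis() - start;
    if (totalMs > REPLICA_CACHING_GET_SPACE_USED_THRESHOLD_MS) {
      LOG.warn("Used-space refresh for {} took {} ms (threshold {} ms)",
          bpid, totalMs, REPLICA_CACHING_GET_SPACE_USED_THRESHOLD_MS);
    }
    return used;
  }
}
{code}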

{quote}
Additionally there should be an additional config key 
"fs.getspaceused.classname" in core-default.xml, and state that possible 
options are

org.apache.hadoop.fs.DU (default)
org.apache.hadoop.fs.WindowsGetSpaceUsed
org.apache.hadoop.fs.DFCachingGetSpaceUsed
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed
{quote}
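
For reference, a hedged sketch of how an implementation is selected through that key. The GetSpaceUsed.Builder calls reflect the common-module API as I understand it, the volume directory path is hypothetical, and ReplicaCachingGetSpaceUsed itself would normally be wired up inside the datanode rather than built standalone, so DU is used here.
{code:java}
import java.io.File;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.GetSpaceUsed;

public class GetSpaceUsedSelectionSketch {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    // Choose the implementation; org.apache.hadoop.fs.DU is the default
    // when the key is unset.
    conf.set("fs.getspaceused.classname", "org.apache.hadoop.fs.DU");
    GetSpaceUsed spaceUsed = new GetSpaceUsed.Builder()
        .setConf(conf)
        .setPath(new File("/data/dn/current")) // hypothetical volume dir
        .build();
    System.out.println("used bytes: " + spaceUsed.getUsed());
  }
}
{code}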

Adding ReplicaCachingGetSpaceUsed, a class of the hdfs module, to core-default.xml 
of the common module does not seem like a very good idea, since common should not 
reference hdfs classes. Please correct me if I am wrong. Thank you again.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-14313
>                 URL: https://issues.apache.org/jira/browse/HDFS-14313
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, performance
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch, 
> HDFS-14313.011.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are 
> insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the FsDatasetImpl#volumeMap ReplicaInfos in memory 
> is very cheap and accurate. 


