[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16891132#comment-16891132
 ] 

Yiqun Lin edited comment on HDFS-14313 at 7/23/19 3:26 PM:
-----------------------------------------------------------

Hi [~leosun08],
1.
{quote}
I don't use a component-specific impl class in the common module. The update 
in GetSpaceUsed of the common module is just to make subclasses inheritable, and 
the update in CommonConfigurationKeys of the common module is to print the 
threshold time, which should be moved to DFSConfigKeys, where it is more appropriate.
{quote}
I don't think it's necessary to have two new configs to make the threshold time 
configurable; the hard-coded way should be enough. I mean we can define a 
hard-coded threshold time value like 1000ms in ReplicaCachingGetSpaceUsed, so 
that we don't need to make any change in Common.
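To illustrate the suggestion, here is a minimal standalone sketch of hard-coding the threshold instead of introducing new config keys. Only the class name ReplicaCachingGetSpaceUsed and the 1000ms value come from this discussion; the constant name, the helper method, and the logging are hypothetical, and the real Hadoop class is not used here.

```java
// Minimal sketch (not the actual Hadoop implementation): a hard-coded
// threshold inside the HDFS-side class, so nothing in Common changes.
public class ReplicaCachingGetSpaceUsedSketch {

  // Hypothetical constant: hard-coded 1000ms threshold, replacing the two
  // proposed configuration keys.
  static final long REFRESH_WARN_THRESHOLD_MS = 1000;

  // Hypothetical helper: decide whether a refresh took long enough to warn
  // about, based on the hard-coded threshold.
  static boolean exceedsThreshold(long refreshDurationMs) {
    return refreshDurationMs > REFRESH_WARN_THRESHOLD_MS;
  }

  public static void main(String[] args) {
    // A slow refresh (1500ms) crosses the threshold; a fast one (200ms) does not.
    System.out.println(exceedsThreshold(1500)); // true
    System.out.println(exceedsThreshold(200));  // false
  }
}
```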

2. Can you add a comment for the potential deadlock issue in the method 
deepCopyReplica? That will let others know the context. 
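For reference, the requested comment might look something like the sketch below. The wording is only a suggestion; the exact cause of the potential deadlock is not described in this thread, so the explanatory text would need to be filled in from the patch itself.

```java
/**
 * Deep-copies the replica infos rather than returning the live collection.
 * Note: the deep copy exists to avoid a potential deadlock; see the
 * discussion on HDFS-14313 for the context. (Describe the lock ordering
 * that makes the copy necessary here.)
 */
```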

Except for the above two points, the rest makes sense to me.





> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-14313
>                 URL: https://issues.apache.org/jira/browse/HDFS-14313
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, performance
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch
>
>
> There are two problems with getting used space via DU/DF:
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the in-memory FsDatasetImpl#volumeMap#ReplicaInfo 
> instead has very low cost and is accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
