[jira] [Comment Edited] (HDFS-14313) Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory instead of df/du

Lisheng Sun (JIRA) Fri, 21 Jun 2019 02:25:53 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869322#comment-16869322
 ]


Lisheng Sun edited comment on HDFS-14313 at 6/21/19 9:24 AM:
-------------------------------------------------------------

[~jojochuang] Thanks for your review.

{quote}

Additionally, what's the memory consumption look like? I assume it doubles 
DataNode memory usage.

{quote}

It costs far less than twice the DataNode memory usage.
{code:java}
Collection<ReplicaInfo> replicaInfos =
    (Collection<ReplicaInfo>) fsDataset.deepCopyReplica(bpid);
{code}
This is because ReplicaCachingGetSpaceUsed#replicaInfos is a local variable 
that is reused for every block pool, and the copy only adds references to the 
existing ReplicaInfo objects.

The increased memory consumption is about 4 bytes (or 8 bytes) per reference, 
multiplied by the number of ReplicaInfo objects.
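
To illustrate the estimate above, here is a minimal sketch. It assumes the copy holds references to the existing replica objects rather than cloning them, which is what the comment describes. ReplicaStub and ReferenceCopySketch are hypothetical names for illustration, not HDFS classes, and the 4 or 8 bytes are taken to be the size of one JVM reference (typically 4 bytes with compressed oops, 8 bytes without).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ReplicaInfo; not the real HDFS class.
final class ReplicaStub {
    final long blockId;
    final long numBytes;

    ReplicaStub(long blockId, long numBytes) {
        this.blockId = blockId;
        this.numBytes = numBytes;
    }
}

final class ReferenceCopySketch {
    // Copies the container only: the new list points at the same
    // ReplicaStub objects, so no replica is duplicated on the heap.
    static List<ReplicaStub> copyReferences(List<ReplicaStub> source) {
        return new ArrayList<>(source);
    }

    // Rough estimate of the extra heap used by the copy: one reference
    // slot per replica. Container overhead is ignored.
    static long estimateExtraBytes(long replicaCount, int bytesPerReference) {
        return replicaCount * bytesPerReference;
    }
}
```

Under these assumptions, one million replicas would add roughly 4 MB or 8 MB for the copied references, on top of whatever the container itself needs.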

{quote}

synchronization: I find it hard to believe that FsDatasetImpl#deepCopyReplica() 
is not synchronized to avoid data race.

{quote}

I don't fully understand this synchronization concern. Are you worried that a 
data race could happen?
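
For reference, the kind of pattern the question may be about can be sketched as follows. This is only an illustration with a hypothetical ReplicaRegistrySketch class; it is not how FsDatasetImpl is written, and whether FsDatasetImpl#deepCopyReplica() needs such a lock is exactly the open point here. The sketch takes the snapshot while holding the same lock that writers use, so the copy cannot observe a map that is being modified.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical registry used only to illustrate copying under a lock.
final class ReplicaRegistrySketch {
    private final Object lock = new Object();
    private final Map<Long, Long> bytesByBlockId = new HashMap<>();

    void addReplica(long blockId, long numBytes) {
        synchronized (lock) {
            bytesByBlockId.put(blockId, numBytes);
        }
    }

    void removeReplica(long blockId) {
        synchronized (lock) {
            bytesByBlockId.remove(blockId);
        }
    }

    // The copy is made while holding the writers' lock, so iteration
    // over the map never overlaps with a put or remove.
    List<Long> snapshotSizes() {
        synchronized (lock) {
            return new ArrayList<>(bytesByBlockId.values());
        }
    }
}
```

A snapshot taken this way is stable: later adds or removes do not change a list that was already returned.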

 



> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-14313
>                 URL: https://issues.apache.org/jira/browse/HDFS-14313
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, performance
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch
>
>
> The two existing ways of getting used space, DU and DF, both fall short:
>  #  Running DU across many disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple DataNodes or 
> other servers.
>  Getting HDFS used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory 
> is very cheap and accurate. 
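
The in-memory approach described in the issue can be sketched as a simple sum over the replicas of a block pool. This is a minimal illustration, not the patch itself: ReplicaView, bytesOnDisk() and metadataLength() are hypothetical names, and counting metadata length alongside block data is an assumption made for the example.

```java
import java.util.Collection;

final class InMemoryUsedSpaceSketch {
    // Hypothetical read-only view of a replica; not an HDFS interface.
    interface ReplicaView {
        long bytesOnDisk();
        long metadataLength();
    }

    // Sums block data and metadata sizes without touching the disk,
    // so no du/df process is started.
    static long usedBytes(Collection<? extends ReplicaView> replicas) {
        long total = 0;
        for (ReplicaView r : replicas) {
            total += r.bytesOnDisk() + r.metadataLength();
        }
        return total;
    }

    // Small factory so the sketch can be exercised without HDFS.
    static ReplicaView replica(final long bytesOnDisk, final long metadataLength) {
        return new ReplicaView() {
            @Override public long bytesOnDisk() { return bytesOnDisk; }
            @Override public long metadataLength() { return metadataLength; }
        };
    }
}
```

The cost of this loop grows with the number of replicas but involves no disk IO, which is the trade-off the issue description argues for.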



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
