[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16899932#comment-16899932 ]

Yiqun Lin edited comment on HDFS-14313 at 8/5/19 9:23 AM:
----------------------------------------------------------

Two review comments:

I'd like to rewrite the javadoc comment in a more readable way.
 From
{noformat}
Fast and accurate class to tell how much space HDFS is using. This class get
hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfos in memory for HDFS.

Get hdfs used space by ReplicaCachingGetSpaceUsed impl only includes block
and meta, but get space used by DU impl includes include all directories
size, other files such as VERSION, in_use.lock and so on. Get space used by
DU impl must be greater than by ReplicaCachingGetSpaceUsed impl. Get space
used by ReplicaCachingGetSpaceUsed impl is more accurate.

setting fs.getspaceused.classname to
org.apache.hadoop.hdfs.server.datanode.fsdataset
impl.ReplicaCachingGetSpaceUsed in your core-site.xml if we want to enable
this class.
{noformat}
To 
{noformat}
Fast and accurate class to tell how much space HDFS is using. This class gets
the HDFS used space from the in-memory FsDatasetImpl#volumeMap#ReplicaInfos.

Getting the HDFS used space via the ReplicaCachingGetSpaceUsed impl only
includes block and meta files, while the DU impl is a blockpool-dir-based
statistic that also includes additional files, e.g. the tmp dir and the
scanner.cursor file. The space used reported by the DU impl will therefore be
greater than that of the ReplicaCachingGetSpaceUsed impl, but the latter is
more accurate.

Set fs.getspaceused.classname to
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed
in your core-site.xml if you want to enable this class.
{noformat}
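
For reference, enabling it would look roughly like the following core-site.xml fragment (a sketch based only on the property name and class given in the javadoc above):
{noformat}
<!-- core-site.xml: switch the space-used calculator to the in-memory impl -->
<property>
  <name>fs.getspaceused.classname</name>
  <value>org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed</value>
</property>
{noformat}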

{noformat}
os.close();
fs.delete(new Path("/testReplicaCachingGetSpaceUsedByRBWReplica"), true);
{noformat}
Can we add the assert operation again after the close operation? After the close
operation, the replica state will be transformed from RBW to finalized. But the
space used by these replicas is still all included, so the dfsUsed value should
be the same.
{noformat}
os.close();
assertEquals(blockLength + metaLength, dataNode.getFSDataset().getDfsUsed());
fs.delete(new Path("/testReplicaCachingGetSpaceUsedByRBWReplica"), true);
{noformat}
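
To illustrate why the assertion should still hold, here is a minimal, self-contained sketch (hypothetical stand-in classes, not the actual FsDatasetImpl/ReplicaInfo API): an in-memory map sums only block and meta lengths, and neither changes when a replica moves from RBW to finalized.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the in-memory accounting idea: dfsUsed is the sum
 * of block + meta lengths over the replica map, independent of replica state,
 * so finalizing an RBW replica on close() does not change the total.
 */
public class ReplicaMapSketch {
  /** Minimal stand-in for ReplicaInfo: block data length plus meta file length. */
  static final class Replica {
    final long blockLength;
    final long metaLength;
    boolean finalized;
    Replica(long blockLength, long metaLength, boolean finalized) {
      this.blockLength = blockLength;
      this.metaLength = metaLength;
      this.finalized = finalized;
    }
  }

  /** Sum block + meta bytes over the map, regardless of each replica's state. */
  static long dfsUsed(Map<Long, Replica> volumeMap) {
    long used = 0;
    for (Replica r : volumeMap.values()) {
      used += r.blockLength + r.metaLength;
    }
    return used;
  }

  public static void main(String[] args) {
    Map<Long, Replica> volumeMap = new HashMap<>();
    volumeMap.put(1L, new Replica(1024, 7, false)); // an RBW replica
    long before = dfsUsed(volumeMap);
    volumeMap.get(1L).finalized = true;             // close() finalizes it
    long after = dfsUsed(volumeMap);
    System.out.println(before == after);            // prints "true"
  }
}
```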

BTW, please fix the checkstyle issue.


> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-14313
>                 URL: https://issues.apache.org/jira/browse/HDFS-14313
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, performance
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch, HDFS-14313.010.patch
>
>
> The two existing ways of getting used space, DU and DF, are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from the in-memory 
> FsDatasetImpl#volumeMap#ReplicaInfos is very cheap and accurate. 



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
