[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897804#comment-16897804
 ] 

Yiqun Lin commented on HDFS-14313:
----------------------------------

Almost looks good now, some minor comments from me:
{noformat}
The deepCopyReplica call does't use the dataset lock
{noformat}
One typo: does't --> doesn't.
{noformat}
setting set fs.getspaceused.classname
{noformat}
Please remove the redundant "set" and update "setting" to "Setting".
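For reference, this setting is the pluggable GetSpaceUsed hook; a minimal
sketch of wiring it up in Java (the fully qualified class name is my
assumption from this patch and may differ):
{noformat}
// org.apache.hadoop.conf.Configuration; the key selects which
// GetSpaceUsed implementation the DataNode will instantiate.
Configuration conf = new Configuration();
conf.set("fs.getspaceused.classname",
    "org.apache.hadoop.hdfs.server.datanode.fsdataset.impl."
        + "ReplicaCachingGetSpaceUsed");
{noformat}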
{noformat}
"blockPoolId: {}, replicas size: {}, copy replicas duration: {}ms"
{noformat}
Can you update it to
{noformat}
"Copy replica infos, blockPoolId: {}, replicas size: {}, duration: {}ms"
{noformat}
Also, update refresh to Refresh.
{noformat}
fs.getClient().delete("/testReplicaCachingGetSpaceUsed", true);
{noformat}
We can directly call the FileSystem API to delete the file:
{noformat}
fs.delete(new Path("/testReplicaCachingGetSpaceUsed"), true);
{noformat}
{quote}The space used reported by the DU impl must be greater than that
reported by the ReplicaCachingGetSpaceUsed impl. The space used reported by
the ReplicaCachingGetSpaceUsed impl is more accurate. So is it necessary to
add a comparison for the DU impl class?
{quote}
You have raised a good point: the ReplicaCachingGetSpaceUsed way only
calculates the finalized blocks, while the du command way includes more files.
Can you document this important difference in the javadoc of the
ReplicaCachingGetSpaceUsed class, for example as sketched below? We should let
others know which files this class will account for.
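Something along these lines in the javadoc (the wording is only a suggestion):
{noformat}
/**
 * Fast and accurate class to tell how much space HDFS is using.
 * NOTE: the value is derived from the replicas tracked in
 * FsDatasetImpl#volumeMap, i.e. finalized block files, while the
 * du-based implementation counts every file under the DataNode data
 * directories, so the two results can differ.
 */
{noformat}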

Yes, the calculation is done differently now. Can you add an additional test
for the case where some block files are not finalized, for example in RBW
state, along the lines of the sketch below? Then we can check whether dfsUsed
is updated correctly.
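A rough sketch of such a test (the test and file names are made up, and the
exact assertion depends on whether RBW replicas are meant to be counted; a
refresh of the cached value may also be needed before the check):
{noformat}
import static org.junit.Assert.assertTrue;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.server.datanode.fsdataset.FsDatasetSpi;
import org.junit.Test;

public class TestRbwReplicaSpaceUsed {

  @Test
  public void testDfsUsedWithRbwReplica() throws Exception {
    Configuration conf = new Configuration();
    // Assumed from this patch: plug in the replica-caching implementation.
    conf.set("fs.getspaceused.classname",
        "org.apache.hadoop.hdfs.server.datanode.fsdataset.impl."
            + "ReplicaCachingGetSpaceUsed");
    MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
    try {
      cluster.waitActive();
      FileSystem fs = cluster.getFileSystem();
      FsDatasetSpi<?> dataset = cluster.getDataNodes().get(0).getFSDataset();

      // Keep the stream open so the replica stays in RBW state.
      FSDataOutputStream out = fs.create(new Path("/testRbwDfsUsed"));
      out.write(new byte[4096]);
      out.hflush(); // bytes reach the DataNode, block not yet finalized
      long usedWhileRbw = dataset.getDfsUsed();

      out.close(); // closing the stream finalizes the block
      long usedAfterFinalize = dataset.getDfsUsed();

      // dfsUsed must not shrink when the block goes from RBW to FINALIZED.
      assertTrue(usedAfterFinalize >= usedWhileRbw);
    } finally {
      cluster.shutdown();
    }
  }
}
{noformat}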

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-14313
>                 URL: https://issues.apache.org/jira/browse/HDFS-14313
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, performance
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch, HDFS-14313.009.patch
>
>
> There are two existing ways of getting used space, DU and DF, and both are 
> insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting HDFS used space from the in-memory 
> FsDatasetImpl#volumeMap#ReplicaInfos is very cheap and accurate.
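For context, the in-memory calculation described above amounts to something
like this (deepCopyReplica is the accessor added by the patch; its exact
signature is my assumption):
{noformat}
// Sum the bytes of the replicas cached in the DataNode's in-memory
// volumeMap instead of shelling out to du/df.
long dfsUsed = 0;
for (Replica replica : fsDataset.deepCopyReplica(bpid)) {
  // getBytesOnDisk() is part of the existing Replica interface.
  dfsUsed += replica.getBytesOnDisk();
}
{noformat}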


