[ 
https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894619#comment-16894619
 ] 

Yiqun Lin commented on HDFS-14313:
----------------------------------

Thanks for updating the patch, [~leosun08]. Some review comments for the patch:

*FSCachingGetSpaceUsed*
 Line 35: Can we update the annotation to {{@InterfaceAudience.Private}}, since 
this class is only intended for the HDFS module?
 Line 38: Remove the public keyword from the LOG instance.
 Line 53: Add the final keyword to the FsVolumeImpl variable.
 Line 54: Add the final keyword here too (a rough sketch of these changes 
follows this list).
 Line 75: We no longer pass the config for the threshold time, so do we still 
need to override this method? If not, the change made in
 GetSpaceUsed can also be reverted.
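To make the above concrete, roughly something like this (the parent class, field 
names, and layout here are only my reading of the patch, not verbatim code):
{noformat}
// Sketch only: the parent class and field names are my reading, not the patch.
@InterfaceAudience.Private
public abstract class FSCachingGetSpaceUsed extends CachingGetSpaceUsed {
  // Line 38: the logger doesn't need to be public.
  static final Logger LOG =
      LoggerFactory.getLogger(FSCachingGetSpaceUsed.class);

  // Lines 53/54: both fields can be final since they are assigned only once.
  private final FsVolumeImpl volume;
  private final String bpid;

  // constructor and refresh logic unchanged, omitted in this sketch
}
{noformat}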

*FsDatasetImpl*
{noformat}
FsVolumeList#addBlockPool -> FsVolumeImpl#addBlockPool -> new BlockPoolSlice -> 
FsDatasetImpl#deepCopyReplica. If deepCopyReplica use datasetock, it appears 
deadlock.
{noformat}
This comment can be simplified to 'The deepCopyReplica call doesn't use the 
datasetLock since it will lead to a potential deadlock with the 
{@link FsVolumeList#addBlockPool} call.'
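For reference, the simplified comment could sit on the method like this (the 
signature and body below are only a sketch from my reading of the patch):
{noformat}
/**
 * The deepCopyReplica call doesn't use the datasetLock since it will lead to
 * a potential deadlock with the {@link FsVolumeList#addBlockPool} call.
 */
@Override
public Set<? extends Replica> deepCopyReplica(String bpid) throws IOException {
  // Snapshot the replicas of this block pool without taking the dataset lock.
  Set<ReplicaInfo> replicas = new HashSet<>();
  if (volumeMap.replicas(bpid) != null) {
    replicas.addAll(volumeMap.replicas(bpid));
  }
  return Collections.unmodifiableSet(replicas);
}
{noformat}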

*ReplicaCachingGetSpaceUsed*
 Line 41:
{noformat}
To use set fs.getspaceused.classname to 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed
 in your core-site.xml.
{noformat}
This can be updated to a more readable form:
{noformat}
Set fs.getspaceused.classname to 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed
 in core-site.xml to enable this class.
{noformat}
Line 46: Update the annotation to {{@InterfaceAudience.Private}}.
 Line 49: Remove the public keyword.
 Line 51: Add the final keyword.
 Line 52: Add the final keyword.
 Line 76: I would prefer to define a static final variable for the hard-coded 
value in ReplicaCachingGetSpaceUsed.
 Line 77: Can we use parameterized logging here? For example: 
LOG.debug("BlockPoolId: {}, replicas size: {}, copy replicas duration: {}ms.", 
obj1, obj2...) (see the sketch after this list).
 Line 95: The same comment as for Line 76.
 Line 96: The same comment as for Line 77.
 Line 101: Update replicaCachingGetSpaceUsed to ReplicaCachingGetSpaceUsed.
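For the Line 76/77 (and 95/96) comments, something like the following would do 
(the constant name, its value, and the helper shown here are only my 
assumptions for illustration):
{noformat}
// Sketch only: the constant name/value and this helper are assumptions.
private static final long DEEP_COPY_REPLICA_THRESHOLD_MS = 50;

private void logDeepCopyCost(String bpid, Collection<ReplicaInfo> replicaInfos,
    long durationMs) {
  if (durationMs > DEEP_COPY_REPLICA_THRESHOLD_MS) {
    LOG.warn("Copying replicas of block pool {} took {}ms", bpid, durationMs);
  }
  // Parameterized (SLF4J) logging avoids string concatenation when the
  // DEBUG level is disabled.
  LOG.debug("BlockPoolId: {}, replicas size: {}, copy replicas duration: {}ms.",
      bpid, replicaInfos.size(), durationMs);
}
{noformat}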

*TestReplicaCachingGetSpaceUsed*
 Line 42: Please add a class-level comment describing what this class tests.
 Line 51: We can use the Configuration#setClass call to set the class impl:
{noformat}
conf.setClass("fs.getspaceused.classname", ReplicaCachingGetSpaceUsed.class, 
CachingGetSpaceUsed.class);
{noformat}
Line 61: It would be good practice to clean up the hdfs path we created for 
the test.
 Line 69: As I mentioned before, can we have an additional comparison against 
the DU impl class? Most of the lines can be reused for these two get-used impl 
classes: just pass a different key value, restart the mini cluster, and compare 
the used space (a rough sketch follows below).
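Just to illustrate the idea, a rough sketch of the comparison (the method names, 
the file path, and the exact-equality assertion are only my assumptions; the 
real test may need some tolerance since both impls refresh asynchronously):
{noformat}
// Sketch only: helper/method names, the path, and the equality check are
// illustrative assumptions, not the existing test code.
@Test
public void testReplicaCachingMatchesDU() throws Exception {
  long replicaCachingUsed =
      writeDataAndGetDfsUsed(ReplicaCachingGetSpaceUsed.class);
  long duBasedUsed = writeDataAndGetDfsUsed(DU.class);
  // The in-memory computation should match (or stay very close to) DU.
  assertEquals(duBasedUsed, replicaCachingUsed);
}

private long writeDataAndGetDfsUsed(Class<? extends CachingGetSpaceUsed> impl)
    throws Exception {
  Configuration conf = new HdfsConfiguration();
  conf.setClass("fs.getspaceused.classname", impl, CachingGetSpaceUsed.class);
  MiniDFSCluster cluster = new MiniDFSCluster.Builder(conf).build();
  try {
    cluster.waitActive();
    DistributedFileSystem fs = cluster.getFileSystem();
    DFSTestUtil.createFile(fs, new Path("/testReplicaCachingGetSpaceUsed"),
        1024, (short) 1, 0L);
    // Give the CachingGetSpaceUsed refresh thread a chance to run.
    Thread.sleep(3000);
    return cluster.getDataNodes().get(0).getFSDataset().getDfsUsed();
  } finally {
    cluster.shutdown();
  }
}
{noformat}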

 

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory  
> instead of df/du
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-14313
>                 URL: https://issues.apache.org/jira/browse/HDFS-14313
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, performance
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch, 
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch, 
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch, 
> HDFS-14313.008.patch
>
>
> There are two existing ways of getting the used space, DU and DF, and both 
> are insufficient.
>  #  Running DU across lots of disks is very expensive, and running all of the 
> processes at the same time creates a noticeable IO spike.
>  #  Running DF is inaccurate when the disk is shared by multiple datanodes or 
> other servers.
>  Getting the hdfs used space from the FsDatasetImpl#volumeMap#ReplicaInfos in 
> memory is very cheap and accurate. 


