[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16894619#comment-16894619 ]
Yiqun Lin commented on HDFS-14313:
----------------------------------

Thanks for updating the patch, [~leosun08]. Some review comments for the patch:

*FSCachingGetSpaceUsed*
Line 35: Can we update the annotation to {{@InterfaceAudience.Private}}, since this is only intended for the HDFS module?
Line 38: Remove the public keyword from the LOG instance.
Line 53: Add the final keyword to the FsVolumeImpl variable.
Line 54: Add the final keyword here too.
Line 75: We don't pass the config to use the threshold time now, so do we still need to override this method? If not, the change made in GetSpaceUsed can also be reverted.

*FsDatasetImpl*
{noformat}
FsVolumeList#addBlockPool -> FsVolumeImpl#addBlockPool -> new BlockPoolSlice -> FsDatasetImpl#deepCopyReplica. If deepCopyReplica takes the dataset lock, it can deadlock.
{noformat}
This comment can be simplified to 'The deepCopyReplica call doesn't use the dataset lock since it would lead to a potential deadlock with the {@link FsVolumeList#addBlockPool} call.'

*ReplicaCachingGetSpaceUsed*
Line 41:
{noformat}
To use set fs.getspaceused.classname to org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed in your core-site.xml.
{noformat}
can be updated to a more readable form:
{noformat}
Set fs.getspaceused.classname to org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.ReplicaCachingGetSpaceUsed in core-site.xml if we want to enable this class.
{noformat}
Line 46: Update the annotation to {{@InterfaceAudience.Private}}.
Line 49: Remove the public keyword.
Line 51: Add the final keyword.
Line 52: Add the final keyword.
Line 76: I would prefer defining a static final variable rather than hard-coding the value in ReplicaCachingGetSpaceUsed.
Line 77: Can we use the parameterized way to print the log? Like LOG.debug("BlockPoolId: {}, replicas size: {}, copy replicas duration: {}ms.", obj1, obj2...)
Line 95: The same comment as for Line 76.
Line 96: The same comment as for Line 77.
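To make the deadlock shape above concrete, here is a minimal, self-contained sketch (class and method names are illustrative stand-ins, not code from the patch): one thread models addBlockPool holding the dataset lock while waiting for the first used-space refresh, and a second thread models a deepCopyReplica that also wants that lock — so the refresh can never finish.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {
    static final ReentrantLock datasetLock = new ReentrantLock();
    static final CountDownLatch firstRefresh = new CountDownLatch(1);

    // Returns whether the "refresh" thread managed to take the dataset lock
    // while the "addBlockPool" thread holds it and waits for the first refresh.
    static boolean refresherGetsLock() throws InterruptedException {
        final boolean[] got = {false};

        Thread addBlockPool = new Thread(() -> {
            datasetLock.lock();            // addBlockPool takes the dataset lock...
            try {
                // ...then BlockPoolSlice construction waits for the initial refresh.
                firstRefresh.await(1, TimeUnit.SECONDS);
            } catch (InterruptedException ignored) {
            } finally {
                datasetLock.unlock();
            }
        });

        Thread refresher = new Thread(() -> {
            try {
                // If deepCopyReplica also took the dataset lock, it would block
                // here, so the first refresh could never complete: a deadlock.
                got[0] = datasetLock.tryLock(300, TimeUnit.MILLISECONDS);
                if (got[0]) {
                    firstRefresh.countDown();
                    datasetLock.unlock();
                }
            } catch (InterruptedException ignored) {
            }
        });

        addBlockPool.start();
        Thread.sleep(50);                  // let addBlockPool grab the lock first
        refresher.start();
        refresher.join();
        addBlockPool.join();
        return got[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("refresher acquired dataset lock: " + refresherGetsLock());
    }
}
```

With a timed tryLock the sketch reports the failure instead of hanging; in the real code, where both sides would wait unconditionally, the two threads would block each other forever, which is why deepCopyReplica must not take the dataset lock.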
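On the Line 77 suggestion, a tiny stand-in formatter shows why the parameterized {} style is preferred: when DEBUG is off, no message string is ever built. This is only a sketch of the idea, not SLF4J's actual implementation.

```java
public class ParamLogSketch {
    static boolean debugEnabled = false;  // pretend DEBUG is off
    static int messagesBuilt = 0;

    // Substitute each "{}" in the pattern with the next argument.
    static String format(String pattern, Object... args) {
        messagesBuilt++;
        StringBuilder sb = new StringBuilder();
        int from = 0, i = 0, at;
        while ((at = pattern.indexOf("{}", from)) >= 0 && i < args.length) {
            sb.append(pattern, from, at).append(args[i++]);
            from = at + 2;
        }
        return sb.append(pattern.substring(from)).toString();
    }

    static void debug(String pattern, Object... args) {
        if (!debugEnabled) {
            return;  // level check first: the message string is never built
        }
        System.out.println(format(pattern, args));
    }

    public static void main(String[] args) {
        debug("BlockPoolId: {}, replicas size: {}, copy replicas duration: {}ms.",
              "BP-1", 42, 7);
        System.out.println("messages built with DEBUG off: " + messagesBuilt);
    }
}
```

String concatenation (LOG.debug("size: " + size)) pays the formatting cost even when the level is disabled; the placeholder form defers it behind the level check, which is exactly what SLF4J's Logger#debug(String, Object...) does.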
Line 101: Update replicaCachingGetSpaceUsed to ReplicaCachingGetSpaceUsed.

*TestReplicaCachingGetSpaceUsed*
Line 42: Please add a class-level comment describing this test.
Line 51: We can use the Configuration#setClass call to set the class impl:
{noformat}
conf.setClass("fs.getspaceused.classname", ReplicaCachingGetSpaceUsed.class, CachingGetSpaceUsed.class);
{noformat}
Line 61: It would be good to clean up the HDFS path we created for the test.
Line 69: As I mentioned before, can we have an additional comparison against the DU impl class? Most of the lines can be reused for these two get-space-used impl classes. Just pass a different key value, restart the mini cluster, and compare the used space.

> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory
> instead of df/du
> ----------------------------------------------------------------------------------------
>
>                 Key: HDFS-14313
>                 URL: https://issues.apache.org/jira/browse/HDFS-14313
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode, performance
>    Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
>            Reporter: Lisheng Sun
>            Assignee: Lisheng Sun
>            Priority: Major
>         Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch,
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch,
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch,
> HDFS-14313.008.patch
>
> The two existing ways of getting used space, DU and DF, are both insufficient:
> # Running DU across lots of disks is very expensive, and running all of the processes at the same time creates a noticeable IO spike.
> # Running DF is inaccurate when the disk is shared by multiple datanodes or other services.
> Getting the HDFS used space from the FsDatasetImpl#volumeMap ReplicaInfos in memory is very cheap and accurate.
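The core idea in the description can be sketched in a few lines (ReplicaSketch and the map layout are illustrative stand-ins, not the actual HDFS classes): summing replica lengths from the in-memory map replaces the du fork entirely.

```java
import java.util.HashMap;
import java.util.Map;

public class UsedSpaceSketch {
    // Stand-in for ReplicaInfo; only the on-disk length matters here.
    static class ReplicaSketch {
        final long numBytes;
        ReplicaSketch(long numBytes) { this.numBytes = numBytes; }
    }

    // Stand-in for walking FsDatasetImpl#volumeMap: a pure in-memory sum,
    // no `du` process fork and no disk IO spike.
    static long usedSpace(Map<Long, ReplicaSketch> volumeMap) {
        long used = 0;
        for (ReplicaSketch r : volumeMap.values()) {
            used += r.numBytes;
        }
        return used;
    }

    public static void main(String[] args) {
        Map<Long, ReplicaSketch> volumeMap = new HashMap<>();
        volumeMap.put(1L, new ReplicaSketch(128L * 1024 * 1024)); // a 128 MiB replica
        volumeMap.put(2L, new ReplicaSketch(64L * 1024 * 1024));  // a 64 MiB replica
        System.out.println("used bytes: " + usedSpace(volumeMap)); // 201326592
    }
}
```

Because the DataNode already maintains this map for every replica it stores, the sum is both cheap and unaffected by other tenants sharing the disk, which addresses both weaknesses of DU and DF.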
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org