[ https://issues.apache.org/jira/browse/HDFS-14313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889667#comment-16889667 ]
Lisheng Sun commented on HDFS-14313: ------------------------------------ Thanks [~linyiqun] for your careful review. I have a few questions to discuss with you.

1. I don't use the component-specific impl class in the common module. The change to GetSpaceUsed in the common module is only to make it inheritable by subclasses. The change to CommonConfigurationKeys in the common module is for logging the threshold time; that key should be moved to DFSConfigKeys, which is more appropriate.

2. There is already a switch that controls this, GetSpaceUsed#Builder#CLASSNAME_KEY:
{code:java}
static final String CLASSNAME_KEY = "fs.getspaceused.classname";
{code}
If we add enableFSCachingGetSpaceUsed as you suggest, there would be two switches, which would give the user more confusion.

3.
{quote}Even though deepCopyReplica is only used by another thread, I still prefer to let it be an atomic operation in case this will be used in other places in the future. Can you add datasetLock here?{quote}
FsDatasetImpl#addBlockPool holds datasetLock and calls FsVolumeList#addBlockPool:
{code:java}
@Override
public void addBlockPool(String bpid, Configuration conf)
    throws IOException {
  LOG.info("Adding block pool " + bpid);
  try (AutoCloseableLock lock = datasetLock.acquire()) {
    volumes.addBlockPool(bpid, conf);
    volumeMap.initBlockPool(bpid);
  }
  volumes.getAllVolumesMap(bpid, volumeMap, ramDiskReplicaTracker);
}
{code}
The call chain is FsVolumeList#addBlockPool -> FsVolumeImpl#addBlockPool -> new BlockPoolSlice -> FsDatasetImpl#deepCopyReplica. If deepCopyReplica also acquired datasetLock, it would deadlock.
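The deadlock shape above can be illustrated with a minimal, self-contained sketch using plain java.util.concurrent rather than the Hadoop classes (all names here are illustrative, not the actual HDFS code): a parent thread holds a lock, starts a worker that needs the same lock, and then waits on the worker, which is the situation addBlockPool would create if the per-volume threads called a deepCopyReplica that also took datasetLock.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class DeadlockSketch {
    public static void main(String[] args) throws InterruptedException {
        ReentrantLock datasetLock = new ReentrantLock();
        // Parent acquires the lock, as FsDatasetImpl#addBlockPool does.
        datasetLock.lock();
        try {
            Thread worker = new Thread(() -> {
                // Models a per-volume block-pool thread: if deepCopyReplica
                // also took datasetLock, it would block here forever while
                // the parent waits for this thread to finish.
                boolean acquired = false;
                try {
                    // tryLock with a timeout instead of lock(), so the
                    // sketch terminates and prints the outcome.
                    acquired = datasetLock.tryLock(200, TimeUnit.MILLISECONDS);
                    if (acquired) {
                        datasetLock.unlock();
                    }
                } catch (InterruptedException ignored) {
                }
                System.out.println("worker acquired datasetLock: " + acquired);
            });
            worker.start();
            // Parent waits for the worker while still holding the lock:
            // with a blocking lock() in the worker, this join would never return.
            worker.join();
        } finally {
            datasetLock.unlock();
        }
    }
}
```

Returning an unmodifiable snapshot instead of locking avoids this circular wait entirely.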
So I use Collections.unmodifiableSet instead, so that the replica info cannot be modified outside. Note that FsVolumeList#addBlockPool spawns one thread per volume and waits for them while the caller still holds datasetLock:
{code:java}
void addBlockPool(final String bpid, final Configuration conf)
    throws IOException {
  long totalStartTime = Time.monotonicNow();
  final Map<FsVolumeSpi, IOException> unhealthyDataDirs =
      new ConcurrentHashMap<FsVolumeSpi, IOException>();
  List<Thread> blockPoolAddingThreads = new ArrayList<Thread>();
  for (final FsVolumeImpl v : volumes) {
    Thread t = new Thread() {
      public void run() {
        try (FsVolumeReference ref = v.obtainReference()) {
          FsDatasetImpl.LOG.info("Scanning block pool " + bpid +
              " on volume " + v + "...");
          long startTime = Time.monotonicNow();
          v.addBlockPool(bpid, conf);
          long timeTaken = Time.monotonicNow() - startTime;
          FsDatasetImpl.LOG.info("Time taken to scan block pool " + bpid +
              " on " + v + ": " + timeTaken + "ms");
        } catch (ClosedChannelException e) {
          // ignore.
        } catch (IOException ioe) {
          FsDatasetImpl.LOG.info("Caught exception while scanning " + v +
              ". Will throw later.", ioe);
          unhealthyDataDirs.put(v, ioe);
        }
      }
    };
    blockPoolAddingThreads.add(t);
    t.start();
  }
{code}
4. Following your suggestion, I will modify the UT to use a real MiniCluster and add a comparison test that runs both the FSCachingGetSpaceUsed way and the default DU way. Please correct me if I am wrong. Thanks [~linyiqun] again.
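The unmodifiable-snapshot idea can be sketched as follows. This is a minimal illustration, not the actual patch: the Replica class is a hypothetical stand-in for ReplicaInfo, and deepCopyReplica here just copies an in-memory set.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for ReplicaInfo, for illustration only.
class Replica {
    final long blockId;
    final long numBytes;

    Replica(long blockId, long numBytes) {
        this.blockId = blockId;
        this.numBytes = numBytes;
    }
}

public class DeepCopySketch {
    private final Set<Replica> replicas = new HashSet<>();

    public DeepCopySketch() {
        replicas.add(new Replica(1L, 1024L));
        replicas.add(new Replica(2L, 2048L));
    }

    // Return a copied, read-only view: callers can iterate without holding
    // any lock, but cannot mutate the dataset's replica map through it.
    public Set<Replica> deepCopyReplica() {
        return Collections.unmodifiableSet(new HashSet<>(replicas));
    }

    public static void main(String[] args) {
        Set<Replica> copy = new DeepCopySketch().deepCopyReplica();
        try {
            copy.add(new Replica(3L, 4096L));
            System.out.println("mutation allowed");
        } catch (UnsupportedOperationException e) {
            System.out.println("mutation rejected");
        }
    }
}
```

Because the returned set is a fresh copy wrapped in an unmodifiable view, later changes to the internal map are invisible to the caller, and the caller cannot corrupt the map, without taking datasetLock.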
> Get hdfs used space from FsDatasetImpl#volumeMap#ReplicaInfo in memory
> instead of df/du
> ----------------------------------------------------------------------------------------
>
> Key: HDFS-14313
> URL: https://issues.apache.org/jira/browse/HDFS-14313
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: datanode, performance
> Affects Versions: 2.6.0, 2.7.0, 2.8.0, 2.9.0, 3.0.0, 3.1.0
> Reporter: Lisheng Sun
> Assignee: Lisheng Sun
> Priority: Major
> Attachments: HDFS-14313.000.patch, HDFS-14313.001.patch,
> HDFS-14313.002.patch, HDFS-14313.003.patch, HDFS-14313.004.patch,
> HDFS-14313.005.patch, HDFS-14313.006.patch, HDFS-14313.007.patch
>
> The two existing ways of getting used space, DU and DF, are both insufficient:
> # Running DU across lots of disks is very expensive, and running all of the
> processes at the same time creates a noticeable IO spike.
> # Running DF is inaccurate when the disk is shared by multiple DataNodes or
> other servers.
> Getting hdfs used space from the FsDatasetImpl#volumeMap ReplicaInfos in
> memory has very low cost and is accurate.

-- This message was sent by Atlassian JIRA (v7.6.14#76016)