[ https://issues.apache.org/jira/browse/HDFS-15548?focusedWorklogId=490431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-490431 ]
ASF GitHub Bot logged work on HDFS-15548:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 24/Sep/20 21:55
Start Date: 24/Sep/20 21:55
Worklog Time Spent: 10m

Work Description: LeonGao91 commented on a change in pull request #2288:
URL: https://github.com/apache/hadoop/pull/2288#discussion_r494632863

File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java

```diff
@@ -412,16 +435,28 @@ long getBlockPoolUsed(String bpid) throws IOException {
    */
   @VisibleForTesting
   public long getCapacity() {
+    long capacity;
     if (configuredCapacity < 0L) {
       long remaining;
       if (cachedCapacity > 0L) {
         remaining = cachedCapacity - getReserved();
       } else {
         remaining = usage.getCapacity() - getReserved();
       }
-      return Math.max(remaining, 0L);
+      capacity = Math.max(remaining, 0L);
+    } else {
+      capacity = configuredCapacity;
+    }
+
+    if (enableSameDiskArchival) {
```

Review comment:
This is actually the important part of enabling this feature: it lets users configure the capacity of an fsVolume. Because we are configuring two fsVolumes on the same underlying filesystem, doing nothing would count the capacity twice, and all the reported stats would be incorrect.

Here is an example. Say we want to configure `[DISK]/data01/dfs` and `[ARCHIVE]/data01/dfs_archive` on a 4 TB disk mount `/data01`, assigning 1 TB to `[DISK]/data01/dfs` and 3 TB to `[ARCHIVE]/data01/dfs_archive`. We can set `reservedForArchive` to 0.75 and put both directories in the volume list. In that case, `/data01/dfs` will be reported to HDFS as a 1 TB volume and `/data01/dfs_archive` as a 3 TB volume; logically, HDFS just treats them as two separate volumes. Without the change here, HDFS would see two volumes of 4 TB each, so the one 4 TB disk would be counted as 4 * 2 = 8 TB of capacity in the namenode, and all the related stats would be wrong.
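The capacity split described above is simple arithmetic; the sketch below illustrates it with standalone helpers (`archiveCapacity`/`diskCapacity` are made-up names for this illustration, not methods from the actual FsVolumeImpl patch):

```java
// Illustrative sketch of splitting one mount's capacity between an
// ARCHIVE and a DISK volume using the reservedForArchive ratio from the
// discussion above. Helper names are hypothetical.
public class CapacitySplitExample {
    static final long TB = 1024L * 1024 * 1024 * 1024;

    /** Capacity reported for the ARCHIVE volume on the shared mount. */
    static long archiveCapacity(long mountCapacity, double reservedForArchive) {
        return (long) (mountCapacity * reservedForArchive);
    }

    /** Capacity reported for the DISK volume on the same mount. */
    static long diskCapacity(long mountCapacity, double reservedForArchive) {
        return mountCapacity - archiveCapacity(mountCapacity, reservedForArchive);
    }

    public static void main(String[] args) {
        long mount = 4 * TB;               // the 4 TB /data01 mount
        double reservedForArchive = 0.75;  // 75% assigned to ARCHIVE

        // 3 TB for ARCHIVE, 1 TB for DISK; together they sum to the real
        // 4 TB mount instead of being double-counted as 8 TB.
        System.out.println(archiveCapacity(mount, reservedForArchive) / TB); // 3
        System.out.println(diskCapacity(mount, reservedForArchive) / TB);    // 1
    }
}
```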
Another change we need to make is to `getActualNonDfsUsed()`, as below. Say that in the 4 TB disk setup above we reserve 0.1 TB, and `[ARCHIVE]/data01/dfs_archive` already has 2 TB of its capacity used. When we calculate `getActualNonDfsUsed()` for `[DISK]/data01/dfs`, it will always return 0, which is incorrect and causes other strange issues downstream. Because the two fsVolumes are on the same filesystem, the reserved space should be shared between them.

According to our analysis and cluster testing results, updating these two functions, `getCapacity()` and `getActualNonDfsUsed()`, is enough to keep stats correct for the two "logical" fsVolumes on the same disk. I can update the javadoc to reflect this behavior when the feature is turned on.

Issue Time Tracking
-------------------

Worklog Id: (was: 490431)
Time Spent: 4.5h (was: 4h 20m)

> Allow configuring DISK/ARCHIVE storage types on same device mount
> -----------------------------------------------------------------
>
>                 Key: HDFS-15548
>                 URL: https://issues.apache.org/jira/browse/HDFS-15548
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Leon Gao
>            Assignee: Leon Gao
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> We can allow configuring DISK/ARCHIVE storage types on the same device mount
> on two separate directories.
>
> Users should be able to configure the capacity for each. Also, the datanode
> usage report should report stats correctly.
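The non-DFS-used accounting problem can be sketched with a toy calculation. This is an illustrative model with made-up helper names, not the actual FsVolumeImpl code (and it shows the simpler double-counting direction of the error; the exact sign in the real code depends on how reserved space enters the formula): when two logical volumes share one mount, the mount-level used bytes must be reduced by the DFS usage of both volumes, not just the volume doing the calculation.

```java
// Toy model of non-DFS-used accounting on a shared mount. All names here
// (nonDfsUsedNaive, nonDfsUsedShared) are hypothetical and only
// illustrate the shared-filesystem accounting issue discussed above.
public class SharedMountAccounting {
    static final long GB = 1024L * 1024 * 1024;

    // Naive per-volume view: subtract only this volume's DFS usage from
    // the mount-level used bytes. The sibling volume's blocks are then
    // misreported as non-DFS data.
    static long nonDfsUsedNaive(long mountUsed, long thisVolumeDfsUsed) {
        return Math.max(mountUsed - thisVolumeDfsUsed, 0L);
    }

    // Shared view: subtract the DFS usage of every volume on the mount,
    // so only genuinely foreign data counts as non-DFS used.
    static long nonDfsUsedShared(long mountUsed, long allVolumesDfsUsed) {
        return Math.max(mountUsed - allVolumesDfsUsed, 0L);
    }

    public static void main(String[] args) {
        long archiveDfs = 2000 * GB; // 2 TB of blocks on the ARCHIVE volume
        long diskDfs = 100 * GB;     // 0.1 TB of blocks on the DISK volume
        long otherData = 100 * GB;   // 0.1 TB of genuinely non-DFS files
        long mountUsed = archiveDfs + diskDfs + otherData;

        // Naive view from the DISK volume: 2.1 TB "non-DFS", badly wrong.
        System.out.println(nonDfsUsedNaive(mountUsed, diskDfs) / GB);
        // Shared view: 0.1 TB, the true non-DFS usage on the mount.
        System.out.println(nonDfsUsedShared(mountUsed, archiveDfs + diskDfs) / GB);
    }
}
```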
--
This message was sent by Atlassian Jira
(v8.3.4#803005)