[ https://issues.apache.org/jira/browse/HDFS-15548?focusedWorklogId=490431&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-490431 ]
ASF GitHub Bot logged work on HDFS-15548:
-----------------------------------------

Author: ASF GitHub Bot
Created on: 24/Sep/20 21:55
Start Date: 24/Sep/20 21:55
Worklog Time Spent: 10m

Work Description: LeonGao91 commented on a change in pull request #2288:
URL: https://github.com/apache/hadoop/pull/2288#discussion_r494632863

File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/fsdataset/impl/FsVolumeImpl.java

```diff
@@ -412,16 +435,28 @@ long getBlockPoolUsed(String bpid) throws IOException {
    */
   @VisibleForTesting
   public long getCapacity() {
+    long capacity;
     if (configuredCapacity < 0L) {
       long remaining;
       if (cachedCapacity > 0L) {
         remaining = cachedCapacity - getReserved();
       } else {
         remaining = usage.getCapacity() - getReserved();
       }
-      return Math.max(remaining, 0L);
+      capacity = Math.max(remaining, 0L);
+    } else {
+      capacity = configuredCapacity;
+    }
+
+    if (enableSameDiskArchival) {
```

Review comment:
This is actually the important part of enabling this feature: it lets users configure the capacity of an fsVolume. Because we are configuring two fsVolumes on the same underlying filesystem, doing nothing would count the capacity twice, and all the reported stats would be incorrect.

Here is an example. Say we want to configure `[DISK]/data01/dfs` and `[ARCHIVE]/data01/dfs_archive` on a 4 TB disk mount `/data01`, assigning 1 TB to `[DISK]/data01/dfs` and 3 TB to `[ARCHIVE]/data01/dfs_archive`. We can set `reservedForArchive` to 0.75 and put both directories in the volume list. In that case, `/data01/dfs` will be reported to HDFS as a 1 TB volume and `/data01/dfs_archive` as a 3 TB volume; logically, HDFS just treats them as two separate volumes. Without the change here, HDFS would see two volumes of 4 TB each, so the one 4 TB disk would be counted as 4 * 2 = 8 TB of capacity in the namenode, and all the related stats would be wrong.
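The capacity split described above is simple arithmetic; the sketch below illustrates it with standalone helpers (`archiveCapacity`/`diskCapacity` are made-up names for this illustration, not methods from the actual FsVolumeImpl patch):

```java
// Illustrative sketch of splitting one mount's capacity between an
// ARCHIVE and a DISK volume using the reservedForArchive ratio from the
// discussion above. Helper names are hypothetical.
public class CapacitySplitExample {
    static final long TB = 1024L * 1024 * 1024 * 1024;

    /** Capacity reported for the ARCHIVE volume on the shared mount. */
    static long archiveCapacity(long mountCapacity, double reservedForArchive) {
        return (long) (mountCapacity * reservedForArchive);
    }

    /** Capacity reported for the DISK volume on the same mount. */
    static long diskCapacity(long mountCapacity, double reservedForArchive) {
        return mountCapacity - archiveCapacity(mountCapacity, reservedForArchive);
    }

    public static void main(String[] args) {
        long mount = 4 * TB;               // the 4 TB /data01 mount
        double reservedForArchive = 0.75;  // 75% assigned to ARCHIVE

        // 3 TB for ARCHIVE, 1 TB for DISK; together they sum to the real
        // 4 TB mount instead of being double-counted as 8 TB.
        System.out.println(archiveCapacity(mount, reservedForArchive) / TB); // 3
        System.out.println(diskCapacity(mount, reservedForArchive) / TB);    // 1
    }
}
```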
Another change we need to make is to `getActualNonDfsUsed()`, as below. Say that in the 4 TB disk setup above we reserve 0.1 TB, and `[ARCHIVE]/data01/dfs_archive` already has 2 TB of its capacity used. When we calculate `getActualNonDfsUsed()` for `[DISK]/data01/dfs`, it will always return 0, which is incorrect and causes other strange issues downstream. Because the two fsVolumes are on the same filesystem, the reserved space should be shared between them.

According to our analysis and cluster testing results, updating these two functions, `getCapacity()` and `getActualNonDfsUsed()`, is enough to keep stats correct for the two "logical" fsVolumes on the same disk. I can update the javadoc to reflect this behavior when the feature is turned on.

Issue Time Tracking
-------------------

Worklog Id: (was: 490431)
Time Spent: 4.5h (was: 4h 20m)

> Allow configuring DISK/ARCHIVE storage types on same device mount
> -----------------------------------------------------------------
>
>                 Key: HDFS-15548
>                 URL: https://issues.apache.org/jira/browse/HDFS-15548
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Leon Gao
>            Assignee: Leon Gao
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> We can allow configuring DISK/ARCHIVE storage types on the same device mount
> on two separate directories.
>
> Users should be able to configure the capacity for each. Also, the datanode
> usage report should report stats correctly.
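The non-DFS-used accounting problem can be sketched with a toy calculation. This is an illustrative model with made-up helper names, not the actual FsVolumeImpl code (and it shows the simpler double-counting direction of the error; the exact sign in the real code depends on how reserved space enters the formula): when two logical volumes share one mount, the mount-level used bytes must be reduced by the DFS usage of both volumes, not just the volume doing the calculation.

```java
// Toy model of non-DFS-used accounting on a shared mount. All names here
// (nonDfsUsedNaive, nonDfsUsedShared) are hypothetical and only
// illustrate the shared-filesystem accounting issue discussed above.
public class SharedMountAccounting {
    static final long GB = 1024L * 1024 * 1024;

    // Naive per-volume view: subtract only this volume's DFS usage from
    // the mount-level used bytes. The sibling volume's blocks are then
    // misreported as non-DFS data.
    static long nonDfsUsedNaive(long mountUsed, long thisVolumeDfsUsed) {
        return Math.max(mountUsed - thisVolumeDfsUsed, 0L);
    }

    // Shared view: subtract the DFS usage of every volume on the mount,
    // so only genuinely foreign data counts as non-DFS used.
    static long nonDfsUsedShared(long mountUsed, long allVolumesDfsUsed) {
        return Math.max(mountUsed - allVolumesDfsUsed, 0L);
    }

    public static void main(String[] args) {
        long archiveDfs = 2000 * GB; // 2 TB of blocks on the ARCHIVE volume
        long diskDfs = 100 * GB;     // 0.1 TB of blocks on the DISK volume
        long otherData = 100 * GB;   // 0.1 TB of genuinely non-DFS files
        long mountUsed = archiveDfs + diskDfs + otherData;

        // Naive view from the DISK volume: 2.1 TB "non-DFS", badly wrong.
        System.out.println(nonDfsUsedNaive(mountUsed, diskDfs) / GB);
        // Shared view: 0.1 TB, the true non-DFS usage on the mount.
        System.out.println(nonDfsUsedShared(mountUsed, archiveDfs + diskDfs) / GB);
    }
}
```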
--
This message was sent by Atlassian Jira
(v8.3.4#803005)