[ 
https://issues.apache.org/jira/browse/HDFS-9038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15053202#comment-15053202
 ] 

Chris Nauroth commented on HDFS-9038:
-------------------------------------

bq. I believe, post HDFS-5215, calculation turns out to be the correct one.

I disagree, because the post-HDFS-5215 calculation has broken previously 
established operations workflows.

Non-DFS usage is "space unexpectedly consumed on data volumes by things that 
are not HDFS".  As a cluster administrator, I would monitor the non-DFS usage 
value.  If a node consistently showed non-zero non-DFS usage, that would 
signal me to log in to the box, figure out where the space was consumed, free it 
up, and then address the root cause (probably a rogue process running on the 
wrong box or misconfigured to write to the wrong volume).  It's important to fix 
this, because high non-DFS usage reduces disk capacity that had been planned 
for HDFS.

After HDFS-5215, this workflow no longer works.  The monitoring shows false 
positives because {{dfs.datanode.du.reserved}} is now counted as non-DFS 
usage.  An administrator would need to add extra checks, or simply disable 
this monitoring because of the noise.

{{dfs.datanode.du.reserved}} is a special case on top of what I described 
above.  Setting a non-zero {{dfs.datanode.du.reserved}} means that the 
administrator intentionally wants to hold back a portion of the volume, 
essentially making it invisible to HDFS.  This is the same reason that 
{{dfs.datanode.du.reserved}} gets subtracted from the capacity calculation.  
This is why I think 3 GB is incorrect in the example above.  Non-DFS usage is 
really "unexpected non-DFS usage".  By setting {{dfs.datanode.du.reserved}} to 
1 GB, you have stated that up to 1 GB of usage by something other than HDFS is 
expected, so it's incorrect to count it as part of "unexpected non-DFS usage".
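
To make the distinction concrete, here is a minimal sketch of the definition argued for above.  It is not the DataNode's actual code: the class name, method names, and the 100 GB volume are hypothetical, chosen only to illustrate subtracting the reserved allowance before reporting non-DFS usage.

```java
public class NonDfsUsedSketch {
    static final long GB = 1024L * 1024L * 1024L;

    /** Bytes on the volume consumed by anything other than HDFS (hypothetical helper). */
    static long rawNonDfsUsed(long totalBytes, long dfsUsedBytes, long freeBytes) {
        return Math.max(0L, totalBytes - dfsUsedBytes - freeBytes);
    }

    /**
     * "Unexpected" non-DFS usage: only the part that exceeds the allowance
     * set by dfs.datanode.du.reserved.  Never negative.
     */
    static long unexpectedNonDfsUsed(long totalBytes, long dfsUsedBytes,
                                     long freeBytes, long reservedBytes) {
        long raw = rawNonDfsUsed(totalBytes, dfsUsedBytes, freeBytes);
        return Math.max(0L, raw - reservedBytes);
    }

    public static void main(String[] args) {
        // Assumed example values, not taken from the issue.
        long total = 100 * GB;
        long dfsUsed = 10 * GB;
        long reserved = 1 * GB;

        // Other processes use 1 GB, which is within the reserved allowance.
        long free = 89 * GB;
        System.out.println(unexpectedNonDfsUsed(total, dfsUsed, free, reserved) / GB); // prints 0

        // Other processes use 3 GB, which is 2 GB beyond the allowance.
        free = 87 * GB;
        System.out.println(unexpectedNonDfsUsed(total, dfsUsed, free, reserved) / GB); // prints 2
    }
}
```

Under this definition a monitor that alerts on non-zero non-DFS usage stays quiet in the first case and fires only in the second, which is the behavior the workflow described above depends on.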

Bottom line: I have seen the post-HDFS-5215 calculation cause confusion for 
administrators who had built a workflow around the old calculation.

> DFS reserved space is erroneously counted towards non-DFS used.
> ---------------------------------------------------------------
>
>                 Key: HDFS-9038
>                 URL: https://issues.apache.org/jira/browse/HDFS-9038
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.7.1
>            Reporter: Chris Nauroth
>            Assignee: Brahma Reddy Battula
>         Attachments: HDFS-9038-002.patch, HDFS-9038-003.patch, 
> HDFS-9038-004.patch, HDFS-9038-005.patch, HDFS-9038-006.patch, 
> HDFS-9038-007.patch, HDFS-9038.patch
>
>
> HDFS-5215 changed the DataNode volume available space calculation to consider 
> the reserved space held by the {{dfs.datanode.du.reserved}} configuration 
> property.  As a side effect, reserved space is now counted towards non-DFS 
> used.  I don't believe it was intentional to change the definition of non-DFS 
> used.  This issue proposes restoring the prior behavior: do not count 
> reserved space towards non-DFS used.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
