zhouyingchao created HDFS-8045:
----------------------------------

             Summary: Incorrect calculation of NonDfsUsed and Remaining
                 Key: HDFS-8045
                 URL: https://issues.apache.org/jira/browse/HDFS-8045
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: zhouyingchao
            Assignee: zhouyingchao
         Attachments: HDFS-8045-001.patch

After reserving some space via the parameter "dfs.datanode.du.reserved", we noticed 
that the NameNode usually reports the NonDfsUsed of DataNodes as 0 even when we 
actually write some data to the volume. After some investigation, we think there 
is an issue in the calculation of FsVolumeImpl.getAvailable - the explanation 
follows.

For a volume, let Raw denote the raw capacity, DfsUsed the space consumed by 
HDFS blocks, Reserved the reservation configured through 
"dfs.datanode.du.reserved", RbwReserved the space reserved for rbw blocks, and 
NDfsUsed the real non-DFS usage (which includes non-HDFS files and the metadata 
consumed by the local filesystem).
In the current implementation, the available space of a volume is effectively 
calculated as min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - 
NDfsUsed}.
Later on, the NameNode calculates the NonDfsUsed of the volume as "Raw - Reserved - 
DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - 
NDfsUsed}".

Given this calculation, we end up with the following:
If "Reserved + RbwReserved > NDfsUsed", the calculated NonDfsUsed will be 
RbwReserved. Otherwise, if "Reserved + RbwReserved < NDfsUsed", the calculated 
NonDfsUsed will be "NDfsUsed - Reserved". Either way it is far from the correct 
value.
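
Plugging hypothetical numbers (in GB, made up purely for illustration) into the 
sketch above shows both cases:

{code:java}
public class NonDfsUsedExample {
  // Same formulas as above: available = min{Raw - Reserved - DfsUsed - RbwReserved,
  // Raw - DfsUsed - NDfsUsed}; reported NonDfsUsed = Raw - Reserved - DfsUsed - available.
  static long reportedNonDfsUsed(long raw, long reserved, long dfsUsed,
                                 long rbwReserved, long nDfsUsed) {
    long available = Math.min(raw - reserved - dfsUsed - rbwReserved,
                              raw - dfsUsed - nDfsUsed);
    return raw - reserved - dfsUsed - available;
  }

  public static void main(String[] args) {
    // Case 1: Reserved + RbwReserved > NDfsUsed (10 + 2 > 5)
    System.out.println(reportedNonDfsUsed(100, 10, 20, 2, 5));  // prints 2 (= RbwReserved), real value is 5
    // Case 2: Reserved + RbwReserved < NDfsUsed (10 + 2 < 30)
    System.out.println(reportedNonDfsUsed(100, 10, 20, 2, 30)); // prints 20 (= NDfsUsed - Reserved), real value is 30
  }
}
{code}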

After investigating the implementation, we believe Reserved and RbwReserved 
should both be subtracted from the available space in getAvailable, since 
neither of them is available to HDFS in any way. I'll post a patch soon.
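
A minimal sketch of that direction, as a revision of the placeholder getAvailable 
in the VolumeMathSketch above (this is only one possible reading of the idea, not 
the attached patch):

{code:java}
  // Also subtract Reserved and RbwReserved from the filesystem-reported free
  // space, so that the reservations no longer mask the real non-DFS usage
  // in the value reported to the NameNode.
  static long getAvailable(long rawCapacity, long reserved, long dfsUsed,
                           long rbwReserved, long fsFreeSpace) {
    long remaining = rawCapacity - reserved - dfsUsed - rbwReserved;
    long available = fsFreeSpace - reserved - rbwReserved;
    return Math.max(0, Math.min(remaining, available));
  }
{code}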


