[ https://issues.apache.org/jira/browse/HDFS-8045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhouyingchao updated HDFS-8045:
-------------------------------
Description:
After reserving some space via the parameter "dfs.datanode.du.reserved", we noticed
that the namenode usually reports the NonDfsUsed of Datanodes as 0 even if we write
some non-HDFS data to the volume. After some investigation, we think there is
an issue in the calculation of FsVolumeImpl.getAvailable. The explanation follows.
For a volume, let Raw represent the raw capacity, DfsUsed the space consumed by
HDFS blocks, Reserved the reservation configured through
"dfs.datanode.du.reserved", RbwReserved the space reserved for rbw (replica
being written) blocks, and RealNonDfsUsed the real value of NonDfsUsed (which
includes non-HDFS files and metadata consumed by the local filesystem).
In the current implementation, the available space of a volume is actually
calculated as
{code}
min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - RealNonDfsUsed}
{code}
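The current computation can be sketched as a small Java model. This is a hypothetical simplification, not the FsVolumeImpl source: the class CurrentAvailable and its parameters are illustrative, and the free space reported by the local filesystem is assumed to equal Raw - DfsUsed - RealNonDfsUsed, as in the formula above.

```java
// Hypothetical, simplified model of the arithmetic described in this issue.
// It is NOT the FsVolumeImpl source; class and parameter names are illustrative.
class CurrentAvailable {

    /**
     * Available space as the issue says the current code computes it:
     * min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - RealNonDfsUsed}.
     * The second term stands in for the free space the local filesystem reports;
     * it excludes real non-HDFS usage but contains no Reserved term.
     */
    static long available(long raw, long reserved, long dfsUsed,
                          long rbwReserved, long realNonDfsUsed) {
        long remaining = raw - reserved - dfsUsed - rbwReserved;
        long fsAvailable = raw - dfsUsed - realNonDfsUsed;
        return Math.min(remaining, fsAvailable);
    }

    public static void main(String[] args) {
        // Example volume (values taken as GB): 1000 raw, 100 reserved,
        // 400 of HDFS blocks, no rbw reservation, 50 of real non-HDFS data.
        // remaining = 500, fsAvailable = 550, so the first term wins.
        System.out.println(available(1000, 100, 400, 0, 50)); // prints 500
    }
}
```

Note that with these numbers the 50 GB of non-HDFS data does not reduce the reported available space at all, because Reserved already hides it.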
Later on, the Namenode calculates the NonDfsUsed of the volume as
{code}
Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - RealNonDfsUsed}
{code}
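The Namenode-side derivation can be modeled the same way. In this hypothetical sketch (the class and method names are illustrative, not HDFS APIs), an example volume with 100 GB reserved and 50 GB of real non-HDFS data ends up with a derived NonDfsUsed of 0, which matches the symptom reported in this issue.

```java
// Hypothetical model (not HDFS source) of how the Namenode derives NonDfsUsed
// from the available space a volume reports. Names are illustrative only.
class NamenodeNonDfsUsed {

    // Current available computation, as described in the issue.
    static long currentAvailable(long raw, long reserved, long dfsUsed,
                                 long rbwReserved, long realNonDfsUsed) {
        return Math.min(raw - reserved - dfsUsed - rbwReserved,
                        raw - dfsUsed - realNonDfsUsed);
    }

    // NonDfsUsed = Raw - Reserved - DfsUsed - available
    static long nonDfsUsed(long raw, long reserved, long dfsUsed, long available) {
        return raw - reserved - dfsUsed - available;
    }

    public static void main(String[] args) {
        long raw = 1000, reserved = 100, dfsUsed = 400, rbw = 0, realNonDfs = 50;
        long avail = currentAvailable(raw, reserved, dfsUsed, rbw, realNonDfs);
        // 50 GB of non-HDFS data sits on the volume, yet the derived value is 0.
        System.out.println(nonDfsUsed(raw, reserved, dfsUsed, avail)); // prints 0
    }
}
```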
Given this calculation, and depending on which term of the min is smaller, we finally have:
{code}
if (Reserved + RbwReserved > RealNonDfsUsed) NonDfsUsed = RbwReserved;
else NonDfsUsed = RealNonDfsUsed - Reserved;
{code}
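To double-check the case split, a hypothetical brute-force comparison can evaluate both the direct expression and the two-case closed form over a small grid of values. Algebraically the direct expression equals max{RbwReserved, RealNonDfsUsed - Reserved}, which is why the two forms agree; the grid bounds and names below are arbitrary choices for illustration.

```java
// Hypothetical check (not HDFS code) that the closed form in the issue matches
// Raw - Reserved - DfsUsed - min{...} for every combination in a small grid.
class NonDfsUsedCases {

    // Direct evaluation of the Namenode formula with the current available.
    static long direct(long raw, long reserved, long dfsUsed,
                       long rbwReserved, long realNonDfsUsed) {
        long available = Math.min(raw - reserved - dfsUsed - rbwReserved,
                                  raw - dfsUsed - realNonDfsUsed);
        return raw - reserved - dfsUsed - available;
    }

    // The two cases stated in the issue.
    static long closedForm(long reserved, long rbwReserved, long realNonDfsUsed) {
        if (reserved + rbwReserved > realNonDfsUsed) {
            return rbwReserved;
        }
        return realNonDfsUsed - reserved;
    }

    // Returns the number of grid points where the two forms disagree.
    static int mismatches() {
        long raw = 1000, dfsUsed = 300;
        int bad = 0;
        for (long reserved = 0; reserved <= 100; reserved += 10) {
            for (long rbw = 0; rbw <= 50; rbw += 5) {
                for (long real = 0; real <= 200; real += 10) {
                    if (direct(raw, reserved, dfsUsed, rbw, real)
                            != closedForm(reserved, rbw, real)) {
                        bad++;
                    }
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + mismatches()); // prints "mismatches: 0"
    }
}
```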
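Either way, the result is far from the correct value, RealNonDfsUsed. In the
first case NonDfsUsed is just RbwReserved, which is 0 when no blocks are being
written; this is why the namenode usually reports 0.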
After investigating the implementation, we believe Reserved and RbwReserved
should be subtracted from the available space in getAvailable, since they are
not actually available to HDFS in any sense. I'll post a patch soon.
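One reading of the proposed fix is to subtract Reserved and RbwReserved from the filesystem-reported free space as well. The sketch below models that reading only; it is an assumption, not the contents of HDFS-8045-001.patch, and all names are hypothetical.

```java
// Hypothetical sketch of ONE reading of the proposed fix: subtract Reserved and
// RbwReserved from the filesystem-reported free space too. Not the actual patch.
class ProposedAvailable {

    static long proposedAvailable(long raw, long reserved, long dfsUsed,
                                  long rbwReserved, long realNonDfsUsed) {
        long remaining = raw - reserved - dfsUsed - rbwReserved;
        // Free space as the local filesystem sees it, minus both reservations.
        long fsAvailable = (raw - dfsUsed - realNonDfsUsed) - reserved - rbwReserved;
        return Math.min(remaining, fsAvailable);
    }

    // Same Namenode-side formula as before: Raw - Reserved - DfsUsed - available.
    static long nonDfsUsed(long raw, long reserved, long dfsUsed, long available) {
        return raw - reserved - dfsUsed - available;
    }

    public static void main(String[] args) {
        long raw = 1000, reserved = 100, dfsUsed = 400, rbw = 0, realNonDfs = 50;
        long avail = proposedAvailable(raw, reserved, dfsUsed, rbw, realNonDfs);
        System.out.println(avail);                                      // prints 450
        System.out.println(nonDfsUsed(raw, reserved, dfsUsed, avail));  // prints 50
    }
}
```

With the same example volume, the derived NonDfsUsed becomes 50, the amount actually on disk. Under this reading the second term equals the first minus RealNonDfsUsed, so it always wins when RealNonDfsUsed >= 0, and the Namenode would report RealNonDfsUsed + RbwReserved.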
was:
After reserving some space via the parameter "dfs.datanode.du.reserved", we noticed
that the namenode usually reports the NonDfsUsed of Datanodes as 0 even if we write
some non-HDFS data to the volume. After some investigation, we think there is
an issue in the calculation of FsVolumeImpl.getAvailable. The explanation follows.
For a volume, let's use Raw to represent raw capacity, DfsUsed to represent
space consumed by HDFS blocks, Reserved to represent reservation through
"dfs.datanode.du.reserved", RbwReserved to represent space reservation for rbw
blocks, and NDfsUsed to represent the real value of NonDfsUsed (which includes
non-HDFS files and metadata consumed by the local filesystem).
In the current implementation, the available space of a volume is actually
calculated as
{code}
min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed}
{code}
Later on, the Namenode calculates the NonDfsUsed of the volume as
{code}
Raw - Reserved - DfsUsed - min{Raw - Reserved - DfsUsed - RbwReserved, Raw - DfsUsed - NDfsUsed}
{code}
Given this calculation, we finally have:
{code}
if (Reserved + RbwReserved > NDfsUsed) NonDfsUsed = RbwReserved;
else NonDfsUsed = NDfsUsed - Reserved;
{code}
Either way, it is far from a correct value.
After investigating the implementation, we believe Reserved and RbwReserved
should be subtracted from the available space in getAvailable, since they are
not actually available to HDFS in any way. I'll post a patch soon.
> Incorrect calculation of NonDfsUsed and Remaining
> -------------------------------------------------
>
> Key: HDFS-8045
> URL: https://issues.apache.org/jira/browse/HDFS-8045
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.6.0
> Reporter: zhouyingchao
> Assignee: zhouyingchao
> Attachments: HDFS-8045-001.patch
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)