[
https://issues.apache.org/jira/browse/HDFS-10797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15973105#comment-15973105
]
Nathan Roberts commented on HDFS-10797:
---------------------------------------
After deploying this to a cluster with a few hundred nodes, we discovered
that the change made for this JIRA causes significant memory bloat in the
NameNode. Filed HDFS-11661 as a 2.8.1 blocker for this issue.
> Disk usage summary of snapshots causes renamed blocks to get counted twice
> --------------------------------------------------------------------------
>
> Key: HDFS-10797
> URL: https://issues.apache.org/jira/browse/HDFS-10797
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 2.8.0
> Reporter: Sean Mackrory
> Assignee: Sean Mackrory
> Fix For: 2.8.0, 3.0.0-alpha2
>
> Attachments: HDFS-10797.001.patch, HDFS-10797.002.patch,
> HDFS-10797.003.patch, HDFS-10797.004.patch, HDFS-10797.005.patch,
> HDFS-10797.006.patch, HDFS-10797.007.patch, HDFS-10797.008.patch,
> HDFS-10797.009.patch, HDFS-10797.010.patch, HDFS-10797.010.patch
>
>
> DirectoryWithSnapshotFeature.computeContentSummary4Snapshot calculates how
> much disk space a snapshot uses by tallying up the files in the snapshot
> that have since been deleted (that way it won't overlap with regular files,
> whose disk usage is computed separately). However, that set is determined
> from a diff that records a moved (to Trash or otherwise) or renamed file as
> a deletion plus a creation, so its blocks may overlap with the blocks of
> regular files. Only the deletion operation is taken into consideration, and
> this causes those blocks to be counted twice in the disk usage tally.
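
For illustration only, the double counting described above can be reproduced
with a toy model. This is a minimal sketch, not the HDFS code: the class
SnapshotUsageSketch, the FileEntry type, and both methods are hypothetical
names, and deduplicating by block ID is just one possible way to avoid the
overlap, not necessarily what the attached patches do.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SnapshotUsageSketch {
    /** Hypothetical stand-in for a file: a path plus block ID -> block size in bytes. */
    public static final class FileEntry {
        final String path;
        final Map<Long, Long> blocks;

        public FileEntry(String path, Map<Long, Long> blocks) {
            this.path = path;
            this.blocks = blocks;
        }
    }

    /** Sums live files plus the diff's deleted list, ignoring that a rename appears in both. */
    public static long naiveUsage(List<FileEntry> liveFiles, List<FileEntry> deletedInDiff) {
        long total = 0;
        for (FileEntry f : liveFiles) {
            for (long size : f.blocks.values()) {
                total += size;
            }
        }
        for (FileEntry f : deletedInDiff) {
            for (long size : f.blocks.values()) {
                total += size;
            }
        }
        return total;
    }

    /** Same tally, but each block ID contributes at most once (assumed remedy, for comparison). */
    public static long dedupedUsage(List<FileEntry> liveFiles, List<FileEntry> deletedInDiff) {
        Set<Long> seen = new HashSet<>();
        long total = 0;
        for (List<FileEntry> files : List.of(liveFiles, deletedInDiff)) {
            for (FileEntry f : files) {
                for (Map.Entry<Long, Long> b : f.blocks.entrySet()) {
                    if (seen.add(b.getKey())) {
                        total += b.getValue();
                    }
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // One 128-byte block, first seen as /dir/a (in the snapshot), then renamed to /dir/b.
        Map<Long, Long> blocks = Collections.singletonMap(1L, 128L);
        List<FileEntry> live = Collections.singletonList(new FileEntry("/dir/b", blocks));
        // The diff records the rename as a deletion of /dir/a (plus a creation of /dir/b).
        List<FileEntry> deleted = Collections.singletonList(new FileEntry("/dir/a", blocks));

        System.out.println("naive   = " + naiveUsage(live, deleted));   // prints "naive   = 256"
        System.out.println("deduped = " + dedupedUsage(live, deleted)); // prints "deduped = 128"
    }
}
```

In this model the renamed file owns a single block, yet the naive tally
reports twice its size because the same block is reachable both from the live
path and from the diff's deleted list.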