[
https://issues.apache.org/jira/browse/HDFS-11661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234342#comment-16234342
]
Xiao Chen commented on HDFS-11661:
----------------------------------
{quote}
There are more bugs related to snapshots and content summary and quota usage
discrepancies. I almost have a patch ready that optimizes content summary and
appears to fix the snapshot issues.
{quote}
Hi [~daryn] and [~shahrs87],
Just wanted to check whether this was eventually done. If so, could you share
the JIRA?
Thanks!
> GetContentSummary uses excessive amounts of memory
> --------------------------------------------------
>
> Key: HDFS-11661
> URL: https://issues.apache.org/jira/browse/HDFS-11661
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.8.0, 3.0.0-alpha2
> Reporter: Nathan Roberts
> Assignee: Wei-Chiu Chuang
> Priority: Blocker
> Fix For: 2.9.0, 3.0.0-alpha4, 2.8.2
>
> Attachments: HDFS-11661.001.patch, HDFs-11661.002.patch, Heap
> growth.png
>
>
> ContentSummaryComputationContext::nodeIncluded() is being used to keep track
> of all INodes visited during the current content summary calculation. This
> can be all of the INodes in the filesystem, making for a VERY large hash
> table. This simply won't work on large filesystems.
> We noticed this after an upgrade, when a namenode with ~100 million
> filesystem objects began spending significantly more time in GC. Fortunately
> this system had some memory breathing room; other clusters we have will not
> run with this additional demand on memory.
> This was added as part of HDFS-10797 as a way of keeping track of INodes that
> have already been accounted for - to avoid double counting.
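
For readers unfamiliar with the pattern described above, here is a minimal
sketch of visited-set deduplication and why its memory cost scales with the
number of objects traversed. It is not the HDFS code: the class
VisitedSetSketch, the nested SummaryContext, the use of long ids in place of
INode objects, and the summarize() helper are all hypothetical names chosen
for illustration. Only the method name nodeIncluded() comes from the issue
description, and its exact signature here is an assumption.

```java
import java.util.HashSet;
import java.util.Set;

public class VisitedSetSketch {

    /** Hypothetical stand-in for a content summary context (not the HDFS class). */
    public static final class SummaryContext {
        // One entry is retained per distinct inode visited, for the whole traversal.
        private final Set<Long> includedNodes = new HashSet<>();
        private long fileCount = 0;

        /**
         * Returns true if this inode id was already recorded.
         * Otherwise records it and returns false.
         */
        public boolean nodeIncluded(long inodeId) {
            return !includedNodes.add(inodeId);
        }

        /** Counts the inode only the first time it is seen. */
        public void visit(long inodeId) {
            if (!nodeIncluded(inodeId)) {
                fileCount++;
            }
        }

        public long fileCount() {
            return fileCount;
        }

        /** Number of entries the set is holding, i.e. the memory that grows. */
        public int trackedEntries() {
            return includedNodes.size();
        }
    }

    /** Visits every id in order, as a traversal that may reach an inode twice. */
    public static SummaryContext summarize(long[] inodeIds) {
        SummaryContext ctx = new SummaryContext();
        for (long id : inodeIds) {
            ctx.visit(id);
        }
        return ctx;
    }

    public static void main(String[] args) {
        // Ids 1 and 2 are reached twice, e.g. once via a snapshot path.
        long[] visits = {1L, 2L, 3L, 2L, 1L};
        SummaryContext ctx = summarize(visits);
        System.out.println("files counted: " + ctx.fileCount());
        System.out.println("set entries retained: " + ctx.trackedEntries());
    }
}
```

The set avoids double counting, but it holds one entry for every distinct
object visited until the computation finishes. A summary over the root of a
filesystem with ~100 million objects would therefore retain on the order of
100 million entries, which is consistent with the GC pressure the reporter
describes.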