[ https://issues.apache.org/jira/browse/HDFS-14492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wei-Chiu Chuang updated HDFS-14492:
-----------------------------------
Fix Version/s: 3.2.2, 3.1.4
> Snapshot memory leak
> --------------------
>
> Key: HDFS-14492
> URL: https://issues.apache.org/jira/browse/HDFS-14492
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: snapshots
> Affects Versions: 2.6.0
> Environment: CDH5.14.4
> Reporter: Wei-Chiu Chuang
> Assignee: Wei-Chiu Chuang
> Priority: Major
> Fix For: 3.3.0, 3.1.4, 3.2.2
>
>
> We recently examined the NameNode heap dump of a big, heavy snapshot user,
> trying to trim some fat, and sure enough, we found a memory leak in it: when
> snapshots are removed, the corresponding data structures are not removed.
> This cluster has 586 million file system objects (286 million files, 287
> million blocks, 13 million directories), using around 132 GB of heap.
> While only 44.5 million files have snapshotted copies
> (INodeFileAttributes$SnapshotCopy), most inodes (nearly 212 million) have a
> FileWithSnapshotFeature and a FileDiffList. Those inodes had snapshotted
> copies at some point in the past, but after the snapshots were removed, those
> data structures are still kept in the heap.
> Per affected inode, the INode$Feature[] array averages 32.5 bytes,
> FileWithSnapshotFeature takes 32 bytes, and FileDiffList takes 24 bytes. That
> may not sound like a lot, but it adds up quickly in large clusters like this.
> In this cluster, a whopping 13.8 GB of memory could have been saved if not for
> this bug: (32.5 + 32 + 24) bytes * (211997769 - 44572380) ~= 13.8 GB. That is
> more than 10% of the heap size.
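
The estimate quoted above can be reproduced with a few lines of Java. This is only a sketch of the arithmetic: the class and method names are made up for illustration, the counts and per-object sizes are copied from the report, and the result matches the quoted figure when a gigabyte is read as 2^30 bytes (the product is about 14.8 billion bytes).

```java
final class SnapshotLeakEstimate {
    // Instance counts copied from the heap histogram in the report.
    static final long FILES_WITH_SNAPSHOT_FEATURE = 211_997_769L;
    static final long FILES_WITH_SNAPSHOT_COPY = 44_572_380L;

    // Per-inode overhead in bytes, as quoted in the report.
    static final double FEATURE_ARRAY_BYTES = 32.5;
    static final double FILE_WITH_SNAPSHOT_FEATURE_BYTES = 32.0;
    static final double FILE_DIFF_LIST_BYTES = 24.0;

    /** Inodes that still carry snapshot structures but have no snapshotted copy. */
    static long staleInodes() {
        return FILES_WITH_SNAPSHOT_FEATURE - FILES_WITH_SNAPSHOT_COPY;
    }

    /** Bytes held by the leftover structures on those inodes. */
    static double wastedBytes() {
        double perInode = FEATURE_ARRAY_BYTES
                + FILE_WITH_SNAPSHOT_FEATURE_BYTES
                + FILE_DIFF_LIST_BYTES;
        return perInode * staleInodes();
    }

    public static void main(String[] args) {
        System.out.printf("stale inodes: %d%n", staleInodes());
        System.out.printf("wasted: %.1f GiB%n", wastedBytes() / (1L << 30));
    }
}
```
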
> Heap histogram for reference:
> {noformat}
>  num   #instances        #bytes  class name
> ----------------------------------------------
>    1:   286418254   27496152384  org.apache.hadoop.hdfs.server.namenode.INodeFile
>    2:   287322227   18388622528  org.apache.hadoop.hdfs.server.blockmanagement.BlockInfo
>    3:   227899550   17144816120  [B
>    4:   287324031   13769408616  [Lorg.apache.hadoop.hdfs.server.blockmanagement.DatanodeStorageInfo;
>    5:    71352116   12353841568  [Ljava.lang.Object;
>    6:   286322650    9170335840  [Lorg.apache.hadoop.hdfs.server.blockmanagement.BlockInfo;
>    7:   235632329    7658462416  [Lorg.apache.hadoop.hdfs.server.namenode.INode$Feature;
>    8:           4    7046430816  [Lorg.apache.hadoop.util.LightWeightGSet$LinkedElement;
>    9:   211997769    6783928608  org.apache.hadoop.hdfs.server.namenode.snapshot.FileWithSnapshotFeature
>   10:   211997769    5087946456  org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiffList
>   11:    76586261    3780468856  [I
>   12:    44572380    3209211360  org.apache.hadoop.hdfs.server.namenode.INodeFileAttributes$SnapshotCopy
>   13:    58634517    2345380680  java.util.ArrayList
>   14:    44572380    2139474240  org.apache.hadoop.hdfs.server.namenode.snapshot.FileDiff
>   15:    76582416    1837977984  org.apache.hadoop.hdfs.server.namenode.AclFeature
>   16:    12907668    1135874784  org.apache.hadoop.hdfs.server.namenode.INodeDirectory
> {noformat}
> [~szetszwo] [~arpaga] [~smeng] [~shashikant] any thoughts?
> I am thinking that AbstractINodeDiffList#deleteSnapshotDiff(), in addition to
> cleaning up file diffs, should also remove FileWithSnapshotFeature. I am not
> familiar with the snapshot implementation, so any guidance is greatly
> appreciated.
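
To make the idea in the last paragraph concrete, here is a toy model of "drop the feature once its last diff is gone". It does not use Hadoop's classes: ToyFile, ToySnapshotFeature, and their methods are hypothetical stand-ins, and the real FileWithSnapshotFeature may carry other state that has to be checked before it can be removed. Treat it as a sketch of the intent, not as the actual fix.

```java
import java.util.ArrayList;
import java.util.List;

final class ToySnapshotCleanup {
    /** Hypothetical stand-in for a per-file snapshot feature; it only holds a diff list. */
    static final class ToySnapshotFeature {
        final List<Integer> diffSnapshotIds = new ArrayList<>();
    }

    /** Hypothetical stand-in for an inode file with an optional snapshot feature. */
    static final class ToyFile {
        ToySnapshotFeature snapshotFeature; // null when the file has no snapshot state

        void recordDiff(int snapshotId) {
            if (snapshotFeature == null) {
                snapshotFeature = new ToySnapshotFeature();
            }
            snapshotFeature.diffSnapshotIds.add(snapshotId);
        }

        void deleteSnapshotDiff(int snapshotId) {
            if (snapshotFeature == null) {
                return;
            }
            // Remove the diff that belongs to the deleted snapshot.
            snapshotFeature.diffSnapshotIds.remove(Integer.valueOf(snapshotId));
            // Proposed extra step: once no diffs remain, release the feature too,
            // so the empty feature and its empty list do not stay on the heap.
            if (snapshotFeature.diffSnapshotIds.isEmpty()) {
                snapshotFeature = null;
            }
        }

        boolean hasSnapshotFeature() {
            return snapshotFeature != null;
        }
    }

    public static void main(String[] args) {
        ToyFile file = new ToyFile();
        file.recordDiff(1);
        file.recordDiff(2);
        file.deleteSnapshotDiff(1);
        System.out.println(file.hasSnapshotFeature()); // true: one diff is left
        file.deleteSnapshotDiff(2);
        System.out.println(file.hasSnapshotFeature()); // false: feature released
    }
}
```

Without the isEmpty() check, the toy file would keep an empty feature after its last diff is deleted, which mirrors the leftover objects seen in the histogram.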