[
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569013#comment-13569013
]
Andy Isaacson commented on HDFS-4461:
-------------------------------------
bq. A server generally has a lot of String objects. There are also file objects
in ReplicasMap, string paths tracked in many other places as well.
The cluster in question has about 1.5 million blocks per DN, across 12
datadirs. This hprof shows 1,858,340 BlockScanInfo objects. MAT computed the
"Retained Heap" of FsDatasetImpl at 980 MB and the "Retained Heap" of the
DirectoryScanner thread at 1.4 GB.
bq. ScanInfo is a short lived object, unlike other data structures that are
long lived.
It doesn't matter how narrow the peak is if it exceeds the maximum permissible
value. In this case we seem to have a complete set of ScanInfo objects (for
the entire dataset) live on the heap at once, with the DirectoryScanner thread
in the process of reconcile()ing them when it OOMs.
> DirectoryScanner: volume path prefix takes up memory for every block that is
> scanned
> -------------------------------------------------------------------------------------
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.0.3-alpha
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch,
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.
> This object contains two File objects-- one for the metadata file, and one
> for the block file. Since those File objects contain full paths, users who
> pick a lengthy path for their volume roots will end up using roughly an extra
> path_prefix bytes per block scanned, or N_blocks * path_prefix bytes in
> total. We also don't really need to
> store File objects-- storing strings and then creating File objects as needed
> would be cheaper. This would be a nice efficiency improvement.
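
As a rough illustration of the idea in the description above (not the code in
the attached patches), a per-block record could keep one shared reference to
the volume root plus short path suffixes, and build File objects only on
demand. The class name CompactScanInfo and all of its methods are hypothetical
and chosen for this sketch only.

```java
import java.io.File;

/**
 * Hypothetical, simplified stand-in for a per-block scan record.
 * Assumption: every block and meta file lives under its volume root.
 */
final class CompactScanInfo {
    private final long blockId;
    private final File volumeRoot;     // shared: one instance per volume, not per block
    private final String blockSuffix;  // path relative to volumeRoot, or null if missing
    private final String metaSuffix;   // path relative to volumeRoot, or null if missing

    CompactScanInfo(long blockId, File volumeRoot, File blockFile, File metaFile) {
        this.blockId = blockId;
        this.volumeRoot = volumeRoot;
        this.blockSuffix = suffixOf(volumeRoot, blockFile);
        this.metaSuffix = suffixOf(volumeRoot, metaFile);
    }

    /** Returns the part of file's path below root, or null if file is null. */
    private static String suffixOf(File root, File file) {
        if (file == null) {
            return null;
        }
        String prefix = root.getAbsolutePath();
        if (!prefix.endsWith(File.separator)) {
            prefix += File.separator;
        }
        String full = file.getAbsolutePath();
        if (!full.startsWith(prefix)) {
            throw new IllegalArgumentException(file + " is not under " + root);
        }
        // new String(...) forces a copy; on JDKs before 7u6 a bare substring()
        // would keep the full path's char[] alive and save nothing.
        return new String(full.substring(prefix.length()));
    }

    long getBlockId() {
        return blockId;
    }

    String getBlockSuffix() {
        return blockSuffix;
    }

    /** Rebuilds the block File on demand instead of storing it. */
    File getBlockFile() {
        return blockSuffix == null ? null : new File(volumeRoot, blockSuffix);
    }

    /** Rebuilds the metadata File on demand instead of storing it. */
    File getMetaFile() {
        return metaSuffix == null ? null : new File(volumeRoot, metaSuffix);
    }
}
```

With this layout the volume prefix is stored once per volume rather than twice
per block, so the per-block cost no longer grows with the length of the volume
root path. The trade-off is a short-lived File allocation each time a caller
asks for one.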
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira