[
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568959#comment-13568959
]
Colin Patrick McCabe commented on HDFS-4461:
--------------------------------------------
If someone is running with around 200,000 blocks (a reasonable number), and a
50 to 80 character path, this change saves between 50 and 100 MB of heap space
during the DirectoryScanner run. That's what we should be focusing on here--
the efficiency improvement. After all, that is why I marked this JIRA as
"improvement" rather than "bug" :)
bq. Or at least the number of ScanInfo objects you saw.
I saw more than 1 million {{ScanInfo}} objects. This means that either the
number of blocks on the DN is much higher than we recommend, or there is
another leak in the {{DirectoryScanner}}. I am trying to get confirmation that
the number of blocks is really that high. If it isn't, then we will start
looking more closely for memory leaks in the scanner.
We've found that the block scanner often delivers the finishing blow to DNs
that are already overloaded. This makes sense-- if your heap is already near
max size, asking you to allocate a few hundred megabytes might finish you off.
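To put rough numbers on the heap cost discussed above, here is a back-of-the-envelope sketch. It counts only the character data of the paths, and it assumes two {{File}} objects per block and 2 bytes per char (UTF-16 strings, as on JVMs before compact strings). The class and method names are made up for illustration. Per-object overhead for each {{File}} and {{String}} comes on top of this figure.

```java
// Illustrative estimate only; PathHeapEstimate is a hypothetical name,
// not part of Hadoop.
final class PathHeapEstimate {
    // Assumption: strings are stored as UTF-16, 2 bytes per char.
    static final int BYTES_PER_CHAR = 2;
    // Each ScanInfo holds two File objects: block file and metadata file.
    static final int FILES_PER_BLOCK = 2;

    // Bytes of path character data held while all blocks are in memory.
    static long pathBytes(long blocks, int pathChars) {
        return blocks * FILES_PER_BLOCK * pathChars * BYTES_PER_CHAR;
    }

    public static void main(String[] args) {
        long blocks = 200_000L;
        System.out.println("50-char path: " + pathBytes(blocks, 50) + " bytes");
        System.out.println("80-char path: " + pathBytes(blocks, 80) + " bytes");
    }
}
```

With 200,000 blocks this gives 40,000,000 bytes for a 50-character path and 64,000,000 bytes for an 80-character path, before any object headers are counted.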
> DirectoryScanner: volume path prefix takes up memory for every block that is
> scanned
> -------------------------------------------------------------------------------------
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.0.3-alpha
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch,
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a {{ScanInfo}} object for every block.
> This object contains two File objects-- one for the metadata file, and one
> for the block file. Since those File objects contain full paths, users who
> pick a lengthy path for their volume roots will end up using roughly an extra
> N_blocks * path_prefix bytes of heap during a scan. We also don't really need to
> store File objects-- storing strings and then creating File objects as needed
> would be cheaper. This has been causing out-of-memory conditions for users
> who pick such long volume paths.
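The idea in the description above could be sketched roughly as follows. This is not the attached patch: {{CompactScanInfo}} and its members are hypothetical names, and the sketch assumes every block and metadata file lives under the volume root. Each entry keeps one shared reference to the volume root plus short suffix strings, and builds {{File}} objects only on demand.

```java
import java.io.File;

// Hypothetical sketch of the approach, not the actual HDFS-4461 change.
final class CompactScanInfo {
    private final File volumeRoot;    // one instance shared by all blocks on the volume
    private final String blockSuffix; // block file path relative to volumeRoot
    private final String metaSuffix;  // meta file path relative to volumeRoot, or null

    CompactScanInfo(File volumeRoot, File blockFile, File metaFile) {
        this.volumeRoot = volumeRoot;
        this.blockSuffix = suffixOf(volumeRoot, blockFile);
        this.metaSuffix = suffixOf(volumeRoot, metaFile);
    }

    // Strips the volume root prefix so only the short tail is retained.
    static String suffixOf(File root, File file) {
        if (file == null) {
            return null;
        }
        String rootPath = root.getAbsolutePath();
        String prefix = rootPath.endsWith(File.separator)
            ? rootPath : rootPath + File.separator;
        String fullPath = file.getAbsolutePath();
        if (!fullPath.startsWith(prefix)) {
            throw new IllegalArgumentException(file + " is not under " + root);
        }
        // Copy explicitly: before Java 7u6, substring() shared the parent
        // string's char array, which would keep the full path alive.
        return new String(fullPath.substring(prefix.length()));
    }

    // File objects are created only when a caller actually needs one.
    File getBlockFile() {
        return new File(volumeRoot, blockSuffix);
    }

    File getMetaFile() {
        return metaSuffix == null ? null : new File(volumeRoot, metaSuffix);
    }
}
```

The trade-off is a small allocation each time a {{File}} is requested, in exchange for not holding the volume prefix twice per block for the whole scan.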