[ 
https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569198#comment-13569198
 ] 

Suresh Srinivas commented on HDFS-4461:
---------------------------------------

I think my earlier comments were perhaps not clear. Let me give it another try :)

+1 for optimizing the data structures in datanode.

bq. Suresh – we routinely see users with millions of replicas per DN now that 
48TB+ configurations have become commodity. Sure, we should also encourage 
users to use things like HAR to coalesce into larger blocks, but easy wins on 
DN memory usage are a no-brainer IMO.
Again, this is not the point I am making. I know and understand that the 
number of blocks per DN is growing. Data structures in the datanode need to be 
optimized. At the same time, as DNs support more storage, the DN heap also 
needs to be increased accordingly.

My previous comments relate to the assertion that DirectoryScanner is 
causing OOM. The OOM is not caused by the scanner. It is caused by incorrectly 
sizing the datanode JVM heap, unless one shows a leak in DirectoryScanner. So 
my comment was a request to edit the description to reflect that.

We also need to optimize the long-lived data structures in the datanode. I 
thought one would start with those instead of DirectoryScanner, which creates 
short-lived objects. I created HDFS-4465 to track that.
                
> DirectoryScanner: volume path prefix takes up memory for every block that is 
> scanned 
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-4461
>                 URL: https://issues.apache.org/jira/browse/HDFS-4461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.3-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, 
> memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  
> This object contains two File objects-- one for the metadata file, and one 
> for the block file.  Since those File objects contain full paths, users who 
> pick a lengthy path for their volume roots will end up using an extra 
> N_blocks * path_prefix bytes per block scanned.  We also don't really need to 
> store File objects-- storing strings and then creating File objects as needed 
> would be cheaper.  This would be a nice efficiency improvement.
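
The idea in the description above can be sketched roughly as follows. This is 
an illustration only, not the attached patch: the class name 
CompactScanInfoSketch, its nested ScanInfo, and all field and accessor names 
are hypothetical. It assumes every block and metadata file lives under its 
volume root, keeps one shared reference to that root, and stores only the 
short path relative to it, building File objects on demand.

```java
import java.io.File;

public class CompactScanInfoSketch {

    /** Hypothetical per-block record: shared volume root plus relative suffixes. */
    public static final class ScanInfo {
        private final long blockId;
        private final File volumeRoot;    // one instance shared by all blocks on a volume
        private final String blockSuffix; // path relative to volumeRoot, or null
        private final String metaSuffix;  // path relative to volumeRoot, or null

        public ScanInfo(long blockId, File volumeRoot, File blockFile, File metaFile) {
            this.blockId = blockId;
            this.volumeRoot = volumeRoot;
            this.blockSuffix = suffixOf(volumeRoot, blockFile);
            this.metaSuffix = suffixOf(volumeRoot, metaFile);
        }

        /** Strips the volume root prefix so it is not stored once per block. */
        private static String suffixOf(File root, File file) {
            if (file == null) {
                return null;
            }
            String prefix = root.getAbsolutePath();
            if (!prefix.endsWith(File.separator)) {
                prefix += File.separator;
            }
            String full = file.getAbsolutePath();
            if (!full.startsWith(prefix)) {
                throw new IllegalArgumentException(full + " is not under " + prefix);
            }
            return full.substring(prefix.length());
        }

        public long getBlockId() {
            return blockId;
        }

        public String getBlockSuffix() {
            return blockSuffix;
        }

        /** Recreates the File only when a caller actually needs it. */
        public File getBlockFile() {
            return blockSuffix == null ? null : new File(volumeRoot, blockSuffix);
        }

        public File getMetaFile() {
            return metaSuffix == null ? null : new File(volumeRoot, metaSuffix);
        }
    }

    public static void main(String[] args) {
        // Example paths are made up for illustration.
        File root = new File("/data/dn/some/lengthy/volume/root");
        File blk = new File(root, "current/blk_42");
        File meta = new File(root, "current/blk_42_1001.meta");

        ScanInfo info = new ScanInfo(42L, root, blk, meta);
        System.out.println("stored suffix: " + info.getBlockSuffix());
        System.out.println("rebuilt file:  " + info.getBlockFile());
    }
}
```

The trade-off in this sketch is a small allocation each time a File is 
requested, in exchange for not holding the full volume path in every 
per-block record.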

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
