[
https://issues.apache.org/jira/browse/HDFS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612564#comment-13612564
]
Steve Loughran commented on HDFS-4630:
--------------------------------------
I'd say "WONTFIX" over invalid; the OOM is a result of storing all state in
memory so that operations against files, including block retrieval, run in
bounded time. That's a design decision. Now, if you want to put EhCache in
behind the scenes, assess its performance with many small files, and assess
its behaviour on big production clusters, that's a project I'm sure we'd all
be curious about; feel free to have a go! A sketch of the idea follows.
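
A minimal sketch of what "EhCache behind the scenes" could look like, written
against the Ehcache 2.x API current for this Hadoop era. Everything here is
hypothetical: DiskBackedReplicaMap and ReplicaRecord are made-up illustration
names, not the datanode's real ReplicaMap/ReplicaInfo classes, and the tuning
numbers are placeholders, not recommendations.

{code:java}
import java.io.Serializable;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

/**
 * Hypothetical sketch only: a replica map that keeps a bounded number of
 * entries on heap and lets Ehcache spill the rest to its disk store.
 */
public class DiskBackedReplicaMap {

    /** Minimal stand-in for per-replica state; Serializable so the
     *  Ehcache disk store can hold it. */
    public static class ReplicaRecord implements Serializable {
        private static final long serialVersionUID = 1L;
        public final long blockId;
        public final long numBytes;
        public final long genStamp;
        public ReplicaRecord(long blockId, long numBytes, long genStamp) {
            this.blockId = blockId;
            this.numBytes = numBytes;
            this.genStamp = genStamp;
        }
    }

    private final Cache cache;

    public DiskBackedReplicaMap(CacheManager manager) {
        // Keep up to 1M hot entries on heap, overflow the rest to disk;
        // eternal entries, no TTL/TTI. Numbers are illustrative only.
        cache = new Cache("replicaMap", 1000000, true, true, 0, 0);
        manager.addCache(cache);
    }

    public void add(ReplicaRecord r) {
        cache.put(new Element(Long.valueOf(r.blockId), r));
    }

    /** May hit disk for a cold entry; this is exactly the latency
     *  risk for block operations that expect bounded time. */
    public ReplicaRecord get(long blockId) {
        Element e = cache.get(Long.valueOf(blockId));
        return e == null ? null : (ReplicaRecord) e.getObjectValue();
    }

    public void remove(long blockId) {
        cache.remove(Long.valueOf(blockId));
    }

    public static void main(String[] args) {
        // Failsafe config; the disk store defaults to java.io.tmpdir.
        CacheManager manager = CacheManager.create();
        DiskBackedReplicaMap map = new DiskBackedReplicaMap(manager);
        map.add(new ReplicaRecord(42L, 10240L, 1001L));
        System.out.println(map.get(42L).numBytes); // prints 10240
        manager.shutdown();
    }
}
{code}

The catch is the one raised above: a get() on a cold entry becomes a disk
read, so block operations lose their in-memory, bounded-time behaviour; that
is what the small-files and big-cluster testing would have to measure.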
> Datanode is going OOM due to small files in hdfs
> ------------------------------------------------
>
> Key: HDFS-4630
> URL: https://issues.apache.org/jira/browse/HDFS-4630
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode, namenode
> Affects Versions: 2.0.0-alpha
> Environment: Ubuntu, Java 1.6
> Reporter: Ankush Bhatiya
> Priority: Blocker
>
> Hi,
> We have very small files (sizes ranging from 10KB to 1MB) in our HDFS, and
> the number of files is in the tens of millions. Because of this, both the
> namenode and the datanode are going out of memory very frequently. When we
> analysed a heap dump of the datanode, most of the memory was used by
> ReplicaMap.
> Can we use EhCache or something similar so that not all of this data is kept in memory?
> Thanks
> Ankush
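
A rough back-of-envelope check of the heap-dump finding, assuming (this
figure is a guess, not a measurement) about 300 bytes of heap per replica
entry for the ReplicaInfo object, map entry, and key:

    20,000,000 replicas on one datanode x ~300 bytes/entry ≈ 6 GB

of heap for the ReplicaMap alone, before any other datanode state. At that
scale the OOM follows directly from the all-in-memory design discussed above.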