[ https://issues.apache.org/jira/browse/HDFS-4630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612564#comment-13612564 ]

Steve Loughran commented on HDFS-4630:
--------------------------------------

I'd say "WONTFIX" rather than INVALID; the OOM is a result of storing all 
state in memory so that operations against files, including block retrieval, 
complete in bounded time. That's a deliberate design decision. Now, if you 
want to put EhCache in behind the scenes, assess its performance with many 
small files, and its behaviour on big production clusters, that's a project 
I'm sure we'd all be curious about; feel free to have a go!
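
For anyone who does want to have a go, here is a minimal sketch of the idea 
against the Ehcache 2.x API: a cache that keeps a bounded number of replica 
records on heap and spills the rest to the disk store, standing in for the 
all-in-memory ReplicaMap. SimpleReplica is an illustrative stand-in, not the 
real ReplicaInfo class, and the 100000-entry heap limit is an arbitrary 
assumption.

{code}
import java.io.Serializable;

import net.sf.ehcache.Cache;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Element;

public class ReplicaCacheSketch {

  /** Illustrative stand-in for a replica record; not Hadoop's ReplicaInfo. */
  static class SimpleReplica implements Serializable {
    final long blockId;
    final long numBytes;
    final long genStamp;

    SimpleReplica(long blockId, long numBytes, long genStamp) {
      this.blockId = blockId;
      this.numBytes = numBytes;
      this.genStamp = genStamp;
    }
  }

  public static void main(String[] args) {
    CacheManager manager = CacheManager.create();
    // (name, maxElementsInMemory, overflowToDisk, eternal, ttl, tti):
    // keep at most 100000 entries on heap, spill the rest to the disk store.
    Cache replicas = new Cache("replicaMap", 100000, true, true, 0, 0);
    manager.addCache(replicas);

    long blockId = 42L;
    replicas.put(new Element(blockId,
        new SimpleReplica(blockId, 64L * 1024, 1001L)));

    Element e = replicas.get(blockId);  // may be served from heap or disk
    if (e != null) {
      SimpleReplica r = (SimpleReplica) e.getObjectValue();
      System.out.println("block " + r.blockId + ": " + r.numBytes + " bytes");
    }

    manager.shutdown();
  }
}
{code}

The open question, as above, is what the disk-backed tier does to lookup 
latency on the block read path under a production workload.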
                
> Datanode is going OOM due to small files in hdfs
> ------------------------------------------------
>
>                 Key: HDFS-4630
>                 URL: https://issues.apache.org/jira/browse/HDFS-4630
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, namenode
>    Affects Versions: 2.0.0-alpha
>         Environment: Ubuntu, Java 1.6
>            Reporter: Ankush Bhatiya
>            Priority: Blocker
>
> Hi, 
> We have very small files (sizes ranging from 10KB to 1MB) in our HDFS, and 
> the number of files is in the tens of millions. Due to this, both the 
> namenode and the datanode are going out of memory very frequently. When we 
> analysed the heap dump of the datanode, most of the memory was used by 
> ReplicaMap. 
> Can we use EhCache or something similar so that not all of the data is 
> stored in memory? 
> Thanks
> Ankush
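
A back-of-envelope heap estimate makes the report plausible. Assuming very 
roughly 200 bytes of heap per ReplicaMap entry (object header, block ID, 
length, generation stamp, plus map-entry overhead; the real figure depends on 
the JVM and Hadoop version), tens of millions of replicas consume gigabytes 
of heap for the map alone:

{code}
public class ReplicaHeapEstimate {
  public static void main(String[] args) {
    // Tens of millions of small files, roughly one block replica each.
    long replicas = 50L * 1000 * 1000;
    // Assumed per-entry heap cost: headers, fields, and map overhead.
    long bytesPerEntry = 200L;
    long heapBytes = replicas * bytesPerEntry;
    System.out.printf("~%.1f GB of heap for the ReplicaMap alone%n",
        heapBytes / (1024.0 * 1024 * 1024));
  }
}
{code}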

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
