[
https://issues.apache.org/jira/browse/HDFS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630191#comment-13630191
]
Plamen Jeliazkov commented on HDFS-4475:
----------------------------------------
I think the agreement made was to not try to handle OOM on DataNodes, but to
let them crash. If you can, you should increase the heap size of your DataNode.
In HADOOP-9211, I found that 512 MB was enough to avoid any issues under
stress with a vanilla setup.
> OutOfMemory by BPServiceActor.offerService() takes down DataNode
> ----------------------------------------------------------------
>
> Key: HDFS-4475
> URL: https://issues.apache.org/jira/browse/HDFS-4475
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.0.3-alpha
> Reporter: Plamen Jeliazkov
> Assignee: Plamen Jeliazkov
> Fix For: 3.0.0, 2.0.3-alpha
>
>
> In DataNode, there are catches around the BPServiceActor.offerService() call
> but no catch for OutOfMemoryError as there is for the DataXceiver, as
> introduced in 0.22.0.
> The issue can be replicated like this:
> 1) Create a cluster of X DataNodes and 1 NameNode and low memory settings
> (-Xmx128M or something similar).
> 2) Flood HDFS with small file creations (any should work actually).
> 3) DataNodes will hit OOM, stop the block pool service, and shut down.
> The resolution is to catch the OutOfMemoryError and handle it properly when
> calling BPServiceActor.offerService() in DataNode.java, as was done in
> Hadoop 0.22.0. DataNodes should not shut down or crash but remain in a sort
> of frozen state until the memory issues are resolved by GC.
> LOG ERROR:
> 2013-02-04 11:46:01,854 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Unexpected exception in block pool Block pool
> BP-1105714849-10.10.10.110-1360005776467 (storage id
> DS-1952316202-10.10.10.112-50010-1360005820993) service to
> vmhost2-vm0/10.10.10.110:8020
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2013-02-04 11:46:01,854 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Ending block pool service for: Block pool
> BP-1105714849-10.10.10.110-1360005776467 (storage id
> DS-1952316202-10.10.10.112-50010-1360005820993) service to
> vmhost2-vm0/10.10.10.110:8020
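
For reference, the resolution proposed in the description above could be
sketched roughly as follows. This is only an illustration, not the actual
DataNode code: OfferServiceSketch, ServiceStep, and runWithOomGuard are
hypothetical names, and the step passed in merely stands in for one pass of
BPServiceActor.offerService(). Keep in mind that the JVM may not be in a
reliable state after an OutOfMemoryError, which is part of why letting the
DataNode crash and raising the heap size was preferred.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class OfferServiceSketch {

    /** Hypothetical stand-in for one pass of BPServiceActor.offerService(). */
    public interface ServiceStep {
        void run() throws Exception;
    }

    /**
     * Runs the step up to maxAttempts times. An OutOfMemoryError is caught and
     * followed by a pause (the "frozen state") so that GC has a chance to free
     * memory before the next attempt. Returns the number of OOMs survived.
     */
    public static int runWithOomGuard(ServiceStep step, int maxAttempts, long pauseMillis) {
        int oomCount = 0;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                step.run();
                return oomCount; // the step completed normally
            } catch (OutOfMemoryError e) {
                oomCount++;
                try {
                    Thread.sleep(pauseMillis); // back off instead of ending the service
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    break;
                }
            } catch (Exception e) {
                break; // in this sketch, any other exception still ends the service
            }
        }
        return oomCount;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        // Simulated step: throws OOM on the first two calls, then succeeds.
        int survived = runWithOomGuard(() -> {
            if (calls.incrementAndGet() <= 2) {
                throw new OutOfMemoryError("simulated: GC overhead limit exceeded");
            }
        }, 5, 10L);
        System.out.println("OOMs survived: " + survived);
    }
}
```

The retry bound and the pause length are arbitrary choices for the sketch; a
real patch would have to decide how long a DataNode may stay frozen before
giving up.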