[
https://issues.apache.org/jira/browse/HDFS-4475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Allen Wittenauer updated HDFS-4475:
-----------------------------------
Fix Version/s: (was: 2.0.3-alpha)
(was: 3.0.0)
> OutOfMemory by BPServiceActor.offerService() takes down DataNode
> ----------------------------------------------------------------
>
> Key: HDFS-4475
> URL: https://issues.apache.org/jira/browse/HDFS-4475
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 3.0.0, 2.0.3-alpha
> Reporter: Plamen Jeliazkov
> Assignee: Plamen Jeliazkov
>
> In DataNode, there are catches around the BPServiceActor.offerService() call, but
> no catch for OutOfMemoryError as there is for the DataXceiver (introduced in
> 0.22.0).
> The issue can be replicated like this:
> 1) Create a cluster of X DataNodes and 1 NameNode with low memory settings
> (-Xmx128M or something similar).
> 2) Flood HDFS with small file creations (any file creations should work; see the
> sketch after these steps).
> 3) The DataNodes will hit OutOfMemoryError, stop the block pool service, and shut
> down.
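> A minimal sketch of such a flood, assuming the standard HDFS Java client
> (org.apache.hadoop.fs.FileSystem); the class name, target path, payload size, and
> file count are illustrative only:
>
>   import java.io.IOException;
>   import org.apache.hadoop.conf.Configuration;
>   import org.apache.hadoop.fs.FSDataOutputStream;
>   import org.apache.hadoop.fs.FileSystem;
>   import org.apache.hadoop.fs.Path;
>
>   public class SmallFileFlood {
>     public static void main(String[] args) throws IOException {
>       Configuration conf = new Configuration();
>       FileSystem fs = FileSystem.get(conf);
>       byte[] payload = new byte[16]; // tiny payload; any small file works
>       // Keep creating small files until the DataNodes' small heaps are exhausted.
>       for (int i = 0; i < 1000000; i++) {
>         Path p = new Path("/oom-test/file-" + i);
>         FSDataOutputStream out = fs.create(p, true);
>         out.write(payload);
>         out.close();
>       }
>       fs.close();
>     }
>   }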
> The resolution is to catch the OutOfMemoryError and handle it properly when
> calling BPServiceActor.offerService() in DataNode.java, as is done in Hadoop
> 0.22.0. DataNodes should not shut down or crash, but should remain in a frozen
> state until the memory pressure is resolved by GC.
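> A minimal sketch of the intended handling, modeled on the DataXceiver guard from
> 0.22.0 (catch OutOfMemoryError and sleep instead of letting the thread die). The
> class, field, and method names and the sleep interval below are illustrative, not
> the actual patch:
>
>   import org.apache.commons.logging.Log;
>   import org.apache.commons.logging.LogFactory;
>
>   public class OfferServiceGuardSketch {
>     private static final Log LOG = LogFactory.getLog(OfferServiceGuardSketch.class);
>     private volatile boolean shouldRun = true;
>
>     void runLoop() {
>       while (shouldRun) {
>         try {
>           offerService(); // the existing per-block-pool work
>         } catch (OutOfMemoryError oom) {
>           LOG.warn("Out of memory in block pool service; sleeping and retrying", oom);
>           try {
>             Thread.sleep(30 * 1000); // back off and let GC reclaim memory
>           } catch (InterruptedException ie) {
>             Thread.currentThread().interrupt();
>           }
>         } catch (Exception e) {
>           LOG.warn("Unexpected exception in block pool service", e);
>         }
>       }
>     }
>
>     void offerService() throws Exception {
>       // placeholder for BPServiceActor.offerService()
>     }
>   }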
> LOG ERROR:
> 2013-02-04 11:46:01,854 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Unexpected exception in block pool Block pool
> BP-1105714849-10.10.10.110-1360005776467 (storage id
> DS-1952316202-10.10.10.112-50010-1360005820993) service to
> vmhost2-vm0/10.10.10.110:8020
> java.lang.OutOfMemoryError: GC overhead limit exceeded
> 2013-02-04 11:46:01,854 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> Ending block pool service for: Block pool
> BP-1105714849-10.10.10.110-1360005776467 (storage id
> DS-1952316202-10.10.10.112-50010-1360005820993) service to
> vmhost2-vm0/10.10.10.110:8020
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)