Rohith created MAPREDUCE-5734:
---------------------------------
Summary: Reducer preemption does not happen if a node is
blacklisted; in turn, the job hangs.
Key: MAPREDUCE-5734
URL: https://issues.apache.org/jira/browse/MAPREDUCE-5734
Project: Hadoop Map/Reduce
Issue Type: Bug
Affects Versions: 2.2.0
Environment: SuSE 11 SP2 + Hadoop-2.3
Reporter: Rohith
There are 4 NodeManagers with 8GB each. Total cluster capacity is 32GB. Reducer
slow start (mapreduce.job.reduce.slowstart.completedmaps) is set to 1.
The job's running reducer tasks occupy 29GB of the cluster. One NodeManager
(NM-4) became unstable (3 maps were killed), and MRAppMaster blacklisted the
unstable NodeManager (NM-4). All reducer tasks are now running in the cluster.
MRAppMaster does not preempt the reducers because the reducer preemption
calculation uses a headroom that includes the memory of blacklisted nodes. This
makes the job hang forever: the ResourceManager does not assign any new
containers on blacklisted nodes, yet the availableResources it returns is based
on the cluster's total free memory.
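A minimal sketch of the accounting described above, assuming a simplified model. Everything here is hypothetical: `Node`, `scenario`, `reportedHeadroomMb`, `usableHeadroomMb` and `shouldPreemptReducers` are illustrative names, not Hadoop's RMContainerAllocator API. The split of the 29GB across nodes (NM-1 to NM-3 full, NM-4 holding the remaining 3GB free), the 1024MB map container size, and the rule "preempt reducers only when pending maps cannot fit in the headroom" are assumptions. It contrasts headroom summed over all nodes with headroom that skips blacklisted nodes.

```java
import java.util.List;

public class HeadroomSketch {
    // Hypothetical per-node snapshot (illustrative, not a Hadoop class).
    record Node(String name, int capacityMb, int usedMb, boolean blacklisted) {
        int freeMb() {
            return capacityMb - usedMb;
        }
    }

    // Assumed cluster state: 4 NMs x 8192 MB = 32 GB, 29 GB used by reducers.
    // NM-1..NM-3 are assumed full; NM-4 (blacklisted) holds the 3 GB that is free.
    static List<Node> scenario() {
        return List.of(
                new Node("NM-1", 8192, 8192, false),
                new Node("NM-2", 8192, 8192, false),
                new Node("NM-3", 8192, 8192, false),
                new Node("NM-4", 8192, 5120, true));
    }

    // Headroom as the report describes it: free memory summed over every node,
    // blacklisted ones included.
    static int reportedHeadroomMb(List<Node> nodes) {
        int free = 0;
        for (Node n : nodes) {
            free += n.freeMb();
        }
        return free;
    }

    // Blacklist-aware headroom: only free memory on nodes the application can use.
    static int usableHeadroomMb(List<Node> nodes) {
        int free = 0;
        for (Node n : nodes) {
            if (!n.blacklisted()) {
                free += n.freeMb();
            }
        }
        return free;
    }

    // Simplified rule (assumption): preempt reducers when maps are pending and
    // the headroom cannot fit even one map container.
    static boolean shouldPreemptReducers(int headroomMb, int pendingMaps, int mapContainerMb) {
        return pendingMaps > 0 && headroomMb < mapContainerMb;
    }

    public static void main(String[] args) {
        List<Node> nodes = scenario();
        int pendingMaps = 3;        // assumed: the 3 killed maps wait to be rescheduled
        int mapContainerMb = 1024;  // assumed map container size

        int reported = reportedHeadroomMb(nodes);
        int usable = usableHeadroomMb(nodes);
        System.out.println("Reported headroom: " + reported + " MB, preempt reducers? "
                + shouldPreemptReducers(reported, pendingMaps, mapContainerMb));
        // prints "Reported headroom: 3072 MB, preempt reducers? false"
        System.out.println("Usable headroom: " + usable + " MB, preempt reducers? "
                + shouldPreemptReducers(usable, pendingMaps, mapContainerMb));
        // prints "Usable headroom: 0 MB, preempt reducers? true"
    }
}
```

With the blacklisted node counted, 3GB of apparent headroom fits a map container, so no reducer is preempted even though the map can never be placed on NM-4. Excluding blacklisted nodes yields zero usable headroom, which under this simplified rule triggers preemption.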
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)