Min Zhou created MAPREDUCE-6129:
-----------------------------------
Summary: Job failed due to exceeding the counter limit in MRAppMaster
Key: MAPREDUCE-6129
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6129
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: applicationmaster
Reporter: Min Zhou
Many of our cluster's jobs use more than 120 counters, and those jobs fail
with an exception like the one below:
{noformat}
2014-10-15 22:55:43,742 WARN [Socket Reader #1 for port 45673] org.apache.hadoop.ipc.Server: Unable to read call parameters for client 10.180.216.12 on connection protocol org.apache.hadoop.mapred.TaskUmbilicalProtocol for rpcKind RPC_WRITABLE
org.apache.hadoop.mapreduce.counters.LimitExceededException: Too many counters: 121 max=120
    at org.apache.hadoop.mapreduce.counters.Limits.checkCounters(Limits.java:103)
    at org.apache.hadoop.mapreduce.counters.Limits.incrCounters(Limits.java:110)
    at org.apache.hadoop.mapreduce.counters.AbstractCounterGroup.readFields(AbstractCounterGroup.java:175)
    at org.apache.hadoop.mapred.Counters$Group.readFields(Counters.java:324)
    at org.apache.hadoop.mapreduce.counters.AbstractCounters.readFields(AbstractCounters.java:314)
    at org.apache.hadoop.mapred.TaskStatus.readFields(TaskStatus.java:489)
    at org.apache.hadoop.mapred.ReduceTaskStatus.readFields(ReduceTaskStatus.java:140)
    at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
    at org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:157)
    at org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1802)
    at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1734)
    at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1494)
    at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:732)
    at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:606)
    at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:577)
{noformat}
The class org.apache.hadoop.mapreduce.counters.Limits loads the mapred-site.xml
on the NodeManager node into a JobConf if it hasn't been initialized yet.
If mapred-site.xml does not exist on the NodeManager node, or
mapreduce.job.counters.max is not defined in that file,
org.apache.hadoop.mapreduce.counters.Limits simply falls back to the default
value of 120.
Instead, we should read the user job's configuration file, rather than the
config files on the NodeManager, when checking counter limits.
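To make the failure mode concrete, here is a minimal standalone sketch of the fallback behavior described above. This is not Hadoop's actual Limits implementation; the class and method names are illustrative, and a plain Map stands in for the Configuration object. It shows how a node-local config that is missing mapreduce.job.counters.max silently yields the 120 default, while the user job's own setting would have allowed the counters through.

```java
import java.util.Map;

public class CounterLimitSketch {
    static final String COUNTERS_MAX_KEY = "mapreduce.job.counters.max";
    static final int DEFAULT_COUNTERS_MAX = 120;

    // Resolve the effective limit from whichever configuration was loaded:
    // if the key is absent (e.g. a bare node-local mapred-site.xml), the
    // default of 120 applies regardless of what the user's job requested.
    static int resolveLimit(Map<String, String> effectiveConf) {
        String v = effectiveConf.get(COUNTERS_MAX_KEY);
        return (v == null) ? DEFAULT_COUNTERS_MAX : Integer.parseInt(v);
    }

    // Mirror of the limit check that produces the "Too many counters" error.
    static void checkCounters(int count, Map<String, String> effectiveConf) {
        int max = resolveLimit(effectiveConf);
        if (count > max) {
            throw new IllegalStateException(
                "Too many counters: " + count + " max=" + max);
        }
    }

    public static void main(String[] args) {
        Map<String, String> nodeLocalConf = Map.of(); // key not defined
        Map<String, String> jobConf = Map.of(COUNTERS_MAX_KEY, "500");

        checkCounters(121, jobConf); // passes against the job's own conf
        try {
            checkCounters(121, nodeLocalConf); // fails against the default
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The proposed fix amounts to making the job's own configuration the source passed to the limit check, so the same job does not pass on the client yet fail inside MRAppMaster.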
I will submit a patch later.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)