[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Naisbitt updated MAPREDUCE-2998:
----------------------------------------

    Priority: Critical  (was: Blocker)

Looking at the NodeManager logs, I am seeing this:
11/09/15 12:46:51 WARN monitor.ContainersMonitorImpl: Container 
[pid=22125,containerID=container_1316090748961_0001_01_000001] is running 
beyond memory-limits. Current usage : 2150408192bytes. Limit : 2147483648bytes. 
Killing container. 
Dump of the process-tree for container_1316090748961_0001_01_000001 : 
   |- 22125 21539 22125 22125 (bash) 1 1 65400832 291 /bin/bash -c 
/home/<USERNAME>/hadoop-build/jdk1.6.0_12/bin/java 
-Dhadoop.root.logger=DEBUG,console -Xmx1536m 
org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1316090748961 1 1 
1>/home/hadoop/mapred/nm/logs/application_1316090748961_0001/container_1316090748961_0001_01_000001/stdout
 2>/home/
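
To put the numbers in that warning in perspective, here is a minimal sketch that pulls the usage and limit out of the log line and computes the overage. The helper name parse_memory_warning is hypothetical (it is not part of Hadoop), and the regular expression assumes the exact message wording shown above.

```python
import re

# Log line copied from the NodeManager output quoted above.
LOG = ("Container [pid=22125,containerID=container_1316090748961_0001_01_000001] "
       "is running beyond memory-limits. Current usage : 2150408192bytes. "
       "Limit : 2147483648bytes. Killing container.")


def parse_memory_warning(line):
    """Hypothetical helper: return (usage_bytes, limit_bytes) from the warning."""
    m = re.search(r"Current usage : (\d+)bytes\. Limit : (\d+)bytes", line)
    if m is None:
        raise ValueError("not a memory-limit warning")
    return int(m.group(1)), int(m.group(2))


usage, limit = parse_memory_warning(LOG)
overage = usage - limit
print(f"limit   = {limit / 2**20:.0f} MiB")
print(f"overage = {overage} bytes ({overage / 2**20:.2f} MiB)")
```

The container was over its 2048 MiB limit by under 3 MiB when it was killed.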


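So, for some reason, the AM is now exceeding its container's memory limit: it 
was killed at 2150408192 bytes against a limit of 2147483648 bytes (2 GB).  We 
should figure out why it has started using so much memory.  In the meantime, 
as a temporary workaround, you can lower the AM's maximum heap size with this 
option when running your jobs: 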
-Dyarn.app.mapreduce.am.command-opts=-Xmx1024m
...or add this to your yarn-site.xml file:
  <property>
    <name>yarn.app.mapreduce.am.command-opts</name>
    <value>-Xmx1024m</value>
  </property>
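
As a rough illustration of why a smaller -Xmx may help: the AM container is granted 2048 MB (per the "AppMaster capability = memory: 2048" line in the report quoted below), and the Java heap is only part of the JVM's total footprint. The sketch below compares the headroom left for everything outside the heap under the old and new settings. The helper xmx_mb is hypothetical, handles only the m and g suffixes, and assumes the 2048 figure is in MB.

```python
import re

# Assumption: taken from "AppMaster capability = memory: 2048", read as MB.
CONTAINER_LIMIT_MB = 2048


def xmx_mb(opts):
    """Hypothetical helper: -Xmx value in MB from a JVM option string."""
    m = re.search(r"-Xmx(\d+)([mMgG])", opts)
    if m is None:
        raise ValueError("no -Xmx option with an m/g suffix found")
    value = int(m.group(1))
    return value * 1024 if m.group(2).lower() == "g" else value


# Old default from the launch command, then the workaround value.
for opts in ("-Xmx1536m", "-Xmx1024m"):
    heap = xmx_mb(opts)
    print(f"{opts}: heap cap {heap} MB, headroom {CONTAINER_LIMIT_MB - heap} MB")
```

With -Xmx1536m only 512 MB of the container limit is left outside the heap cap; with -Xmx1024m that grows to 1024 MB.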

I am lowering the priority from Blocker to Critical since we have a workaround 
now.

> Failing to contact Am/History for jobs: java.io.EOFException in 
> DataInputStream
> -------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2998
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2998
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.0, 0.24.0
>            Reporter: Jeffrey Naisbitt
>            Priority: Critical
>         Attachments: amlog
>
>
> I am getting an exception frequently when running my jobs on a single-node 
> cluster.  It happens with basically any job I run: sometimes the job will 
> work, but most of the time I get this exception (in this case, I was running 
> a simple wordcount from the examples jar - where I got the exception 4 times 
> in a row, and then the job worked the fifth time I submitted it). 
> Sometimes restarting the namenode, resourcemanager, and historyserver helps - 
> but not always.  Several other developers have seen this problem.
> 11/09/12 17:17:50 INFO mapred.YARNRunner: AppMaster capability = memory: 
> 2048, 
> 11/09/12 17:17:51 INFO mapred.YARNRunner: Command to launch container for 
> ApplicationMaster is : $JAVA_HOME/bin/java -Dhadoop.root.logger=DEBUG,console 
> -Xmx1536m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1315847180566 6 
> <FAILCOUNT> 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr 
> 11/09/12 17:17:51 INFO mapred.ResourceMgrDelegate: Submitted application 
> application_1315847180566_6 to ResourceManager
> 11/09/12 17:17:51 INFO mapred.ClientCache: Connecting to HistoryServer at: 
> 0.0.0.0:10020
> 11/09/12 17:17:51 INFO ipc.YarnRPC: Creating YarnRPC for 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
> 11/09/12 17:17:51 INFO mapred.ClientCache: Connected to HistoryServer at: 
> 0.0.0.0:10020
> 11/09/12 17:17:51 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy 
> for protocol interface org.apache.hadoop.mapreduce.v2.api.MRClientProtocol
> 11/09/12 17:17:51 INFO mapreduce.Job: Running job: job_1315847180566_0006
> 11/09/12 17:17:52 INFO mapreduce.Job:  map 0% reduce 0%
> 11/09/12 17:18:00 INFO mapred.ClientServiceDelegate: Tracking Url of JOB is 
> <IP-ADDRESS>:55361
> 11/09/12 17:18:00 INFO mapred.ClientServiceDelegate: Connecting to 
> <IP-ADDRESS>:43465
> 11/09/12 17:18:00 INFO ipc.YarnRPC: Creating YarnRPC for 
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
> 11/09/12 17:18:00 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy 
> for protocol interface org.apache.hadoop.mapreduce.v2.api.MRClientProtocol
> 11/09/12 17:18:01 INFO mapred.ClientServiceDelegate: Failed to contact 
> AM/History for job job_1315847180566_0006  Will retry..
> java.lang.reflect.UndeclaredThrowableException
>     at 
> org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBClientImpl.java:179)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at 
> org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:237)
>     at 
> org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:276)
>     at 
> org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:547)
>     at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:540)
>     at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1144)
>     at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1092)
>     at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
>     at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
> Caused by: com.google.protobuf.ServiceException: java.io.IOException: Call to 
> /<IP-ADDRESS>:43465 failed on local exception: java.io.EOFException
>     at 
> org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
>     at $Proxy8.getTaskAttemptCompletionEvents(Unknown Source)
>     at 
> org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBClientImpl.java:172)
>     ... 23 more
> Caused by: java.io.IOException: Call to /<IP-ADDRESS>:43465 failed on local 
> exception: java.io.EOFException
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1119)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1087)
>     at 
> org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
>     ... 25 more
> Caused by: java.io.EOFException
>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>     at 
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:816)
>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:754)
> 11/09/12 17:18:01 INFO mapreduce.Job: Job job_1315847180566_0006 failed with 
> state FAILED
> 11/09/12 17:18:01 INFO mapreduce.Job: Counters: 0 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
