[
https://issues.apache.org/jira/browse/MAPREDUCE-2998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jeffrey Naisbitt updated MAPREDUCE-2998:
----------------------------------------
Environment: (was: This is now causing delays and failures in our QE
cycles.)
Thanks for the comment, Vinod. I will post logs. This is currently trivial to
reproduce since it happens 4 out of 5 times for me.
This is now causing delays and failures in our QE cycles.
In our QE clusters, this failure is also corrupting HDFS blocks:
$ $hadoophdfs fsck / | grep CORRUP
11/09/14 23:51:50 WARN conf.Configuration: mapred.used.genericoptionsparser is
deprecated. Instead, use
mapreduce.client.genericoptionsparser.used
Connecting to namenode via https://<HOSTNAME>:50470
/mapred/history/done_intermediate/hadoopqa/job_1316042948239_0009.summary:
CORRUPT blockpool
BP-1806827031-98.137.110.254-1316028239408 block blk_7057908993439990810
/mapred/history/done_intermediate/hadoopqa/job_1316042948239_0010.summary:
CORRUPT blockpool
BP-1806827031-98.137.110.254-1316028239408 block blk_3616335227169344631
/mapred/history/done_intermediate/hadoopqa/job_1316042948239_0012.summary:
CORRUPT blockpool
BP-1806827031-98.137.110.254-1316028239408 block blk_-7171149404998570529
/mapred/history/done_intermediate/hadoopqa/job_1316042948239_0012_conf.xml:
CORRUPT blockpool
BP-1806827031-98.137.110.254-1316028239408 block blk_5875537261129974720
..Status: CORRUPT
CORRUPT FILES: 4
CORRUPT BLOCKS: 4
The filesystem under path '/' is CORRUPT
This causes the cluster to stay in safe mode until we leave it manually.
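For triage, the corrupt block IDs can be pulled out of fsck output like the listing above. This is only a minimal sketch: it assumes the output has been captured as text with each corrupt file on one unwrapped line (the listing above is wrapped by the mail client), and the class and method names (`FsckCorruptBlocks`, `corruptBlockIds`) are hypothetical helpers, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a Hadoop API): extracts block IDs from captured
// fsck text, assuming one unwrapped line per corrupt file.
final class FsckCorruptBlocks {
    // Block IDs look like blk_7057908993439990810 or blk_-7171149404998570529.
    private static final Pattern BLOCK_ID = Pattern.compile("blk_-?\\d+");

    static List<String> corruptBlockIds(String fsckOutput) {
        List<String> ids = new ArrayList<>();
        for (String line : fsckOutput.split("\\R")) {
            // Skip summary lines such as "Status: CORRUPT" or "CORRUPT FILES: 4".
            if (!line.contains("CORRUPT blockpool")) {
                continue;
            }
            Matcher m = BLOCK_ID.matcher(line);
            if (m.find()) {
                ids.add(m.group());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two entries from the report above, rejoined onto single lines.
        String sample = String.join("\n",
            "/mapred/history/done_intermediate/hadoopqa/job_1316042948239_0009.summary: "
                + "CORRUPT blockpool BP-1806827031-98.137.110.254-1316028239408 "
                + "block blk_7057908993439990810",
            "/mapred/history/done_intermediate/hadoopqa/job_1316042948239_0012.summary: "
                + "CORRUPT blockpool BP-1806827031-98.137.110.254-1316028239408 "
                + "block blk_-7171149404998570529",
            "..Status: CORRUPT",
            "CORRUPT FILES: 4");
        for (String id : corruptBlockIds(sample)) {
            System.out.println(id);
        }
    }
}
```

Filtering on the "CORRUPT blockpool" marker rather than on "CORRUPT" alone keeps the summary lines at the end of the fsck report out of the result.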
> Failing to contact Am/History for jobs: java.io.EOFException in
> DataInputStream
> -------------------------------------------------------------------------------
>
> Key: MAPREDUCE-2998
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2998
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: mrv2
> Affects Versions: 0.23.0, 0.24.0
> Reporter: Jeffrey Naisbitt
> Priority: Blocker
>
> I am getting an exception frequently when running my jobs on a single-node
> cluster. It happens with basically any job I run: sometimes the job will
> work, but most of the time I get this exception (in this case, I was running
> a simple wordcount from the examples jar - where I got the exception 4 times
> in a row, and then the job worked the fifth time I submitted it).
> Sometimes restarting the namenode, resourcemanager, and historyserver helps -
> but not always. Several other developers have seen this problem.
> 11/09/12 17:17:50 INFO mapred.YARNRunner: AppMaster capability = memory:
> 2048,
> 11/09/12 17:17:51 INFO mapred.YARNRunner: Command to launch container for
> ApplicationMaster is : $JAVA_HOME/bin/java -Dhadoop.root.logger=DEBUG,console
> -Xmx1536m org.apache.hadoop.mapreduce.v2.app.MRAppMaster 1315847180566 6
> <FAILCOUNT> 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
> 11/09/12 17:17:51 INFO mapred.ResourceMgrDelegate: Submitted application
> application_1315847180566_6 to ResourceManager
> 11/09/12 17:17:51 INFO mapred.ClientCache: Connecting to HistoryServer at:
> 0.0.0.0:10020
> 11/09/12 17:17:51 INFO ipc.YarnRPC: Creating YarnRPC for
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
> 11/09/12 17:17:51 INFO mapred.ClientCache: Connected to HistoryServer at:
> 0.0.0.0:10020
> 11/09/12 17:17:51 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy
> for protocol interface org.apache.hadoop.mapreduce.v2.api.MRClientProtocol
> 11/09/12 17:17:51 INFO mapreduce.Job: Running job: job_1315847180566_0006
> 11/09/12 17:17:52 INFO mapreduce.Job: map 0% reduce 0%
> 11/09/12 17:18:00 INFO mapred.ClientServiceDelegate: Tracking Url of JOB is
> <IP-ADDRESS>:55361
> 11/09/12 17:18:00 INFO mapred.ClientServiceDelegate: Connecting to
> <IP-ADDRESS>:43465
> 11/09/12 17:18:00 INFO ipc.YarnRPC: Creating YarnRPC for
> org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC
> 11/09/12 17:18:00 INFO ipc.HadoopYarnRPC: Creating a HadoopYarnProtoRpc proxy
> for protocol interface org.apache.hadoop.mapreduce.v2.api.MRClientProtocol
> 11/09/12 17:18:01 INFO mapred.ClientServiceDelegate: Failed to contact
> AM/History for job job_1315847180566_0006 Will retry..
> java.lang.reflect.UndeclaredThrowableException
> at
> org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBClientImpl.java:179)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:237)
> at
> org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:276)
> at
> org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:547)
> at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:540)
> at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1144)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1092)
> at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
> at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:189)
> Caused by: com.google.protobuf.ServiceException: java.io.IOException: Call to
> /<IP-ADDRESS>:43465 failed on local exception: java.io.EOFException
> at
> org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:139)
> at $Proxy8.getTaskAttemptCompletionEvents(Unknown Source)
> at
> org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBClientImpl.java:172)
> ... 23 more
> Caused by: java.io.IOException: Call to /<IP-ADDRESS>:43465 failed on local
> exception: java.io.EOFException
> at org.apache.hadoop.ipc.Client.wrapException(Client.java:1119)
> at org.apache.hadoop.ipc.Client.call(Client.java:1087)
> at
> org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Invoker.invoke(ProtoOverHadoopRpcEngine.java:136)
> ... 25 more
> Caused by: java.io.EOFException
> at java.io.DataInputStream.readInt(DataInputStream.java:375)
> at
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:816)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:754)
> 11/09/12 17:18:01 INFO mapreduce.Job: Job job_1315847180566_0006 failed with
> state FAILED
> 11/09/12 17:18:01 INFO mapreduce.Job: Counters: 0
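The innermost cause in the trace above is `DataInputStream.readInt` reaching end of stream, which is what a reader sees when the stream ends before a full 4-byte value is available (for example, when the peer closes the connection early). The following is a minimal standalone sketch of that JDK behavior only; it does not reproduce Hadoop's IPC code, and the class and method names (`ReadIntEofDemo`, `readIntHitsEof`) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class: shows when DataInputStream.readInt() throws
// EOFException. An in-memory stream stands in for a socket that was
// closed before a complete response arrived.
final class ReadIntEofDemo {
    // Returns true if readInt() reaches end of stream before reading 4 bytes.
    static boolean readIntHitsEof(byte[] available) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(available))) {
            in.readInt();
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readIntHitsEof(new byte[0]));            // prints "true"
        System.out.println(readIntHitsEof(new byte[] {0, 0}));      // prints "true"
        System.out.println(readIntHitsEof(new byte[] {0, 0, 0, 1})); // prints "false"
    }
}
```

This only shows that the exception signals a truncated or missing response on the client side; why the server side closed the connection has to come from the ApplicationMaster or history server logs.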