[ 
https://issues.apache.org/jira/browse/MAPREDUCE-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jian He updated MAPREDUCE-5488:
-------------------------------

    Attachment: MAPREDUCE-5488.2.patch
    
> Job recovery fails after killing all the running containers for the app
> -----------------------------------------------------------------------
>
>                 Key: MAPREDUCE-5488
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5488
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Arpit Gupta
>            Assignee: Jian He
>         Attachments: MAPREDUCE-5488.1.patch, MAPREDUCE-5488.2.patch, 
> MAPREDUCE-5488.patch, MAPREDUCE-5488.patch, MAPREDUCE-5488.patch
>
>
> Here is the client stack trace:
> {code}
> RUNNING: /usr/lib/hadoop/bin/hadoop jar 
> /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples-2.1.0.2.0.5.0-66.jar 
> wordcount "-Dmapreduce.reduce.input.limit=-1" 
> /user/user/test_yarn_ha/medium_wordcount_input 
> /user/hrt_qa/test_yarn_ha/test_mapred_ha_single_job_applicationmaster-1-time
> 13/08/30 08:45:39 INFO client.RMProxy: Connecting to ResourceManager at 
> hostname/68.142.247.148:8032
> 13/08/30 08:45:40 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 19 
> for user on ha-hdfs:ha-2-secure
> 13/08/30 08:45:40 INFO security.TokenCache: Got dt for hdfs://ha-2-secure; 
> Kind: HDFS_DELEGATION_TOKEN, Service: ha-hdfs:ha-2-secure, Ident: 
> (HDFS_DELEGATION_TOKEN token 19 for user)
> 13/08/30 08:45:40 INFO input.FileInputFormat: Total input paths to process : 
> 20
> 13/08/30 08:45:40 INFO lzo.GPLNativeCodeLoader: Loaded native gpl library
> 13/08/30 08:45:40 INFO lzo.LzoCodec: Successfully loaded & initialized 
> native-lzo library [hadoop-lzo rev cf4e7cbf8ed0f0622504d008101c2729dc0c9ff3]
> 13/08/30 08:45:40 INFO mapreduce.JobSubmitter: number of splits:180
> 13/08/30 08:45:40 WARN conf.Configuration: user.name is deprecated. Instead, 
> use mapreduce.job.user.name
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.jar is deprecated. Instead, 
> use mapreduce.job.jar
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.value.class is 
> deprecated. Instead, use mapreduce.job.output.value.class
> 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.combine.class is 
> deprecated. Instead, use mapreduce.job.combine.class
> 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.map.class is deprecated. 
> Instead, use mapreduce.job.map.class
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.job.name is deprecated. 
> Instead, use mapreduce.job.name
> 13/08/30 08:45:40 WARN conf.Configuration: mapreduce.reduce.class is 
> deprecated. Instead, use mapreduce.job.reduce.class
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.input.dir is deprecated. 
> Instead, use mapreduce.input.fileinputformat.inputdir
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.dir is deprecated. 
> Instead, use mapreduce.output.fileoutputformat.outputdir
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.map.tasks is deprecated. 
> Instead, use mapreduce.job.maps
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.output.key.class is 
> deprecated. Instead, use mapreduce.job.output.key.class
> 13/08/30 08:45:40 WARN conf.Configuration: mapred.working.dir is deprecated. 
> Instead, use mapreduce.job.working.dir
> 13/08/30 08:45:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: 
> job_1377851032086_0003
> 13/08/30 08:45:41 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, 
> Service: ha-hdfs:ha-2-secure, Ident: (HDFS_DELEGATION_TOKEN token 19 for user)
> 13/08/30 08:45:42 INFO impl.YarnClientImpl: Submitted application 
> application_1377851032086_0003 to ResourceManager at 
> hostname/68.142.247.148:8032
> 13/08/30 08:45:42 INFO mapreduce.Job: The url to track the job: 
> http://hostname:8088/proxy/application_1377851032086_0003/
> 13/08/30 08:45:42 INFO mapreduce.Job: Running job: job_1377851032086_0003
> 13/08/30 08:45:48 INFO mapreduce.Job: Job job_1377851032086_0003 running in 
> uber mode : false
> 13/08/30 08:45:48 INFO mapreduce.Job:  map 0% reduce 0%
> stop applicationmaster
> beaver.component.hadoop|INFO|Kill container 
> container_1377851032086_0003_01_000001 on host hostname
> RUNNING: ssh -o StrictHostKeyChecking=no hostname "sudo su - -c \"ps aux | 
> grep container_1377851032086_0003_01_000001 | awk '{print \\\$2}' | xargs 
> kill -9\" root"
> Warning: Permanently added 'hostname,68.142.247.155' (RSA) to the list of 
> known hosts.
> kill 8978: No such process
> waiting for down time 10 seconds for service applicationmaster
> 13/08/30 08:45:55 INFO ipc.Client: Retrying connect to server: 
> hostname/68.142.247.155:52713. Already tried 0 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
> 13/08/30 08:45:56 INFO ipc.Client: Retrying connect to server: 
> hostname/68.142.247.155:52713. Already tried 0 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1 SECONDS)
> 13/08/30 08:45:56 ERROR security.UserGroupInformation: 
> PriviledgedActionException as:user@REALM (auth:KERBEROS) 
> cause:java.io.IOException: java.net.ConnectException: Call From 
> hostname.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> java.io.IOException: java.net.ConnectException: Call From 
> hostname.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> at 
> org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:319)
> at 
> org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:354)
> at 
> org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:529)
> at org.apache.hadoop.mapreduce.Job$5.run(Job.java:668)
> at org.apache.hadoop.mapreduce.Job$5.run(Job.java:665)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1477)
> at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:665)
> at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1349)
> at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1289)
> at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
> at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
> at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
> Caused by: java.net.ConnectException: Call From hostname.ConnectException: 
> Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
> at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
> at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
> at org.apache.hadoop.ipc.Client.call(Client.java:1351)
> at org.apache.hadoop.ipc.Client.call(Client.java:1300)
> at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
> at $Proxy14.getTaskAttemptCompletionEvents(Unknown Source)
> at 
> org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBClientImpl.java:177)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> at java.lang.reflect.Method.invoke(Method.java:597)
> at 
> org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:310)
> ... 23 more
> Caused by: java.net.ConnectException: Connection refused
> at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
> at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
> at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:547)
> at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:642)
> at org.apache.hadoop.ipc.Client$Connection.access$2600(Client.java:314)
> at org.apache.hadoop.ipc.Client.getConnection(Client.java:1399)
> at org.apache.hadoop.ipc.Client.call(Client.java:1318)
> ... 32 more
> {code}

