[ https://issues.apache.org/jira/browse/HADOOP-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen O'Malley updated HADOOP-1400: ---------------------------------- Status: Patch Available (was: Open) > JobClient rpc times out getting job status > ------------------------------------------ > > Key: HADOOP-1400 > URL: https://issues.apache.org/jira/browse/HADOOP-1400 > Project: Hadoop > Issue Type: Bug > Components: mapred > Affects Versions: 0.12.3 > Reporter: Nigel Daley > Assignee: Owen O'Malley > Fix For: 0.14.0 > > Attachments: job-client-retry-4.patch, job-client-retry.patch > > > JobClient rpc times out getting job status and logs an INFO exception > message. JobClient seems to recover, however, so I think the exception > should be logged at DEBUG or the message indicate that JobClient will retry. > Here's an example log: > 07/05/20 20:45:38 INFO mapred.JobClient: Running job: job_0029 > 07/05/20 20:45:39 INFO mapred.JobClient: map 0% reduce 0% > 07/05/20 20:45:43 INFO mapred.JobClient: map 2% reduce 0% > 07/05/20 20:45:44 INFO mapred.JobClient: map 7% reduce 0% > 07/05/20 20:45:45 INFO mapred.JobClient: map 17% reduce 0% > 07/05/20 20:45:46 INFO mapred.JobClient: map 23% reduce 0% > 07/05/20 20:45:47 INFO mapred.JobClient: map 36% reduce 0% > 07/05/20 20:45:48 INFO mapred.JobClient: map 60% reduce 0% > 07/05/20 20:45:49 INFO mapred.JobClient: map 71% reduce 0% > 07/05/20 20:45:50 INFO mapred.JobClient: map 73% reduce 0% > 07/05/20 20:45:51 INFO mapred.JobClient: map 75% reduce 0% > 07/05/20 20:45:52 INFO mapred.JobClient: map 79% reduce 0% > 07/05/20 20:45:53 INFO mapred.JobClient: map 84% reduce 0% > 07/05/20 20:45:54 INFO mapred.JobClient: map 97% reduce 0% > 07/05/20 20:45:57 INFO mapred.JobClient: map 99% reduce 0% > 07/05/20 20:45:58 INFO mapred.JobClient: map 100% reduce 1% > 07/05/20 20:46:00 INFO mapred.JobClient: map 100% reduce 2% > 07/05/20 20:46:08 INFO mapred.JobClient: map 100% reduce 3% > 07/05/20 20:46:11 INFO mapred.JobClient: map 100% reduce 4% > 07/05/20 20:46:18 INFO mapred.JobClient: map 100% reduce 5% > 07/05/20 20:46:21 INFO mapred.JobClient: map 100% reduce 6% > 07/05/20 20:46:28 INFO mapred.JobClient: map 100% reduce 7% > 07/05/20 20:46:31 INFO mapred.JobClient: map 100% reduce 8% > 07/05/20 20:46:37 INFO mapred.JobClient: map 100% reduce 9% > 07/05/20 20:46:39 INFO mapred.JobClient: map 100% reduce 10% > 07/05/20 20:46:47 INFO mapred.JobClient: map 100% reduce 11% > 07/05/20 20:46:50 INFO mapred.JobClient: map 100% reduce 12% > 07/05/20 20:46:57 INFO mapred.JobClient: map 100% reduce 13% > 07/05/20 20:46:59 INFO mapred.JobClient: map 100% reduce 14% > 07/05/20 20:47:07 INFO mapred.JobClient: map 100% reduce 15% > 07/05/20 20:47:09 INFO mapred.JobClient: map 100% reduce 16% > 07/05/20 20:47:17 INFO mapred.JobClient: map 100% reduce 17% > 07/05/20 20:47:19 INFO mapred.JobClient: map 100% reduce 18% > 07/05/20 20:47:28 INFO mapred.JobClient: map 100% reduce 19% > 07/05/20 20:47:29 INFO mapred.JobClient: map 100% reduce 20% > 07/05/20 20:47:35 INFO mapred.JobClient: map 100% reduce 21% > 07/05/20 20:47:39 INFO mapred.JobClient: map 100% reduce 22% > 07/05/20 20:47:44 INFO mapred.JobClient: map 100% reduce 23% > 07/05/20 20:47:49 INFO mapred.JobClient: map 100% reduce 24% > 07/05/20 20:47:58 INFO mapred.JobClient: map 100% reduce 25% > 07/05/20 20:47:59 INFO mapred.JobClient: map 100% reduce 26% > 07/05/20 20:48:05 INFO mapred.JobClient: map 100% reduce 27% > 07/05/20 20:48:09 INFO mapred.JobClient: map 100% reduce 28% > 07/05/20 20:48:14 INFO mapred.JobClient: map 100% reduce 29% > 07/05/20 20:48:19 INFO mapred.JobClient: map 100% reduce 30% > 07/05/20 20:48:24 INFO mapred.JobClient: map 100% reduce 31% > 07/05/20 20:48:29 INFO mapred.JobClient: map 100% reduce 32% > 07/05/20 20:48:31 INFO mapred.JobClient: map 100% reduce 35% > 07/05/20 20:49:32 INFO mapred.JobClient: Communication problem with server: > java.net.SocketTimeoutException: timed out waiting for rpc response > at org.apache.hadoop.ipc.Client.call(Client.java:473) > at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165) > at $Proxy1.getJobStatus(Unknown Source) > at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:491) > at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:556) > at org.apache.hadoop.mapred.MRBench.runJobInSequence(MRBench.java:188) > at org.apache.hadoop.mapred.MRBench.main(MRBench.java:280) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:69) > at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:140) > at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:64) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at org.apache.hadoop.util.RunJar.main(RunJar.java:155) > 07/05/20 20:49:35 INFO mapred.JobClient: map 100% reduce 97% > 07/05/20 20:49:37 INFO mapred.JobClient: map 100% reduce 98% > 07/05/20 20:49:40 INFO mapred.JobClient: map 100% reduce 99% > 07/05/20 20:49:42 INFO mapred.JobClient: map 100% reduce 100% > 07/05/20 20:49:43 INFO mapred.JobClient: Job complete: job_0029 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.