[ http://issues.apache.org/jira/browse/HADOOP-843?page=comments#action_12460171 ]

Arun C Murthy commented on HADOOP-843:
--------------------------------------

Clarification: This is very reproducible and happens to every job after ~750 have run.

> JobClient gets rpc timeout exception after running lots of jobs
> ---------------------------------------------------------------
>
>                 Key: HADOOP-843
>                 URL: http://issues.apache.org/jira/browse/HADOOP-843
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.9.2
>            Reporter: Arun C Murthy
>
> While testing HADOOP-815 with smallJobsBenchmark (running ~750 jobs with 300 maps & 2 reduces each) I ran into this:
>
> 06/12/20 22:35:22 INFO conf.Configuration: parsing file:/export/crawlspace/kryptonite/arunc/hadoop/hadoop-0.9.3-dev/conf/hadoop-default.xml
> 06/12/20 22:35:22 INFO conf.Configuration: parsing file:/export/crawlspace/kryptonite/arunc/hadoop/hadoop-0.9.3-dev/conf/mapred-default.xml
> 06/12/20 22:35:22 INFO mapred.MultiJobRunner: Running job, Input : /mapred/benchmark/input Output : /mapred/benchmark/temp/multiMapRedOutput_-2104398834
> 06/12/20 22:35:39 INFO mapred.JobClient: Running job: job_0746
> 06/12/20 22:35:41 INFO mapred.JobClient: map 0% reduce 0%
> 06/12/20 22:36:07 INFO mapred.JobClient: map 1% reduce 0%
> 06/12/20 22:36:08 INFO mapred.JobClient: map 2% reduce 0%
> 06/12/20 22:36:09 INFO mapred.JobClient: map 3% reduce 0%
> 06/12/20 22:36:10 INFO mapred.JobClient: map 4% reduce 0%
> 06/12/20 22:37:11 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
>     at org.apache.hadoop.ipc.Client.call(Client.java:467)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
>     at $Proxy1.getJobStatus(Unknown Source)
>     at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:345)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:382)
>     at org.apache.hadoop.benchmarks.mapred.MultiJobRunner.runJobInSequence(MultiJobRunner.java:169)
>     at org.apache.hadoop.benchmarks.mapred.MultiJobRunner.run(MultiJobRunner.java:277)
>     at org.apache.hadoop.benchmarks.mapred.MultiJobRunner.main(MultiJobRunner.java:401)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:143)
>     at org.apache.hadoop.benchmarks.mapred.BenchmarkRunner.main(BenchmarkRunner.java:17)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
> 06/12/20 22:38:46 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
>     at org.apache.hadoop.ipc.Client.call(Client.java:467)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
>     at $Proxy1.getJobStatus(Unknown Source)
>     at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:345)
>     at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:382)
>     at org.apache.hadoop.benchmarks.mapred.MultiJobRunner.runJobInSequence(MultiJobRunner.java:169)
>     at org.apache.hadoop.benchmarks.mapred.MultiJobRunner.run(MultiJobRunner.java:277)
>     at org.apache.hadoop.benchmarks.mapred.MultiJobRunner.main(MultiJobRunner.java:401)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:143)
>     at org.apache.hadoop.benchmarks.mapred.BenchmarkRunner.main(BenchmarkRunner.java:17)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:585)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
> 06/12/20 22:38:47 INFO mapred.JobClient: map 99% reduce 1%
> 06/12/20 22:38:52 INFO mapred.JobClient: map 100% reduce 2%
> 06/12/20 22:39:03 INFO mapred.JobClient: map 100% reduce 17%
> 06/12/20 22:39:06 INFO mapred.JobClient: map 100% reduce 51%
> 06/12/20 22:39:12 INFO mapred.JobClient: map 100% reduce 66%
> 06/12/20 22:39:15 INFO mapred.JobClient: map 100% reduce 100%
> 06/12/20 22:39:16 INFO mapred.JobClient: Job complete: job_0746
>
> The job completes successfully though...
>
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
