JobClient gets rpc timeout exception after running lots of jobs
---------------------------------------------------------------

                 Key: HADOOP-843
                 URL: http://issues.apache.org/jira/browse/HADOOP-843
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.9.2
            Reporter: Arun C Murthy


While testing HADOOP-815 with smallJobsBenchmark (running ~750 jobs, each with 300
maps and 2 reduces), I ran into this:

06/12/20 22:35:22 INFO conf.Configuration: parsing file:/export/crawlspace/kryptonite/arunc/hadoop/hadoop-0.9.3-dev/conf/hadoop-default.xml
06/12/20 22:35:22 INFO conf.Configuration: parsing file:/export/crawlspace/kryptonite/arunc/hadoop/hadoop-0.9.3-dev/conf/mapred-default.xml
06/12/20 22:35:22 INFO mapred.MultiJobRunner: Running job, Input : /mapred/benchmark/input Output : /mapred/benchmark/temp/multiMapRedOutput_-2104398834
06/12/20 22:35:39 INFO mapred.JobClient: Running job: job_0746
06/12/20 22:35:41 INFO mapred.JobClient:  map 0% reduce 0%
06/12/20 22:36:07 INFO mapred.JobClient:  map 1% reduce 0%
06/12/20 22:36:08 INFO mapred.JobClient:  map 2% reduce 0%
06/12/20 22:36:09 INFO mapred.JobClient:  map 3% reduce 0%
06/12/20 22:36:10 INFO mapred.JobClient:  map 4% reduce 0%
06/12/20 22:37:11 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:467)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
        at $Proxy1.getJobStatus(Unknown Source)
        at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:345)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:382)
        at org.apache.hadoop.benchmarks.mapred.MultiJobRunner.runJobInSequence(MultiJobRunner.java:169)
        at org.apache.hadoop.benchmarks.mapred.MultiJobRunner.run(MultiJobRunner.java:277)
        at org.apache.hadoop.benchmarks.mapred.MultiJobRunner.main(MultiJobRunner.java:401)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:143)
        at org.apache.hadoop.benchmarks.mapred.BenchmarkRunner.main(BenchmarkRunner.java:17)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
06/12/20 22:38:46 INFO mapred.JobClient: Communication problem with server: java.net.SocketTimeoutException: timed out waiting for rpc response
        at org.apache.hadoop.ipc.Client.call(Client.java:467)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:164)
        at $Proxy1.getJobStatus(Unknown Source)
        at org.apache.hadoop.mapred.JobClient.getJob(JobClient.java:345)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:382)
        at org.apache.hadoop.benchmarks.mapred.MultiJobRunner.runJobInSequence(MultiJobRunner.java:169)
        at org.apache.hadoop.benchmarks.mapred.MultiJobRunner.run(MultiJobRunner.java:277)
        at org.apache.hadoop.benchmarks.mapred.MultiJobRunner.main(MultiJobRunner.java:401)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:143)
        at org.apache.hadoop.benchmarks.mapred.BenchmarkRunner.main(BenchmarkRunner.java:17)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

06/12/20 22:38:47 INFO mapred.JobClient:  map 99% reduce 1%
06/12/20 22:38:52 INFO mapred.JobClient:  map 100% reduce 2%
06/12/20 22:39:03 INFO mapred.JobClient:  map 100% reduce 17%
06/12/20 22:39:06 INFO mapred.JobClient:  map 100% reduce 51%
06/12/20 22:39:12 INFO mapred.JobClient:  map 100% reduce 66%
06/12/20 22:39:15 INFO mapred.JobClient:  map 100% reduce 100%
06/12/20 22:39:16 INFO mapred.JobClient: Job complete: job_0746

The job completes successfully, though, despite the two RPC timeouts.
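The log suggests the client treats each getJobStatus timeout as transient: it logs "Communication problem with server", keeps polling, and the job still reaches 100%. Below is a minimal Java sketch of that tolerate-and-retry polling pattern. `StatusService`, `waitForCompletion`, and the consecutive-failure budget are hypothetical illustrations, not JobClient's actual code or Hadoop's RPC API.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

public class PollingSketch {

    /** Hypothetical stand-in for a job-status RPC; not Hadoop's API. */
    public interface StatusService {
        float getProgress() throws SocketTimeoutException;
    }

    /**
     * Polls until progress reaches 1.0. A timed-out call is logged and retried
     * instead of failing the job; the loop gives up only after more than
     * maxConsecutiveFailures timeouts in a row. Returns the number of
     * timeouts that were tolerated.
     */
    public static int waitForCompletion(StatusService service, int maxConsecutiveFailures,
                                        long pollIntervalMs) throws Exception {
        int consecutiveFailures = 0;
        int totalTimeouts = 0;
        while (true) {
            try {
                float progress = service.getProgress();
                consecutiveFailures = 0; // a successful call resets the budget
                if (progress >= 1.0f) {
                    return totalTimeouts;
                }
            } catch (SocketTimeoutException e) {
                totalTimeouts++;
                if (++consecutiveFailures > maxConsecutiveFailures) {
                    throw e; // repeated back-to-back failures: assume the server is gone
                }
                System.err.println("Communication problem with server: " + e.getMessage());
            }
            Thread.sleep(pollIntervalMs);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated server: calls 3 and 4 time out, the rest report progress.
        AtomicInteger calls = new AtomicInteger();
        StatusService flaky = () -> {
            int n = calls.incrementAndGet();
            if (n == 3 || n == 4) {
                throw new SocketTimeoutException("timed out waiting for rpc response");
            }
            return Math.min(1.0f, n / 6.0f);
        };
        int tolerated = waitForCompletion(flaky, 5, 1);
        // prints "job complete after 6 calls, 2 timeouts tolerated"
        System.out.println("job complete after " + calls.get() + " calls, "
                + tolerated + " timeouts tolerated");
    }
}
```

In a design like this the job keeps running on the server while status calls time out, which would be consistent with the reported progress jumping from map 4% to map 99% across the two timeouts.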

Thoughts?

-- 
This message is automatically generated by JIRA.
