zhaoyunjiong created MAPREDUCE-6024:
---------------------------------------

             Summary: java.net.SocketTimeoutException in Fetcher caused jobs to be stuck for more than 1 hour
                 Key: MAPREDUCE-6024
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6024
             Project: Hadoop Map/Reduce
          Issue Type: Improvement
          Components: mr-am, task
            Reporter: zhaoyunjiong
            Assignee: zhaoyunjiong
            Priority: Critical


2014-08-04 21:09:42,356 WARN fetcher#33 org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to fake.host.name:13562 with 2 map outputs
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:697)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:289)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
2014-08-04 21:09:42,360 INFO fetcher#33 org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: fake.host.name:13562 freed by fetcher#33 in 180024ms
2014-08-04 21:09:55,360 INFO fetcher#33 org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning fake.host.name:13562 with 3 to fetcher#33
2014-08-04 21:09:55,360 INFO fetcher#33 org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 3 of 3 to fake.host.name:13562 to fetcher#33
2014-08-04 21:12:55,463 WARN fetcher#33 org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to fake.host.name:13562 with 3 map outputs
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:697)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:289)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
...
2014-08-04 22:03:13,416 INFO fetcher#33 org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: fake.host.name:13562 freed by fetcher#33 in 271081ms
2014-08-04 22:04:13,417 INFO fetcher#33 org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: Assigning fake.host.name:13562 with 3 to fetcher#33
2014-08-04 22:04:13,417 INFO fetcher#33 org.apache.hadoop.mapreduce.task.reduce.ShuffleSchedulerImpl: assigned 3 of 3 to fake.host.name:13562 to fetcher#33
2014-08-04 22:07:13,449 WARN fetcher#33 org.apache.hadoop.mapreduce.task.reduce.Fetcher: Failed to connect to fake.host.name:13562 with 3 map outputs
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:697)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:640)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:289)
at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)
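
The timestamps in the log above suggest that each fetch attempt blocks for roughly 180 seconds before failing with "Read timed out", and that the same host is then reassigned to the same fetcher. This would be consistent with a read timeout of 180000 ms (the property mapreduce.reduce.shuffle.read.timeout is assumed here; the log does not state the configured value). The following is only a minimal sketch for checking that arithmetic. The timestamps are copied from the excerpt; the names ATTEMPTS, FMT and elapsed_ms are hypothetical helpers and not part of Hadoop.

```python
from datetime import datetime, timedelta

# Log4j-style timestamp layout used in the excerpt (comma before milliseconds).
FMT = "%Y-%m-%d %H:%M:%S,%f"

# (assigned, failed) timestamp pairs copied from the log excerpt.
ATTEMPTS = [
    ("2014-08-04 21:09:55,360", "2014-08-04 21:12:55,463"),
    ("2014-08-04 22:04:13,417", "2014-08-04 22:07:13,449"),
]


def elapsed_ms(start: str, end: str) -> int:
    """Return the whole milliseconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta // timedelta(milliseconds=1)


if __name__ == "__main__":
    for assigned, failed in ATTEMPTS:
        print(f"{assigned} -> {failed}: {elapsed_ms(assigned, failed)} ms")

    # Span between the first and last WARN entries shown in the excerpt.
    span = elapsed_ms("2014-08-04 21:09:42,356", "2014-08-04 22:07:13,449")
    print(f"first to last failure: {span / 60000:.1f} minutes")
```

Both attempts come out at just over 180 seconds, so under these assumptions the fetcher appears to spend nearly all of the stuck period waiting on reads from the same unresponsive host.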



--
This message was sent by Atlassian JIRA (v6.2#6252)
