[ https://issues.apache.org/jira/browse/MAPREDUCE-1276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12864270#action_12864270 ]

Amareshwari Sriramadasu commented on MAPREDUCE-1276:
----------------------------------------------------

I repeated the manual testing described in my earlier comment.
I simulated a read timeout for m_00001_0 by explicitly adding a sleep in 
TaskTracker.MapOutputServlet.sendMapFile(). The attempt fails with the error "Too 
many fetch failures", as expected. 
But most of the time I also see m_00002_0 failing with the following error:
{noformat}
Map output lost, rescheduling: error on sending map attempt_201005051443_0003_m_000002_0 to reduce 1
org.mortbay.jetty.EofException
    at org.mortbay.jetty.HttpGenerator.flush(HttpGenerator.java:787)
    at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:566)
    at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:946)
    at java.io.DataOutputStream.flush(DataOutputStream.java:106)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.sendMapFile(TaskTracker.java:3646)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3517)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1124)
    at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:766)
    at ..
{noformat}
Jothi/Chris, do you think this is an acceptable failure? 
I think we should catch this, treat it as something other than an input exception, and retry.


> Shuffle connection logic needs correction 
> ------------------------------------------
>
>                 Key: MAPREDUCE-1276
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1276
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: task
>    Affects Versions: 0.21.0
>            Reporter: Jothi Padmanabhan
>            Assignee: Amareshwari Sriramadasu
>            Priority: Blocker
>             Fix For: 0.21.0
>
>         Attachments: patch-1276-1.txt, patch-1276.txt
>
>
> While looking at the code with Amareshwari, we realized that 
> {{Fetcher#copyFromHost}} marks the connection as successful when 
> {{url.openConnection}} returns. This is wrong. The connection is actually made 
> implicitly inside {{getInputStream}}; we need to split {{getInputStream}} 
> into {{connect}} and {{getInputStream}} to handle the connection and read 
> timeout strategies correctly.
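
For illustration only, the split described above could look roughly like the sketch below. This is not the Fetcher code or either attached patch; the class SplitConnectSketch, the Phase enum, FetchException and the open() helper are hypothetical names. It relies only on the standard java.net.URLConnection behaviour: openConnection() creates the object without touching the network, connect() establishes the connection, and getInputStream() sends the request and waits for the response.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical sketch: separates the connect phase from the read phase so
// that each can have its own timeout and its own failure handling.
final class SplitConnectSketch {

    /** Which phase of the fetch failed (hypothetical names). */
    public enum Phase { CONNECT, READ }

    /** Wraps the underlying IOException and records the failing phase. */
    public static class FetchException extends IOException {
        public final Phase phase;

        public FetchException(Phase phase, IOException cause) {
            super(phase + " failed: " + cause, cause);
            this.phase = phase;
        }
    }

    public static InputStream open(URL url, int connectTimeoutMs, int readTimeoutMs)
            throws FetchException {
        URLConnection conn;
        try {
            // openConnection() only creates the object; no network I/O yet,
            // so returning from it must not count as a successful connection.
            conn = url.openConnection();
            conn.setConnectTimeout(connectTimeoutMs);
            conn.setReadTimeout(readTimeoutMs);
            // Explicit connect: failures here are connection failures.
            conn.connect();
        } catch (IOException e) {
            throw new FetchException(Phase.CONNECT, e);
        }
        try {
            // Already connected, so failures here (including a read timeout)
            // belong to the read phase.
            return conn.getInputStream();
        } catch (IOException e) {
            throw new FetchException(Phase.READ, e);
        }
    }
}
```

With the two phases separated, a caller can apply one strategy to Phase.CONNECT failures and a different one to Phase.READ failures, which is the distinction the issue description asks for.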

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
