[jira] Resolved: (HADOOP-141) Disk thrashing / task timeouts during map output copy phase

Owen O'Malley (JIRA) Fri, 13 Oct 2006 20:31:23 -0700

     [ http://issues.apache.org/jira/browse/HADOOP-141?page=all ]


Owen O'Malley resolved HADOOP-141.
----------------------------------

    Fix Version/s: 0.3.0
       Resolution: Fixed
         Assignee: Owen O'Malley

> Disk thrashing / task timeouts during map output copy phase
> -----------------------------------------------------------
>
>                 Key: HADOOP-141
>                 URL: http://issues.apache.org/jira/browse/HADOOP-141
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>         Environment: linux
>            Reporter: p sutter
>         Assigned To: Owen O'Malley
>             Fix For: 0.3.0
>
>
> MapOutputProtocol connections time out because the system thrashes and 
> transfers the same file over and over again, ultimately making no forward 
> progress (medium-sized job, 500GB input file, map output about as large as 
> the input, 10-node cluster).
> There are several bugs behind this, but the following two changes improved 
> matters considerably.
> (1) 
> The buffer size in MapOutputFile is currently hardcoded to 8192 bytes (for 
> both reads and writes). Changing this buffer size to 256KB reduced the 
> number of disk seeks, and the problem went away. 
> Ideally there would be a buffer size parameter for this that is separate from 
> the DFS io buffer size.
> (2)
> I also added the following code to the socket configuration in both 
> Server.java and Client.java. Disabling linger is a minor good idea in an 
> environment with some packet loss (and you will have that when all the nodes 
> get busy at once). 256KB buffers are probably excessive, especially on a LAN, 
> but it takes me two hours to test changes, so I haven't experimented.
> socket.setSendBufferSize(256*1024);
> socket.setReceiveBufferSize(256*1024);
> socket.setSoLinger(false, 0);
> socket.setKeepAlive(true);
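
Change (1) in the report amounts to copying map output in larger chunks and making the chunk size configurable on its own. The following is a minimal sketch of that idea, not the actual MapOutputFile code: the class name, the copy helper, and the property name "sketch.mapoutput.buffer.size" are all hypothetical and chosen only for illustration. The 256KB default is the value the reporter tried.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class MapOutputCopySketch {
    // Hypothetical property name for illustration; not a real Hadoop setting.
    static final String BUFFER_PROPERTY = "sketch.mapoutput.buffer.size";

    // Reads the buffer size from a system property, falling back to 256KB
    // (the value from the report) instead of a hardcoded 8192 bytes.
    static int bufferSize() {
        return Integer.getInteger(BUFFER_PROPERTY, 256 * 1024);
    }

    // Copies src to dst in chunks of bufferSize bytes and returns the
    // number of bytes copied. Larger chunks mean fewer read/write calls.
    static long copy(Path src, Path dst, int bufferSize) throws IOException {
        byte[] chunk = new byte[bufferSize];
        long total = 0;
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dst)) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
                total += n;
            }
        }
        return total;
    }
}
```

Keeping this setting separate from the DFS io buffer size, as the report suggests, would let the copy path be tuned without affecting other I/O.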

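The four socket calls quoted in change (2) can be grouped into one helper so that Server.java and Client.java apply the same settings. This is a sketch under assumptions: the class and method names are hypothetical, and the buffer size is passed in so that values smaller than 256KB can be tried. Note that the send and receive buffer sizes are hints to the operating system, so the values actually in effect may differ from those requested.

```java
import java.net.Socket;
import java.net.SocketException;

class SocketTuningSketch {
    // Applies the settings quoted in the report to a socket.
    // bufferBytes is a hint to the OS for the send/receive buffers.
    static void tune(Socket socket, int bufferBytes) throws SocketException {
        socket.setSendBufferSize(bufferBytes);
        socket.setReceiveBufferSize(bufferBytes);
        socket.setSoLinger(false, 0);   // disable SO_LINGER
        socket.setKeepAlive(true);      // enable SO_KEEPALIVE
    }
}
```

A caller would invoke, for example, SocketTuningSketch.tune(socket, 256 * 1024) before using the socket for transfers.
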
-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
