[ http://issues.apache.org/jira/browse/HADOOP-141?page=all ]
Owen O'Malley resolved HADOOP-141.
----------------------------------
Fix Version/s: 0.3.0
Resolution: Fixed
Assignee: Owen O'Malley
> Disk thrashing / task timeouts during map output copy phase
> -----------------------------------------------------------
>
> Key: HADOOP-141
> URL: http://issues.apache.org/jira/browse/HADOOP-141
> Project: Hadoop
> Issue Type: Bug
> Components: mapred
> Environment: linux
> Reporter: p sutter
> Assigned To: Owen O'Malley
> Fix For: 0.3.0
>
>
> MapOutputProtocol connections time out because the system thrashes and the
> same file is transferred over and over again, until the job ultimately makes
> no forward progress (medium-sized job, 500GB input file, map output about as
> large as the input, 10-node cluster).
> There are several bugs behind this, but the following two changes improved
> matters considerably.
> (1)
> The buffer size in MapOutputFile is currently hardcoded to 8192 bytes (for
> both reads and writes). After I changed this buffer size to 256KB, the number
> of disk seeks dropped and the problem went away.
> Ideally there would be a buffer size parameter for this that is separate from
> the DFS io buffer size.
> (2)
> I also added the following code to the socket configuration in both
> Server.java and Client.java. Disabling linger is a minor improvement in an
> environment with some packet loss (and you will have that when all the nodes
> get busy at once). 256KB buffers are probably excessive, especially on a LAN,
> but it takes me two hours to test a change, so I haven't experimented.
> socket.setSendBufferSize(256*1024);
> socket.setReceiveBufferSize(256*1024);
> socket.setSoLinger(false, 0);
> socket.setKeepAlive(true);
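
The two changes described above can be sketched together in a small standalone class. This is only an illustration, not the patch that resolved the issue: the class name CopyTuning, its method names, and the idea of passing the buffer size as a parameter are assumptions made for the example. Only the four socket calls and the 8192-byte and 256KB figures come from the description.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of Hadoop.
class CopyTuning {
    // 256KB, the value the reporter switched to (the hardcoded value was 8192).
    static final int DEFAULT_BUFFER_SIZE = 256 * 1024;

    // Change (1): take the buffer size as a parameter instead of hardcoding it.
    static InputStream openForRead(Path file, int bufferSize) throws IOException {
        return new BufferedInputStream(Files.newInputStream(file), bufferSize);
    }

    static OutputStream openForWrite(Path file, int bufferSize) throws IOException {
        return new BufferedOutputStream(Files.newOutputStream(file), bufferSize);
    }

    // Copies src to dst in chunks of bufferSize bytes; returns the bytes copied.
    static long copy(Path src, Path dst, int bufferSize) throws IOException {
        long total = 0;
        byte[] chunk = new byte[bufferSize];
        try (InputStream in = openForRead(src, bufferSize);
             OutputStream out = openForWrite(dst, bufferSize)) {
            int n;
            while ((n = in.read(chunk)) != -1) {
                out.write(chunk, 0, n);
                total += n;
            }
        }
        return total;
    }

    // Change (2): the socket options quoted in the description.
    // The OS treats the buffer sizes as hints and may pick different values.
    static void configure(Socket socket) throws IOException {
        socket.setSendBufferSize(DEFAULT_BUFFER_SIZE);
        socket.setReceiveBufferSize(DEFAULT_BUFFER_SIZE);
        socket.setSoLinger(false, 0);   // disable linger on close
        socket.setKeepAlive(true);
    }

    public static void main(String[] args) throws IOException {
        // Options can be set on an unconnected socket before connecting.
        try (Socket socket = new Socket()) {
            configure(socket);
            System.out.println("keepAlive=" + socket.getKeepAlive());
        }
    }
}
```

Passing the buffer size in, rather than reusing a shared constant, is one way to get the separate parameter the description asks for. Where such a value would actually be read from (for example a job configuration key) is left open here.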
--
This message is automatically generated by JIRA.