[ http://issues.apache.org/jira/browse/HADOOP-141?page=all ]
Owen O'Malley resolved HADOOP-141.
----------------------------------

    Fix Version/s: 0.3.0
       Resolution: Fixed
         Assignee: Owen O'Malley

> Disk thrashing / task timeouts during map output copy phase
> -----------------------------------------------------------
>
>          Key: HADOOP-141
>          URL: http://issues.apache.org/jira/browse/HADOOP-141
>      Project: Hadoop
>   Issue Type: Bug
>   Components: mapred
>  Environment: linux
>     Reporter: p sutter
>  Assigned To: Owen O'Malley
>      Fix For: 0.3.0
>
>
> MapOutputProtocol connections cause timeouts because the system thrashes and
> transfers the same file over and over again, ultimately making no forward
> progress (medium-sized job, 500GB input file, map output about as large as
> the input, 10-node cluster).
> There are several bugs behind this, but the following two changes improved
> matters considerably.
> (1)
> The buffer size in MapOutputFile is currently hardcoded to 8192 bytes (for
> both reads and writes). After changing this buffer size to 256KB, the number
> of disk seeks was reduced and the problem went away.
> Ideally there would be a buffer size parameter for this that is separate from
> the DFS io buffer size.
> (2)
> I also added the following code to the socket configuration in both
> Server.java and Client.java. Disabling linger is a minor improvement in an
> environment with some packet loss (and you will have that when all the nodes
> get busy at once). 256KB buffers are probably excessive, especially on a LAN,
> but it takes me two hours to test changes, so I haven't experimented.
>     socket.setSendBufferSize(256*1024);
>     socket.setReceiveBufferSize(256*1024);
>     socket.setSoLinger(false, 0);
>     socket.setKeepAlive(true);
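
The two changes described in the report can be sketched roughly as follows. This is an illustration only, not the patch that was committed for HADOOP-141: the class name `MapOutputCopySketch`, the method names, and the `DEFAULT_BUFFER_BYTES` constant are hypothetical, and the 256KB value is simply the figure the reporter tried. The copy method takes the buffer size as a parameter, which is one way to get the separate, configurable buffer size the report asks for; the socket method applies the same four `java.net.Socket` calls quoted in the report.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.net.SocketException;

// Hypothetical helper for illustration; not a class from the Hadoop source tree.
final class MapOutputCopySketch {

    // Assumed default, taken from the 256KB value tried in the report.
    static final int DEFAULT_BUFFER_BYTES = 256 * 1024;

    private MapOutputCopySketch() {}

    // Copies everything from in to out using a buffer of the given size.
    // A larger buffer means fewer, larger reads and writes, which is the
    // effect the report credits with reducing disk seeks.
    // Returns the total number of bytes copied.
    static long copy(InputStream in, OutputStream out, int bufferBytes)
            throws IOException {
        if (bufferBytes <= 0) {
            throw new IllegalArgumentException("bufferBytes must be positive");
        }
        byte[] buffer = new byte[bufferBytes];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    // Applies the socket options quoted in the report. The buffer sizes are
    // hints to the operating system, which may round or cap them.
    static void configure(Socket socket, int bufferBytes) throws SocketException {
        socket.setSendBufferSize(bufferBytes);
        socket.setReceiveBufferSize(bufferBytes);
        socket.setSoLinger(false, 0); // disable SO_LINGER
        socket.setKeepAlive(true);    // enable SO_KEEPALIVE
    }
}
```

A caller might use `MapOutputCopySketch.copy(in, out, MapOutputCopySketch.DEFAULT_BUFFER_BYTES)` for the file transfer and `MapOutputCopySketch.configure(socket, 64 * 1024)` to try a smaller socket buffer, since the report itself suspects that 256KB is more than a LAN needs.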