[ http://issues.apache.org/jira/browse/HADOOP-830?page=comments#action_12459187 ] Devaraj Das commented on HADOOP-830: ------------------------------------
In the original proposal, I suggested that we can have one buffer for all the files that the InMemoryFileSystem manages. After a discussion with Owen on this, it seems like the alternative arrangement of having one byte[] per file in the InMemoryFileSystem is a better approach, given that we know the lengths of the files before we allocate byte[] buffers for those. In the original proposal, there was an assumption of two equal sized buffers & merge would happen when we have 50% of the total buffer space filled with map outputs. This can be mapped to the multiple buffers (one byte[] per file) case (wherein we consider the total size of all the small buffers). Yes, Sameer, we can & should spill a map output directly to disk if its size is more than a certain fraction of the total buffer space. > Improve the performance of the Merge phase > ------------------------------------------ > > Key: HADOOP-830 > URL: http://issues.apache.org/jira/browse/HADOOP-830 > Project: Hadoop > Issue Type: Improvement > Components: mapred > Reporter: Devaraj Das > Assigned To: Devaraj Das > > This issue is about trying to improve the performance of the merge phase > (especially on the reduces). Currently, all the map outputs are copied to > disk and then the merge phase starts (just to make a note - sorting happens > on the maps). > The first optimization that I plan to implement is to do in-memory merging of > the map outputs. There are two buffers maintained - > 1) a scratch buffer for writing map outputs (directly off the socket). This > is a first-come-first-serve buffer (as opposed to strategies like best fit). > The map output copier copies the map output off the socket and puts it here > (assuming there is sufficient space in the buffer). > 2) a merge buffer - when the scratch buffer cannot accomodate any more map > output, the roles of the buffers are switched - that is, the scratch buffer > becomes the merge buffer and the merge buffer becomes the scratch buffer. We > avoid copying by doing this switch of roles. The copier threads can continue > writing data from the socket buffer to the current scratch buffer (access to > the scratch buffer is synchronized). > Both the above buffers are of equal sizes configured to have default values > of 100M. > Before switching roles, a check is done to see whether the merge buffer is in > use (merge algorithm is working on the data there). We wait till the merge > buffer is free. The hope is that while merging we are reading key/value data > from an in-memory buffer and it will be really fast and so we won't see > client timeouts on the server serving the map output. However, if they really > timeout, the client sees an exception, and resubmits the request to the > server. > With the above we are doing copying/merging in parallel. > The merge happens and then a spill to disk happens. At the end of the > in-memory merge of all the map outputs, we will end up with ~100M files on > disk that we will need to merge. Also, the in-memory merge gets triggered > when the in-memory scratch buffer has been idle too long (like 30 secs), or, > the number of outputs copied so far is equal to the number of maps in the > job, whichever is earlier. We can proceed with the regular merge for these > on-disk-files and maybe we can do some optimizations there too (haven't put > much thought there). > If the map output can never be copied to the buffer (because the map output > is let's say 200M), then that is directly spilled to disk. > To implement the above, I am planning to extend the FileSystem class to > provide an InMemoryFileSystem class that will ease the integration of the > in-memory scratch/merge with the existing APIs (like SequenceFile, > MapOutputCopier) since all them work with the abstractions of FileSystem and > Input/Output streams. > Comments? -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
