[ http://issues.apache.org/jira/browse/HADOOP-830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12461800 ]

Owen O'Malley commented on HADOOP-830:
--------------------------------------

This is looking good. Comments:
  1. I think that "ramfs:" would be less confusing than "mem:" as a prefix for 
the ram file system.
  2. The float constants should be represented as "0.25f" rather than "(float) 
0.25".
  3. This patch introduces some methods that take extra FileSystems in the 
parameter list, when the correct approach is to call getFileSystem on the 
Paths. So cloneFileAttributes should look like:
  public Writer cloneFileAttributes(Path inputFile, Path outputFile, 
Progressable prog) throws IOException
  4. The old cloneFileAttributes should be marked as deprecated.
  5. The javadoc on the "factor" parameter should mention that it is the 
maximum merge fan-in.
  6. The patch checks for the file's existence in the local and ram file 
systems to see where it was placed. Since we have had issues in the past with 
files getting deleted via race conditions, it seems better to remember where 
each file was placed. I suggest that MapOutputLocation.getFile return the Path 
where the file was stored. CopyResult should keep that instead of the size, 
which was only used as an error flag (when it is -1). Then the reduce task 
runner can keep a list of the files that are in each file system.
  7. getFile also has a catch block where Throwable is caught and ignored. That 
will cause lots of errors to go unreported. I'd rather have the ram FileSystem 
catch OutOfMemoryError explicitly when it is creating the new file and return 
null, like it does when the ram FileSystem is full.
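For point 3, the Path-based signature might look roughly like the sketch below; 
the FileSystem for each side is recovered from the Path itself rather than 
passed in. The field name "conf" and the method body are illustrative, not 
from the patch:

```java
// Hypothetical sketch: each Path can produce its own FileSystem, so
// callers no longer need to pass FileSystems explicitly.
public Writer cloneFileAttributes(Path inputFile, Path outputFile,
                                  Progressable prog) throws IOException {
  FileSystem srcFs = inputFile.getFileSystem(conf);  // conf: a Configuration held by the caller
  FileSystem dstFs = outputFile.getFileSystem(conf);
  // ... copy compression type, codec, etc. from inputFile to the new Writer
}
```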
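For point 7, the idea is to catch OutOfMemoryError at the single place where 
the ram file system allocates its backing buffer, rather than swallowing 
Throwable in getFile. A minimal self-contained illustration (the method name 
is made up, not from the patch):

```java
public class RamAllocSketch {
  // Try to reserve a backing buffer for a new in-memory file.
  // Returns null on allocation failure instead of letting the error
  // propagate and be swallowed by a blanket catch (Throwable).
  static byte[] tryAllocate(int size) {
    try {
      return new byte[size];
    } catch (OutOfMemoryError e) {
      // Treat "cannot allocate" like "ram fs is full": signal the
      // caller to fall back to the local file system.
      return null;
    }
  }

  public static void main(String[] args) {
    // A small request succeeds; Integer.MAX_VALUE exceeds the VM's
    // array-size limit on HotSpot and triggers OutOfMemoryError.
    System.out.println(tryAllocate(16) != null);
    System.out.println(tryAllocate(Integer.MAX_VALUE) == null);
  }
}
```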

> Improve the performance of the Merge phase
> ------------------------------------------
>
>                 Key: HADOOP-830
>                 URL: http://issues.apache.org/jira/browse/HADOOP-830
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Devaraj Das
>         Assigned To: Devaraj Das
>         Attachments: 830-for-review.new.patch, 830-for-review.patch
>
>
> This issue is about trying to improve the performance of the merge phase 
> (especially on the reduces). Currently, all the map outputs are copied to 
> disk and then the merge phase starts (just to make a note - sorting happens 
> on the maps).
> The first optimization that I plan to implement is to do in-memory merging of 
> the map outputs. There are two buffers maintained - 
> 1) a scratch buffer for writing map outputs (directly off the socket). This 
> is a first-come-first-serve buffer (as opposed to strategies like best fit). 
> The map output copier copies the map output off the socket and puts it here 
> (assuming there is sufficient space in the buffer).
> 2) a merge buffer - when the scratch buffer cannot accommodate any more map 
> output, the roles of the buffers are switched - that is, the scratch buffer 
> becomes the merge buffer and the merge buffer becomes the scratch buffer. We 
> avoid copying by doing this switch of roles. The copier threads can continue 
> writing data from the socket buffer to the current scratch buffer (access to 
> the scratch buffer is synchronized). 
> Both the above buffers are of equal sizes configured to have default values 
> of 100M.
> Before switching roles, a check is done to see whether the merge buffer is in 
> use (merge algorithm is working on the data there). We wait till the merge 
> buffer is free. The hope is that while merging we are reading key/value data 
> from an in-memory buffer and it will be really fast and so we won't see 
> client timeouts on the server serving the map output. However, if they really 
> timeout, the client sees an exception, and resubmits the request to the 
> server.
> With the above we are doing copying/merging in parallel.
> The merge happens and then a spill to disk happens. At the end of the 
> in-memory merge of all the map outputs, we will end up with a set of 
> ~100M-sized files on disk that we will need to merge. Also, the in-memory 
> merge gets triggered when the in-memory scratch buffer has been idle too 
> long (like 30 secs), or when the number of outputs copied so far is equal to 
> the number of maps in the job, whichever happens first. We can proceed with 
> the regular merge for these on-disk files and maybe we can do some 
> optimizations there too (haven't put much thought there).
> If the map output can never be copied to the buffer (because the map output 
> is let's say 200M), then that is directly spilled to disk.
> To implement the above, I am planning to extend the FileSystem class to 
> provide an InMemoryFileSystem class that will ease the integration of the 
> in-memory scratch/merge with the existing APIs (like SequenceFile, 
> MapOutputCopier) since all of them work with the abstractions of FileSystem 
> and Input/Output streams.
> Comments?
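The two-buffer role switch described in the quoted proposal can be sketched as 
follows. This is a minimal illustration with made-up names, not the patch 
itself; the key points are that the swap copies no data and that copiers wait 
while the merge buffer is in use:

```java
public class DoubleBufferSketch {
  private byte[] scratch;       // copiers append map output here
  private byte[] merge;         // merger reads from here
  private int scratchUsed = 0;
  private boolean merging = false;

  public DoubleBufferSketch(int size) {
    scratch = new byte[size];
    merge = new byte[size];
  }

  // First-come-first-serve write into the scratch buffer; returns false
  // when the output does not fit, which is the cue to swap roles.
  public synchronized boolean tryWrite(byte[] data) {
    if (scratchUsed + data.length > scratch.length) return false;
    System.arraycopy(data, 0, scratch, scratchUsed, data.length);
    scratchUsed += data.length;
    return true;
  }

  // Wait for any in-progress merge, then swap the buffer roles so the
  // filled scratch buffer is handed to the merger without copying.
  public synchronized int swapAndStartMerge() {
    while (merging) {
      try {
        wait();                 // merge buffer still in use
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new RuntimeException("interrupted waiting for merge", e);
      }
    }
    byte[] tmp = scratch;
    scratch = merge;            // old merge buffer becomes the new scratch
    merge = tmp;                // filled scratch buffer goes to the merger
    int toMerge = scratchUsed;
    scratchUsed = 0;
    merging = true;
    return toMerge;             // bytes now visible to the merge thread
  }

  public synchronized void mergeDone() {
    merging = false;
    notifyAll();                // unblock copiers waiting to swap
  }
}
```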

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
