[ 
http://issues.apache.org/jira/browse/HADOOP-331?page=comments#action_12442988 ] 
            
Owen O'Malley commented on HADOOP-331:
--------------------------------------

I think that the buffer of the map output should be stored as:

class KeyByteOffset {
  WritableComparable key;
  int offset;      // offset of value in a buffer of bytes
  int length;     // length of value in buffer of bytes
}

List<KeyByteOffset>[NumReduces]

So you have a List of records for each reduce.

When you fill up on #records or #value bytes, sort and spill. (Combine first, 
if there is a combiner.)

The spilled keys should be PartKey's from above so that the standard merge 
tools on sequence files work. (If the PartKey uses the vint representation, 
each key will only expand by 1-2 bytes on disk.) 

I think we should have a new variant of the sequence file merge that returns an 
iterator over key/value pairs, which will be useful in reduce also. 

With that, we can easily merge and produce the final partition index & sequence 
file to local disk.

> map outputs should be written to a single output file with an index
> -------------------------------------------------------------------
>
>                 Key: HADOOP-331
>                 URL: http://issues.apache.org/jira/browse/HADOOP-331
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.3.2
>            Reporter: eric baldeschwieler
>         Assigned To: Devaraj Das
>
> The current strategy of writing a file per target map is consuming a lot of 
> unused buffer space (causing out of memory crashes) and puts a lot of burden 
> on the FS (many opens, inodes used, etc).  
> I propose that we write a single file containing all output and also write an 
> index file IDing which byte range in the file goes to each reduce.  This will 
> remove the issue of buffer waste, address scaling issues with number of open 
> files and generally set us up better for scaling.  It will also have 
> advantages with very small inputs, since the buffer cache will reduce the 
> number of seeks needed and the data serving node can open a single file and 
> just keep it open rather than needing to do directory and open ops on every 
> request.
> The only issue I see is that in cases where the task output is substantiallyu 
> larger than its input, we may need to spill multiple times.  In this case, we 
> can do a merge after all spills are complete (or during the final spill).

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to