[ 
http://issues.apache.org/jira/browse/HADOOP-611?page=comments#action_12445297 ] 
            
Devaraj Das commented on HADOOP-611:
------------------------------------

In the current merge code, 'merge-factor' number of keys and values are kept in 
memory. While implementing this, one thought was that we could avoid holding all 
'merge-factor' values in memory at the same time and instead fetch each value only 
when it is needed: when the user of the merge code calls next() on the MergeQueue to 
fetch a key/value pair, the system loads into memory only the value corresponding to 
the 'minimum' key, deferring the loading of values until that point.
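A rough sketch of the idea (hypothetical names for illustration only, not the actual 
SequenceFile.Sorter/MergeQueue code): the priority queue holds only the current key 
of each segment, and the value is read on demand inside next().

    import java.util.PriorityQueue;

    public class LazyMergeSketch {

        // Stand-in for a sorted on-disk segment; these method names are
        // illustrative, not the real SequenceFile.Reader API.
        interface SegmentReader {
            String nextRawKey();      // advance to the next key, null at EOF
            String getCurrentValue(); // read the value for the current key on demand
        }

        private static class Segment {
            String key;               // only the key is held in memory
            final SegmentReader reader;
            Segment(SegmentReader r) { this.reader = r; }
        }

        private final PriorityQueue<Segment> queue =
                new PriorityQueue<>((a, b) -> a.key.compareTo(b.key));

        public LazyMergeSketch(Iterable<SegmentReader> readers) {
            for (SegmentReader r : readers) {
                Segment s = new Segment(r);
                s.key = r.nextRawKey();
                if (s.key != null) {
                    queue.add(s);
                }
            }
        }

        // Returns the next (key, value) in sorted order, or null when done.
        public String[] next() {
            Segment min = queue.poll();
            if (min == null) {
                return null;
            }
            // The value is fetched only now, for the minimum key, instead of
            // being loaded for all 'merge-factor' segments up front.
            String value = min.reader.getCurrentValue();
            String[] record = { min.key, value };
            min.key = min.reader.nextRawKey();
            if (min.key != null) {
                queue.add(min);
            }
            return record;
        }
    }
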
I have implemented this for Compression = NONE and RECORD. For BLOCK compression, 
however, the code for not proactively loading values is already there, controlled by 
a boolean "lazyDecompression", so nothing extra needs to be done. The catch is that 
lazyDecompression is controlled via the Hadoop config (defaulting to true). I was 
wondering whether it makes sense to remove this configurable item and have it always 
be true.
Any objection to this?
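For reference, a minimal sketch of how such a flag is typically consulted; the 
property name "io.seqfile.lazydecompress" and the default of true below are 
assumptions based on the description above, not verified against the source:

    import org.apache.hadoop.conf.Configuration;

    public class LazyDecompressFlag {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Assumed property name and default; the proposal is to drop this
            // switch entirely and behave as if it were always true.
            boolean lazyDecompress = conf.getBoolean("io.seqfile.lazydecompress", true);
            System.out.println("lazyDecompression = " + lazyDecompress);
        }
    }
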

> SequenceFile.Sorter should have a merge method that returns an iterator
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-611
>                 URL: http://issues.apache.org/jira/browse/HADOOP-611
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>             Fix For: 0.8.0
>
>
> SequenceFile.Sorter should get a new merge method that returns an iterator 
> over the keys/values.
> The current merge method should become a simple method that gets the iterator 
> and writes the records out to a file.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
