[ 
http://issues.apache.org/jira/browse/HADOOP-611?page=comments#action_12448390 ] 
            
Devaraj Das commented on HADOOP-611:
------------------------------------

> I don't understand the ignoreSync, doSync code that you have in the
> SegmentDescriptor. You should never set the sync = null on a Reader. It is
> done on merge outputs via writer.sync = null to keep the writer from putting
> in sync blocks, which wastes space since the merge outputs won't be split as
> map inputs.

This is done so that we can handle inputs that contain sync markers. For
example, readBlock in SequenceFile.java behaves differently depending on
whether reader.sync is null. The temp output does not have syncs, and by
default I don't expect syncs, so the sort output has none. The public merge
APIs that take external pathnames as arguments, however, assume that those
files strictly conform to the sequence file format and hence require sync
checks.
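
To make the distinction concrete, here is a minimal sketch of a reader whose
behaviour depends on whether its sync field is null. It is a toy model under
stated assumptions, not the SequenceFile code: the class SyncSketch, the
MARKER constant, and the record layout (an int length followed by the payload,
with -1 as an escape before a 16-byte marker) are all hypothetical and chosen
only for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: these names and this record layout are hypothetical,
// not the real SequenceFile classes or on-disk format.
class SyncSketch {
    static final int SYNC_ESCAPE = -1;          // length value that announces a marker
    static final byte[] MARKER = new byte[16];  // toy marker: 16 bytes of 0x5A
    static { Arrays.fill(MARKER, (byte) 0x5A); }

    // Writes records; a null sync means "emit no markers" (like a temp output).
    static byte[] write(List<String> records, byte[] sync) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        for (String r : records) {
            if (sync != null) {
                out.writeInt(SYNC_ESCAPE);
                out.write(sync);
            }
            byte[] payload = r.getBytes(StandardCharsets.UTF_8);
            out.writeInt(payload.length);
            out.write(payload);
        }
        out.flush();
        return bytes.toByteArray();
    }

    // Reads records; a null sync means the reader does not expect markers,
    // a non-null sync means every escape must be followed by a matching marker.
    static List<String> read(byte[] data, byte[] sync) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<String> records = new ArrayList<>();
        while (in.available() > 0) {
            int length = in.readInt();
            if (sync != null && length == SYNC_ESCAPE) {
                byte[] check = new byte[sync.length];
                in.readFully(check);
                if (!Arrays.equals(sync, check)) {
                    throw new IOException("sync marker mismatch");
                }
                length = in.readInt();
            }
            if (length < 0) {
                throw new IOException("unexpected escape: reader was not expecting syncs");
            }
            byte[] payload = new byte[length];
            in.readFully(payload);
            records.add(new String(payload, StandardCharsets.UTF_8));
        }
        return records;
    }
}
```

In this toy model, a reader with a non-null sync handles input both with and
without markers, while a reader with sync set to null only handles marker-free
input such as the temp output. That is the reason for keeping the sync checks
on for externally supplied paths.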

Also, have a look at MergePass.run in SequenceFile.java (without this patch 
applied). There is an explicit "reader.sync = null" done there.

> SequenceFile.Sorter should have a merge method that returns an iterator
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-611
>                 URL: http://issues.apache.org/jira/browse/HADOOP-611
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>             Fix For: 0.9.0
>
>         Attachments: merge.patch, merge.patch, merge.patch, merge.patch
>
>
> SequenceFile.Sorter should get a new merge method that returns an iterator 
> over the keys/values.
> The current merge method should become a simple method that gets the iterator 
> and writes the records out to a file.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
