[ http://issues.apache.org/jira/browse/HADOOP-611?page=comments#action_12444740 ]

Devaraj Das commented on HADOOP-611:
------------------------------------
I am thinking of exposing the following public APIs for the merge subsystem:

1. RawKeyValueIterator merge(List<Path> input)

2. RawKeyValueIterator merge(SegmentDescriptor segments)
   // SegmentDescriptor is a class, one of whose constructors is
   // SegmentDescriptor(long start, long length, Path segmentPathName).

3. void writeFile(RawKeyValueIterator records, SequenceFile.Writer writer) throws IOException

4. SequenceFile.Writer cloneFileAttributes(FileSystem fileSys, Path inputFile, Path outputFile, Progressable prog) throws IOException;
   // This API looks at the input file's attributes (for now, the compression
   // type, block or value, and the codec) and applies them to the writer, so
   // that things like compression methods are preserved. The Progressable is
   // simply passed to the SequenceFile.Writer constructor. Later on, API (3)
   // can use the writer.

Also, the RawKeyValueIterator interface doesn't seem to need the getTotalBytes() method, so I will remove it.

> SequenceFile.Sorter should have a merge method that returns an iterator
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-611
>                 URL: http://issues.apache.org/jira/browse/HADOOP-611
>             Project: Hadoop
>          Issue Type: New Feature
>          Components: io
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>             Fix For: 0.8.0
>
>
> SequenceFile.Sorter should get a new merge method that returns an iterator
> over the keys/values.
> The current merge method should become a simple method that gets the iterator
> and writes the records out to a file.
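
To illustrate how APIs (1) and (3) in the comment above could fit together, here is a minimal, self-contained Java sketch. It does not use the Hadoop classes: KeyValueIterator, MergeSketch, merge, and writeAll are hypothetical stand-ins, with in-memory string records in place of SequenceFile segments and a List in place of SequenceFile.Writer. The priority-queue k-way merge is an assumption made for illustration, not necessarily how SequenceFile.Sorter does it.

```java
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical, simplified stand-in for RawKeyValueIterator: the real
// interface deals in raw bytes; here keys and values are plain strings.
interface KeyValueIterator {
    boolean next();      // advance; returns false when exhausted
    String getKey();     // key of the current record
    String getValue();   // value of the current record
}

final class MergeSketch {
    // One sorted input: an iterator over {key, value} pairs plus its current record.
    private static final class Source {
        final Iterator<String[]> it;
        String[] current;
        Source(Iterator<String[]> it) { this.it = it; }
        boolean advance() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }
    }

    // Analogue of merge(List<Path>): lazily merges inputs that are each
    // already sorted by key, returning an iterator instead of writing a file.
    static KeyValueIterator merge(List<List<String[]>> sortedInputs) {
        PriorityQueue<Source> heap =
            new PriorityQueue<>((a, b) -> a.current[0].compareTo(b.current[0]));
        for (List<String[]> input : sortedInputs) {
            Source s = new Source(input.iterator());
            if (s.advance()) heap.add(s);   // skip empty inputs
        }
        return new KeyValueIterator() {
            private String[] record;
            public boolean next() {
                Source s = heap.poll();     // input holding the smallest key
                if (s == null) return false;
                record = s.current;
                if (s.advance()) heap.add(s);   // re-queue under its next key
                return true;
            }
            public String getKey() { return record[0]; }
            public String getValue() { return record[1]; }
        };
    }

    // Analogue of writeFile(records, writer): drains the iterator into a sink.
    static void writeAll(KeyValueIterator records, List<String> sink) {
        while (records.next()) {
            sink.add(records.getKey() + "=" + records.getValue());
        }
    }
}
```

With this split, the existing file-producing merge reduces to calling merge(...) and handing the result to writeAll(...), which is the shape the issue description asks for. Callers that only need to stream the merged records can use the iterator directly.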