Would one of the SequenceFile.Sorter#merge() methods suffice?

http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.Sorter.html#merge(java.util.List,%20org.apache.hadoop.fs.Path)

Doug

Andrzej Bialecki wrote:
Hi,

Any suggestions on how to do that? Let's say I have several part-NNNN MapFiles created by MapFileOutputFormat using a specified Comparator and Partitioner. How can I traverse the data in strictly ascending global key order (i.e. across all parts)?

The best I can come up with is the following pseudo-code:

get the readers;
get the first key from each reader, and put them on a sorted list;
while (the sorted list is not empty) {
    remove the smallest key, and retrieve the value from its reader;
    if that reader has more keys:
        add its next key to the sorted list;
}
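
As a rough illustration, the merge described in the pseudo-code above could be sketched in plain Java with a java.util.PriorityQueue acting as the sorted list. This is only a sketch under assumptions: the class name KWayMerge and its Head helper are hypothetical, and ordinary Iterator<Map.Entry<K, V>> objects stand in for the per-part readers, so no Hadoop calls appear here. Each source is assumed to be already sorted by the same Comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Hadoop: k-way merge over sorted sources.
final class KWayMerge {

    /** One source plus the entry most recently read from it. */
    private static final class Head<K, V> {
        final Iterator<Map.Entry<K, V>> source;
        Map.Entry<K, V> current;

        Head(Iterator<Map.Entry<K, V>> source) {
            this.source = source;
            this.current = source.next(); // caller checks hasNext() first
        }
    }

    /**
     * Merges sources that are each sorted by cmp into one list in
     * ascending global key order.
     */
    static <K, V> List<Map.Entry<K, V>> merge(
            List<Iterator<Map.Entry<K, V>>> sources,
            final Comparator<? super K> cmp) {
        // The heap orders sources by the key of their current entry.
        PriorityQueue<Head<K, V>> heap = new PriorityQueue<Head<K, V>>(
                Math.max(1, sources.size()),
                (a, b) -> cmp.compare(a.current.getKey(), b.current.getKey()));

        // Seed the heap with the first entry of every non-empty source.
        for (Iterator<Map.Entry<K, V>> s : sources) {
            if (s.hasNext()) {
                heap.add(new Head<K, V>(s));
            }
        }

        List<Map.Entry<K, V>> out = new ArrayList<Map.Entry<K, V>>();
        while (!heap.isEmpty()) {
            // Take the source whose current key is globally smallest.
            Head<K, V> h = heap.poll();
            out.add(h.current);
            // Advance that source and put it back if it has more entries.
            if (h.source.hasNext()) {
                h.current = h.source.next();
                heap.add(h);
            }
        }
        return out;
    }
}
```

With N sources, each step costs O(log N) rather than a re-sort of the list. Collecting everything into a List keeps the sketch short; for real part files one would more likely process each entry as it is polled instead of holding the whole output in memory.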

Any other suggestions?

