Hi Lin,

You could run a map-only job, i.e., read your data and emit it from the mapper 
without any reducer at all (set mapred.reduce.tasks=0 or, equivalently, call 
job.setNumReduceTasks(0)).

That way, you parallelize over your inputs across a number of mappers and 
avoid the sort/shuffle/reduce overhead entirely.
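A minimal driver for such a map-only pass-through job might look like the sketch below, using the identity Mapper to re-emit each key/value pair unchanged. This assumes the org.apache.hadoop.mapreduce API of that era; the class name, Text key/value types, and input/output paths are illustrative assumptions, not from this thread:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical driver class name; adapt key/value types to your data.
public class SequenceFilePassThrough {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "sequencefile-passthrough");
        job.setJarByClass(SequenceFilePassThrough.class);

        // Zero reducers: mapper output goes straight to HDFS,
        // with no sort/shuffle phase at all.
        job.setNumReduceTasks(0);

        // The base Mapper class is an identity mapper: it re-emits
        // each input key/value pair unchanged.
        job.setMapperClass(Mapper.class);

        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(Text.class);   // assumed key type
        job.setOutputValueClass(Text.class); // assumed value type

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

One caveat: with zero reducers each mapper writes its own part file, so this parallelizes the copy but does not by itself produce a single output file; for a true single-file merge, a job with one reducer (as Jason suggested) is what yields exactly one output SequenceFile.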

Regards,
Christoph

-----Original Message-----
From: 丛林 [mailto:congli...@gmail.com] 
Sent: Thursday, 12 May 2011 13:16
To: mapreduce-user@hadoop.apache.org
Subject: Re: How to merge several SequenceFile into one?

Dear Jason,

If the order of the keys in the sequence file does not matter to me, in
other words, if the sort phase is unnecessary, how can I skip the
distributed sort to save resources?

Thanks for your suggestion.

Best Wishes,

-Lin

2011/5/12 jason <urg...@gmail.com>:
> M/R job with a single reducer would do the job. This way you can
> utilize distributed sort and merge/combine/dedupe key/values as you
> wish.
>
> On 5/11/11, 丛林 <congli...@gmail.com> wrote:
>> Hi all,
>>
>> There is lots of SequenceFile in HDFS, how can I merge them into one
>> SequenceFile?
>>
>> Thanks for your suggestion.
>>
>> -Lin
>>
>
