Hi Lin,

you could run a map-only job, i.e. read your data and write it back out from the mapper without any reducer at all (set mapred.reduce.tasks=0 or, equivalently, call job.setNumReduceTasks(0)).
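A minimal driver for such a map-only job might look like the sketch below (new mapreduce API; the Text key/value classes and paths are assumptions, so substitute whatever types your SequenceFiles actually contain):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class SequenceFileMerge {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "merge sequence files");
    job.setJarByClass(SequenceFileMerge.class);

    // Zero reducers: the job is map-only, so there is no sort/shuffle phase.
    job.setNumReduceTasks(0);

    // The Mapper base class is the identity mapper; it passes every
    // key/value pair straight through to the output.
    job.setMapperClass(Mapper.class);

    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setOutputFormatClass(SequenceFileOutputFormat.class);

    // Assumed key/value classes; match these to your data.
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

One caveat: a map-only job writes one output file per map task, so this compacts many files into fewer part files rather than exactly one. If you really need a single SequenceFile, a single-reducer job (setNumReduceTasks(1) with the identity Reducer) does produce one file, at the cost of the distributed sort.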
That way, you parallelize over your inputs through a number of mappers and avoid the sort/shuffle/reduce overhead entirely.

Regards,
Christoph

-----Original Message-----
From: 丛林 [mailto:congli...@gmail.com]
Sent: Thursday, 12 May 2011 13:16
To: mapreduce-user@hadoop.apache.org
Subject: Re: How to merge several SequenceFile into one?

Dear Jason,

If the order of the keys in the sequence file is not important to me (in other words, if the sort step is not necessary), how can I skip the distributed sort and save those resources?

Thanks for your suggestion.

Best Wishes,
-Lin

2011/5/12 jason <urg...@gmail.com>:
> An M/R job with a single reducer would do the job. This way you can
> utilize the distributed sort and merge/combine/dedupe key/values as
> you wish.
>
> On 5/11/11, 丛林 <congli...@gmail.com> wrote:
>> Hi all,
>>
>> There are lots of SequenceFiles in HDFS. How can I merge them into
>> one SequenceFile?
>>
>> Thanks for your suggestions.
>>
>> -Lin