I would like to merge some SequenceFiles as well, so any help would be great!
Although the solution with the single reducer works great, the files are small, so I don't need distribution. I think I will write a simple Java program that reads these files and merges them.

> From: christoph.schm...@1und1.de
> To: mapreduce-user@hadoop.apache.org
> Date: Thu, 12 May 2011 15:44:57 +0200
> Subject: Re: How to merge several SequenceFile into one?
>
> Oops, sorry, I answered in the wrong thread. I intended to reply to the "How
> to create a SequenceFile faster" issue.
>
> Regards,
> Christoph
>
> -----Original Message-----
> From: 丛林 [mailto:congli...@gmail.com]
> Sent: Thursday, May 12, 2011 14:30
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: How to merge several SequenceFile into one?
>
> Hi Christoph,
>
> If there is no reducer, how can these sequence files be merged?
>
> Thanks for your advice.
>
> Best Wishes,
>
> -Lin
>
> On May 12, 2011 at 7:44 PM, Christoph Schmitz <christoph.schm...@1und1.de> wrote:
> > Hi Lin,
> >
> > you could run a map-only job, i.e. read your data and output it from the
> > mapper without any reducer at all (set mapred.reduce.tasks=0 or,
> > equivalently, use job.setNumReduceTasks(0)).
> >
> > That way, you parallelize over your inputs through a number of mappers and
> > do not have any sort/shuffle/reduce overhead.
> >
> > Regards,
> > Christoph
> >
> > -----Original Message-----
> > From: 丛林 [mailto:congli...@gmail.com]
> > Sent: Thursday, May 12, 2011 13:16
> > To: mapreduce-user@hadoop.apache.org
> > Subject: Re: How to merge several SequenceFile into one?
> >
> > Dear Jason,
> >
> > If the order of the keys in the sequence file is not important to me, in
> > other words, if the sort is not necessary, how can I turn off the
> > distributed sort to save resources?
> >
> > Thanks for your suggestion.
> >
> > Best Wishes,
> >
> > -Lin
> >
> > 2011/5/12 jason <urg...@gmail.com>:
> >> An M/R job with a single reducer would do the job. This way you can
> >> utilize the distributed sort and merge/combine/dedupe key/values as you
> >> wish.
> >>
> >> On 5/11/11, 丛林 <congli...@gmail.com> wrote:
> >>> Hi all,
> >>>
> >>> There are lots of SequenceFiles in HDFS. How can I merge them into one
> >>> SequenceFile?
> >>>
> >>> Thanks for your suggestion.
> >>>
> >>> -Lin
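A standalone program along the lines of "read these files and merge them", without MapReduce, could look roughly like the sketch below. It assumes the Hadoop 2.x-or-later `SequenceFile` option API (`SequenceFile.Reader.file`, `SequenceFile.Writer.keyClass`, etc.) and that every input uses the same `Writable` key and value classes. The class name `SequenceFileMerger` and its `merge` method are hypothetical, not part of Hadoop. Records are copied in input order (no sorting), and the output falls back to the configured default compression.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

/** Hypothetical helper: concatenates several SequenceFiles into one, without MapReduce. */
public class SequenceFileMerger {

    /** Copies every record of every input, in order, into one output file; returns the record count. */
    public static long merge(Configuration conf, List<Path> inputs, Path output) throws IOException {
        if (inputs.isEmpty()) {
            throw new IllegalArgumentException("no input files");
        }
        Class<?> keyClass;
        Class<?> valueClass;
        // Assumption: the first input's key/value types apply to all inputs.
        try (SequenceFile.Reader first = new SequenceFile.Reader(conf, SequenceFile.Reader.file(inputs.get(0)))) {
            keyClass = first.getKeyClass();
            valueClass = first.getValueClass();
        }
        long count = 0;
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(output),
                SequenceFile.Writer.keyClass(keyClass),
                SequenceFile.Writer.valueClass(valueClass))) {
            for (Path input : inputs) {
                try (SequenceFile.Reader reader = new SequenceFile.Reader(conf, SequenceFile.Reader.file(input))) {
                    if (!reader.getKeyClass().equals(keyClass) || !reader.getValueClass().equals(valueClass)) {
                        throw new IOException("key/value types in " + input + " differ from the first input");
                    }
                    // Reused holders; append() serializes them immediately, so reuse is safe.
                    Writable key = (Writable) ReflectionUtils.newInstance(keyClass, conf);
                    Writable value = (Writable) ReflectionUtils.newInstance(valueClass, conf);
                    while (reader.next(key, value)) {
                        writer.append(key, value);
                        count++;
                    }
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Usage: SequenceFileMerger <output> <input1> [<input2> ...]
        if (args.length < 2) {
            System.err.println("usage: SequenceFileMerger <output> <input>...");
            System.exit(1);
        }
        List<Path> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) {
            inputs.add(new Path(args[i]));
        }
        long n = merge(new Configuration(), inputs, new Path(args[0]));
        System.out.println("wrote " + n + " records to " + args[0]);
    }
}
```

Paths without a scheme resolve against the default file system in the `Configuration`, so with the cluster's `core-site.xml` on the classpath the same program reads from and writes to HDFS. For small files this avoids job-submission overhead, at the cost of doing all I/O in one process.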
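The single-reducer MapReduce job suggested in the thread might be configured like the following sketch. It assumes the newer `org.apache.hadoop.mapreduce` API (Hadoop 2.x or later); `MergeSequenceFilesJob` and `createJob` are hypothetical names. The base `Mapper` and `Reducer` classes pass records through unchanged, so the job only shuffles, sorts by key, and writes everything through one reducer into a single output file in the output directory.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

/** Hypothetical driver: merges SequenceFiles with an identity MapReduce job and one reducer. */
public class MergeSequenceFilesJob {

    /** Configures (but does not submit) the merge job. */
    public static Job createJob(Configuration conf, Class<?> keyClass, Class<?> valueClass,
                                Path output, Path... inputs) throws IOException {
        Job job = Job.getInstance(conf, "merge-sequencefiles");
        job.setJarByClass(MergeSequenceFilesJob.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        // The base Mapper and Reducer classes emit their input unchanged.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        // One reducer produces one key-sorted output file.
        job.setNumReduceTasks(1);
        job.setOutputKeyClass(keyClass);
        job.setOutputValueClass(valueClass);
        FileInputFormat.setInputPaths(job, inputs);
        FileOutputFormat.setOutputPath(job, output);
        return job;
    }

    public static void main(String[] args) throws Exception {
        // Usage: MergeSequenceFilesJob <outputDir> <inputFile> [<inputFile> ...]
        if (args.length < 2) {
            System.err.println("usage: MergeSequenceFilesJob <outputDir> <inputFile>...");
            System.exit(1);
        }
        Configuration conf = new Configuration();
        Path[] inputs = new Path[args.length - 1];
        for (int i = 1; i < args.length; i++) {
            inputs[i - 1] = new Path(args[i]);
        }
        // Assumption: the first argument after the output is a file whose types apply to all inputs.
        Class<?> keyClass;
        Class<?> valueClass;
        try (SequenceFile.Reader r = new SequenceFile.Reader(conf, SequenceFile.Reader.file(inputs[0]))) {
            keyClass = r.getKeyClass();
            valueClass = r.getValueClass();
        }
        Job job = createJob(conf, keyClass, valueClass, new Path(args[0]), inputs);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Because the shuffle sorts by key, the key class must be a `WritableComparable`. Replacing `Reducer.class` with a custom reducer is where any combine or dedupe step would go. Setting the reducer count to 0 instead gives the map-only variant discussed in the thread, which skips the sort but writes one output file per mapper rather than a single merged file.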