RE: AW: How to merge several SequenceFile into one?

Panayotis Antonopoulos Tue, 24 May 2011 18:32:02 -0700

I would like to merge some SequenceFiles as well, so any help would be great!


Although the solution with the single reducer works great, the files are small 
so I don't need distribution.
I think I will create a simple Java program that will read these files and 
merge them.
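
A minimal sketch of such a read-and-append merge, kept free of Hadoop dependencies so it runs on a plain JDK. KeyValue, RecordSink and merge are hypothetical names introduced only for illustration; in a real program each input would presumably be a SequenceFile.Reader drained with next(key, value), and the sink a single SequenceFile.Writer fed with append(key, value). No sorting takes place, so records come out in the order the inputs are read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MergeSketch {

    /** Hypothetical stand-in for one key/value record; not a Hadoop type. */
    static final class KeyValue {
        final String key;
        final String value;

        KeyValue(String key, String value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toString() {
            return key + "=" + value;
        }
    }

    /** Hypothetical stand-in for the single output writer. */
    interface RecordSink {
        void append(KeyValue record);
    }

    /**
     * Reads every input in turn and appends each record to one sink.
     * There is no sort or shuffle: records keep their original order.
     * Returns the number of records copied.
     */
    static long merge(List<? extends Iterable<KeyValue>> inputs, RecordSink sink) {
        long copied = 0;
        for (Iterable<KeyValue> input : inputs) {
            for (KeyValue record : input) {
                sink.append(record);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        // Two small in-memory "files" standing in for the input SequenceFiles.
        List<KeyValue> fileA = Arrays.asList(new KeyValue("k2", "b"), new KeyValue("k1", "a"));
        List<KeyValue> fileB = Arrays.asList(new KeyValue("k3", "c"));

        // One in-memory list standing in for the merged output file.
        List<KeyValue> merged = new ArrayList<>();
        long copied = merge(Arrays.asList(fileA, fileB), merged::add);

        System.out.println(copied + " records: " + merged);
    }
}
```

Because everything runs in one process against one output, this only makes sense when the inputs are small, as they are here; for large inputs the single-reducer job discussed below remains the distributed option.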

> From: christoph.schm...@1und1.de
> To: mapreduce-user@hadoop.apache.org
> Date: Thu, 12 May 2011 15:44:57 +0200
> Subject: AW: How to merge several SequenceFile into one?
> 
> Oops, sorry, I answered in the wrong thread. I intended to reply to the "How 
> to create a SequenceFile faster" issue.
> 
> Regards,
> Christoph
> 
> -----Original Message-----
> From: 丛林 [mailto:congli...@gmail.com] 
> Sent: Thursday, 12 May 2011 14:30
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: How to merge several SequenceFile into one?
> 
> Hi Christoph,
> 
> If there is no reducer, how can these sequence files be merged?
> 
> Thanks for your advice.
> 
> Best Wishes,
> 
> -Lin
> 
> On 12 May 2011 at 7:44 PM, Christoph Schmitz <christoph.schm...@1und1.de> wrote:
> > Hi Lin,
> >
> > you could run a map-only job, i.e. read your data and output it from the 
> > mapper without any reducer at all (set mapred.reduce.tasks=0 or, 
> > equivalently, use job.setNumReduceTasks(0)).
> >
> > That way, you parallelize over your inputs through a number of mappers and 
> > do not have any sort/shuffle/reduce overhead.
> >
> > Regards,
> > Christoph
> >
> > -----Original Message-----
> > From: 丛林 [mailto:congli...@gmail.com]
> > Sent: Thursday, 12 May 2011 13:16
> > To: mapreduce-user@hadoop.apache.org
> > Subject: Re: How to merge several SequenceFile into one?
> >
> > Dear Jason,
> >
> > If the order of the keys in the sequence file is not important to me, in
> > other words, if sorting is not necessary, how can I skip the
> > distributed sort to save resources?
> >
> > Thanks for your suggestion.
> >
> > Best Wishes,
> >
> > -Lin
> >
> > 2011/5/12 jason <urg...@gmail.com>:
> >> An M/R job with a single reducer would do the job. This way you can
> >> utilize distributed sort and merge/combine/dedupe key/values as you
> >> wish.
> >>
> >> On 5/11/11, 丛林 <congli...@gmail.com> wrote:
> >>> Hi all,
> >>>
> >>> There are lots of SequenceFiles in HDFS. How can I merge them into one
> >>> SequenceFile?
> >>>
> >>> Thanks for your suggestion.
> >>>
> >>> -Lin
> >>>
> >>
> >
