You could do it with streaming and a single reducer:

bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar -Dmapred.reduce.tasks=1 -reducer cat -input '/hdfs/directory/allsource*' -output mergefile -verbose
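Note that the job's output lands as a part file inside the output directory, and the shuffle sorts the lines, so the merged file won't keep the original concatenation order. A rough end-to-end sketch (the mergejob and merged path names are placeholders picked for illustration):

bin/hadoop jar $HADOOP_HOME/hadoop-0.20.2-streaming.jar \
  -Dmapred.reduce.tasks=1 \
  -reducer cat \
  -input '/hdfs/directory/allsource*' \
  -output /hdfs/directory/mergejob \
  -verbose

# the single reducer leaves exactly one part file; promote it and clean up
hadoop fs -mv /hdfs/directory/mergejob/part-00000 /hdfs/directory/merged
hadoop fs -rmr /hdfs/directory/mergejob
hadoop fs -rm '/hdfs/directory/allsource*'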
-Joey

On Fri, Jul 22, 2011 at 1:26 PM, Time Less <timelessn...@gmail.com> wrote:
> Hello, List!
>
> I have several files in HDFS in a single directory that I create throughout
> the day. At the end of the day, I want to merge them together into one file.
> How do you guys do this?
>
> It seems this would do it:
> hadoop fs -getmerge /hdfs/directory/allsource* mergefile ; cat mergefile
> | hadoop fs -put - ; rm mergefile ; hadoop fs -rm /hdfs/directory/allsource*
>
> But I wonder if there's a command that can avoid writing to the local
> filesystem then re-writing back into HDFS. I'm looking for an HDFS
> equivalent to this Unix script:
> cat /some/dir/allsource* > /some/dir/merged ; rm /some/dir/allsource*
>
> --
> Tim Ellis
> Data Architect, Riot Games

--
Joseph Echeverria
Cloudera, Inc.
443.305.9434
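The local round trip in the quoted script can also be avoided by piping one fs command into another, since put accepts "-" to read from stdin. A minimal sketch, assuming /hdfs/directory/merged as the destination name:

hadoop fs -cat '/hdfs/directory/allsource*' | hadoop fs -put - /hdfs/directory/merged
hadoop fs -rm '/hdfs/directory/allsource*'

The bytes still stream through the client machine, but nothing touches the local filesystem, and line order is preserved (unlike the sorted output of the streaming job).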