Hi,

> There is lots of SequenceFile in HDFS, how can I merge them into one
> SequenceFile?

The simplest way to do that is to create a job that
- input format = sequence file
- map = identity mapper
- reduce = identity reduce
- output = sequence file
and job.setNumReduceTasks(1)

However: I think it is a useless thing to do.
Sequence files are only really useful inside a Hadoop cluster serving as
input for later jobs. And having multiple files only helps Hadoop in
scaling out.

So my question to you: Why do you want that?

-- 
Best regards / Met vriendelijke groeten,

Niels Basjes
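
P.S. For reference, the job described above could be sketched roughly as follows. This is a minimal sketch, not a tested implementation: it assumes the newer org.apache.hadoop.mapreduce API (Hadoop 2 or later), where the base Mapper and Reducer classes pass records through unchanged, and it assumes the keys and values in the files are Text. The class name MergeSequenceFiles and the two path arguments are made up for illustration; substitute the key/value classes your files actually contain.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical driver class: args[0] = input directory, args[1] = output directory.
public class MergeSequenceFiles {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "merge sequence files");
        job.setJarByClass(MergeSequenceFiles.class);

        // input format = sequence file
        job.setInputFormatClass(SequenceFileInputFormat.class);

        // The base Mapper and Reducer emit every record unchanged,
        // so they act as the identity mapper and identity reducer.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);

        // A single reducer produces a single output file.
        job.setNumReduceTasks(1);

        // output = sequence file
        job.setOutputFormatClass(SequenceFileOutputFormat.class);

        // Assumption: keys and values are Text; change to match your data.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

Note that because everything passes through the shuffle to one reducer, the records in the merged file come out sorted by key rather than in their original order, and that single reducer becomes the bottleneck for the whole job.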