Hi,
Piggybacking on Gang's reply: to add files/dirs recursively, you can use listStatus and FileStatus to determine whether each entry is a file or a directory and add it as needed (check the FileStatus API for this). There is a patch which does this for
FileInputFormat:

http://issues.apache.org/jira/browse/MAPREDUCE-1501
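Something along these lines, as a rough sketch with the old mapred API (this is not the MAPREDUCE-1501 patch itself; the class name and folder path are only for illustration):

import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class RecursiveInput {

    // Walk the directory tree and add every plain file as a job input.
    static void addRecursively(FileSystem fs, Path dir, JobConf conf)
            throws java.io.IOException {
        for (FileStatus status : fs.listStatus(dir)) {
            if (status.isDir()) {
                addRecursively(fs, status.getPath(), conf);        // descend into sub-directory
            } else {
                FileInputFormat.addInputPath(conf, status.getPath()); // plain file: add it
            }
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(RecursiveInput.class);
        FileSystem fs = FileSystem.get(conf);
        addRecursively(fs, new Path("/my_hadoop_hdfs/my_folder"), conf);
        // ... configure mapper/reducer/output path and submit as usual
    }
}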


Amogh


On 3/23/10 6:25 PM, "Gang Luo" <[email protected]> wrote:

Hi Oleg,
you can call FileInputFormat.addInputPath(JobConf, Path) multiple times in your
program to add arbitrary paths (see the sketch below). By contrast, if you use
FileInputFormat.setInputPath, there can be only one input path.
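
For example, a minimal sketch using the old mapred API (the class name is made up, and the mapper/reducer wiring is left out):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobConf;

public class MultiInputJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MultiInputJob.class);
        // Each call appends one more path to the job's input list.
        FileInputFormat.addInputPath(conf, new Path("/my_hadoop_hdfs/my_folder/file1.txt"));
        FileInputFormat.addInputPath(conf, new Path("/my_hadoop_hdfs/my_folder/file2.txt"));
        FileInputFormat.addInputPath(conf, new Path("/my_hadoop_hdfs/my_folder/file3.txt"));
        // A directory works too: every file directly under it becomes input.
        FileInputFormat.addInputPath(conf, new Path("/my_hadoop_hdfs/my_folder"));
        // ... set mapper, reducer, output path, then submit with JobClient.runJob(conf)
    }
}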

If you are talking about output, the path you give is an output directory; all
the output files (part-00000, part-00001, ...) will be generated in that
directory.
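
For instance (again just a sketch; the output path here is made up):

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;

public class OutputPathExample {
    public static void main(String[] args) {
        JobConf conf = new JobConf(OutputPathExample.class);
        // The path names a directory that must not exist yet; each reducer
        // writes its own part-NNNNN file inside it.
        FileOutputFormat.setOutputPath(conf, new Path("/my_hadoop_hdfs/my_output"));
    }
}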

-Gang




----- Original Message -----
From: Oleg Ruchovets <[email protected]>
To: [email protected]
Sent: 2010/3/23 (Tue) 6:18:34 AM
Subject: execute mapreduce job on multiple hdfs files

Hi,
All the examples I found execute a mapreduce job on a single file, but in my
situation I have more than one.

Suppose I have a folder on HDFS which contains some files:

    /my_hadoop_hdfs/my_folder:
                /my_hadoop_hdfs/my_folder/file1.txt
                /my_hadoop_hdfs/my_folder/file2.txt
                /my_hadoop_hdfs/my_folder/file3.txt


How can I execute hadoop mapreduce on file1.txt, file2.txt and file3.txt?

Is it possible to provide the folder as a parameter to the hadoop job, so that
all the files in it are processed by the mapreduce job?

Thanks In Advance
Oleg




