An alternative would be to use the Hadoop FS APIs to recursively list the file statuses and pass those as the input files. This is slightly more complicated, but it gives you more control and might help while debugging as well. Just a thought.
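As a rough illustration of that idea, here is a minimal, untested sketch that walks the tree with the FileSystem API and adds each leaf file as an explicit input path. The class name, the helper collectFiles, and the starting path /data/user are just placeholders taken from the question:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class RecursiveInputLister {

    // Walk the directory tree and collect every plain file under root.
    static void collectFiles(FileSystem fs, Path root, List<Path> out) throws IOException {
        for (FileStatus status : fs.listStatus(root)) {
            if (status.isDir()) {
                collectFiles(fs, status.getPath(), out);   // descend into subdirectory
            } else {
                out.add(status.getPath());                 // leaf file -> job input
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        List<Path> inputs = new ArrayList<Path>();
        collectFiles(fs, new Path("/data/user"), inputs);  // top-level directory from the question

        Job job = new Job(conf, "process leaf files");
        for (Path p : inputs) {
            FileInputFormat.addInputPath(job, p);          // add each file explicitly
        }
        // ... set mapper/reducer, output path, then job.waitForCompletion(true)
    }
}

Because you build the list yourself, you can log or filter the paths before the job runs, which is where the extra control for debugging comes from.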
Thanks,
Amogh

-----Original Message-----
From: Amandeep Khurana [mailto:[email protected]]
Sent: Saturday, September 12, 2009 3:03 AM
To: [email protected]
Subject: Re: Hadoop Input Files Directory

You can give something like /path/to/directories/*/*/*

On Fri, Sep 11, 2009 at 2:10 PM, Boyu Zhang <[email protected]> wrote:
> Dear All,
>
> I have an input directory of depth 3; the actual files are at the deepest
> level (something like /data/user/dir_0/file0, /data/user/dir_1/file0,
> /data/user/dir_2/file0). I want to write a MapReduce job to process these
> files at the deepest level.
>
> One way of doing so is to point the input path at the directories that
> contain the files, like /data/user/dir_0, /data/user/dir_1,
> /data/user/dir_2. But this is not feasible once I have many more
> directories, as I eventually will. I tried to specify the input path as
> /data/user, but I get an error: cannot open filename /data/user/dir_0.
>
> My question is: is there any way to process all the files in a
> hierarchy with the input path set to the top level?
>
> Thanks a lot for your time!
>
> Boyu Zhang
> University of Delaware
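For reference, a minimal sketch of the glob approach Amandeep suggests in the quoted message: a single input path with a wildcard per directory level expands to the leaf files. The class name is a placeholder, and the pattern /data/user/*/* is just matched to the example paths in the question; add one more /* for each additional level of nesting:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class GlobInputExample {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "glob input");
        // /data/user/dir_0/file0, /data/user/dir_1/file0, ... all match /data/user/*/*
        FileInputFormat.addInputPath(job, new Path("/data/user/*/*"));
        // ... set mapper/reducer, output path, then job.waitForCompletion(true)
    }
}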
