An alternative would be to use the Hadoop FS APIs to recursively list the file statuses and pass those as the input files. This is slightly more complicated, but it gives you more control and might help while debugging as well. Just a thought.
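As a rough illustration of that idea, here is a minimal, untested sketch that walks the tree with the FileSystem API and adds each leaf file as an explicit input path. The class name, the helper collectFiles, and the starting path /data/user are just placeholders taken from the question:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class RecursiveInputLister {

    // Walk the directory tree and collect every plain file under root.
    static void collectFiles(FileSystem fs, Path root, List<Path> out) throws IOException {
        for (FileStatus status : fs.listStatus(root)) {
            if (status.isDir()) {
                collectFiles(fs, status.getPath(), out);   // descend into subdirectory
            } else {
                out.add(status.getPath());                 // leaf file -> job input
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        List<Path> inputs = new ArrayList<Path>();
        collectFiles(fs, new Path("/data/user"), inputs);  // top-level directory from the question

        Job job = new Job(conf, "process leaf files");
        for (Path p : inputs) {
            FileInputFormat.addInputPath(job, p);          // add each file explicitly
        }
        // ... set mapper/reducer, output path, then job.waitForCompletion(true)
    }
}

Because you build the list yourself, you can log or filter the paths before the job runs, which is where the extra control for debugging comes from.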
Thanks,
Amogh

-----Original Message-----
From: Amandeep Khurana [mailto:[email protected]]
Sent: Saturday, September 12, 2009 3:03 AM
To: [email protected]
Subject: Re: Hadoop Input Files Directory

You can give something like /path/to/directories/*/*/*

On Fri, Sep 11, 2009 at 2:10 PM, Boyu Zhang <[email protected]> wrote:
> Dear All,
>
> I have an input directory of depth 3; the actual files are at the deepest
> level (something like /data/user/dir_0/file0, /data/user/dir_1/file0,
> /data/user/dir_2/file0). I want to write a MapReduce job to process these
> files at the deepest level.
>
> One way of doing so is to point the input path at the directories that
> contain the files, like /data/user/dir_0, /data/user/dir_1,
> /data/user/dir_2. But this is not feasible once I have many more
> directories, as I eventually will. I tried to specify the input path as
> /data/user, but I get an error: cannot open filename /data/user/dir_0.
>
> My question is: is there any way to process all the files in a
> hierarchy with the input path set to the top level?
>
> Thanks a lot for your time!
>
> Boyu Zhang
> University of Delaware
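For reference, a minimal sketch of the glob approach Amandeep suggests in the quoted message: a single input path with a wildcard per directory level expands to the leaf files. The class name is a placeholder, and the pattern /data/user/*/* is just matched to the example paths in the question; add one more /* for each additional level of nesting:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class GlobInputExample {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "glob input");
        // /data/user/dir_0/file0, /data/user/dir_1/file0, ... all match /data/user/*/*
        FileInputFormat.addInputPath(job, new Path("/data/user/*/*"));
        // ... set mapper/reducer, output path, then job.waitForCompletion(true)
    }
}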
