Hi,

This question refers to a thread from back in June:
http://www.mail-archive.com/core-u...@hadoop.apache.org/msg10490.html
I would like to do something similar. My logs are laid out as /logs/<hostname>/<date>.log, and I would like to choose which logs to process by giving a date range.

First I tried the approach suggested by Brian: writing a subroutine in the driver that descends through the file system starting at /logs and builds a list of input files.
http://www.mail-archive.com/core-u...@hadoop.apache.org/msg10492.html
This approach did not work when I tried to use inputs from S3. It kept failing with java.lang.IllegalArgumentException: Wrong FS.

Then I tried the second suggested approach: writing a custom InputFormat that recursively traverses directories for files. This approach worked for S3 inputs. However, I would like to pass two dates to my InputFormat so that it can use them as a date range to filter out files. I got stuck here because I couldn't figure out how to pass date parameters to the InputFormat. In my driver, I set the InputFormat as follows:

conf.setInputFormat(FilterFileTextInputFormat.class);

Any ideas on how I can get either approach to work?

Thanks,
David
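A minimal sketch of the date-range check itself, in plain Java (java.time, so Java 8 or later) with no Hadoop dependency. It assumes the <date> part of each file name is an ISO yyyy-MM-dd date, which is a guess about the log layout. The class name LogDateFilter is hypothetical. In a job, the driver could store the two bounds as strings in the JobConf with conf.set(key, value), and the custom InputFormat could read them back with job.get(key) from the JobConf the framework passes to methods such as getSplits(JobConf, int); the key names used below are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: keeps only files named <yyyy-MM-dd>.log whose date
 * falls in [start, end], inclusive. The date format is an assumption.
 */
final class LogDateFilter {
    private static final String SUFFIX = ".log";
    private final LocalDate start;
    private final LocalDate end;

    /** Bounds are ISO dates as strings, e.g. values read from the job configuration. */
    LogDateFilter(String startIso, String endIso) {
        this.start = LocalDate.parse(startIso);
        this.end = LocalDate.parse(endIso);
        if (end.isBefore(start)) {
            throw new IllegalArgumentException("end date is before start date");
        }
    }

    /** Works for local, HDFS or S3 path strings: only the last path component is inspected. */
    boolean accept(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        if (!name.endsWith(SUFFIX)) {
            return false; // directories and non-log files are skipped
        }
        String datePart = name.substring(0, name.length() - SUFFIX.length());
        try {
            LocalDate date = LocalDate.parse(datePart);
            return !date.isBefore(start) && !date.isAfter(end);
        } catch (DateTimeParseException e) {
            return false; // file name does not carry a parseable date
        }
    }

    /** Filters a list of candidate paths, preserving their order. */
    List<String> select(List<String> paths) {
        List<String> selected = new ArrayList<>();
        for (String p : paths) {
            if (accept(p)) {
                selected.add(p);
            }
        }
        return selected;
    }
}
```

For example, the driver might call conf.set("logs.start.date", "2008-06-01") and conf.set("logs.end.date", "2008-06-30") before submitting the job, and the InputFormat would then build new LogDateFilter(job.get("logs.start.date"), job.get("logs.end.date")) and apply accept() to each file found during its recursive traversal. Passing the dates as configuration strings keeps the InputFormat instantiable by the framework without a custom constructor.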