I see. Well, my use case was wanting to run the job on one machine, being lazy and not wanting to put the files on HDFS. :)
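
For reference, the local-first fallback asked about in the quoted message could be sketched roughly as below. This is only an illustration, not something Hadoop provides: the class `InputResolver`, the method `resolve`, and the `hdfs://namenode:8020` prefix are hypothetical names. It relies on the fact that Hadoop's `Path` accepts scheme-qualified URIs such as `file:///...` and `hdfs://...`, and it assumes a single-machine run, since a `file:` input is only usable when every task can see that path.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: picks a scheme-qualified URI string for a job input.
final class InputResolver {
    private InputResolver() {}

    // Returns a file: URI if the path exists on the local file system,
    // otherwise the same path qualified with the given HDFS prefix.
    // Simplification (assumption): relative paths that are missing locally
    // are treated as rooted at "/" on HDFS.
    static String resolve(String path, String hdfsPrefix) {
        Path local = Paths.get(path).toAbsolutePath().normalize();
        if (Files.exists(local)) {
            return local.toUri().toString();
        }
        String separator = path.startsWith("/") ? "" : "/";
        return hdfsPrefix + separator + path;
    }
}
```

The resulting string would then be wrapped in an `org.apache.hadoop.fs.Path` and handed to the job's input format in the driver. On a real cluster the local branch would still fail for the reason given in the reply below, because the tasktrackers would not find the file.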

On Tue, Feb 12, 2013 at 8:27 PM, Sean Owen <[email protected]> wrote:
> Yes because the input path is something processed by the jobtracker and
> later the tasktrackers themselves, which won't be on your machine
> (necessarily).
>
> Mappers can read the local file system but it's not clear what may or may
> not be there. Consider the distributed cache for smallish data.
>
>
> On Tue, Feb 12, 2013 at 7:05 PM, Dan Filimon
> <[email protected]> wrote:
>
>> When creating my own job driver, I'm unable to give it any inputs from
>> the local file system. An exception gets thrown when starting the job
>> (and trying to get the splits).
>> Apparently the files have to be on HDFS.
>>
>> Is there any way around this (ideally, I'd like it to first look for
>> the file on the local file system and if no file is found, look at
>> HDFS)?
>>
