Yes, because the input path is processed by the jobtracker and later by the tasktrackers themselves, which won't necessarily be on your machine.
Mappers can read the local file system but it's not clear what may or may not be there. Consider the distributed cache for smallish data.

On Tue, Feb 12, 2013 at 7:05 PM, Dan Filimon <[email protected]> wrote:
> When creating my own job driver, I'm unable to give it any inputs from
> the local file system. An exception gets thrown when starting the job
> (and trying to get the splits).
> Apparently the files have to be on HDFS.
>
> Is there any way around this (ideally, I'd like it to first look for
> the file on the local file system and if no file is found, look at
> HDFS)?
>
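
For the "look locally first, then fall back to HDFS" idea, here is a rough sketch of just the path-resolution step in the driver. It uses only the JDK so it can be tried without a cluster. The class name InputPathResolver, the method resolve, and the namenode address in the usage note are all made up for illustration. Note the caveat above still applies: a file:// input only helps when the tasks run on the same machine as the driver (local or pseudo-distributed mode).

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop or Mahout.
public class InputPathResolver {

    /**
     * Returns a fully qualified URI for the job input:
     * a file:// URI if rawPath exists on the local file system,
     * otherwise the same path placed under hdfsRoot.
     *
     * Assumption: a relative rawPath is treated as relative to the HDFS
     * root. Hadoop itself would normally resolve relative paths against
     * the user's HDFS working directory.
     */
    public static URI resolve(String rawPath, URI hdfsRoot) {
        Path local = Paths.get(rawPath).toAbsolutePath().normalize();
        if (Files.exists(local)) {
            // Found locally: qualify with the file scheme so it is not
            // interpreted against the default (HDFS) file system.
            return local.toUri();
        }
        // Not found locally: assume it lives on HDFS.
        String hdfsPath = rawPath.startsWith("/") ? rawPath : "/" + rawPath;
        return hdfsRoot.resolve(hdfsPath);
    }
}
```

The driver could then pass the result on with something like FileInputFormat.addInputPath(job, new org.apache.hadoop.fs.Path(InputPathResolver.resolve(arg, URI.create("hdfs://namenode:8020")))). On a real cluster it is probably safer to copy a local file up to HDFS first (FileSystem.copyFromLocalFile) than to hand the job a file:// path.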
