Phantom wrote:
If my Map job is going to process a file, does it have to be in HDFS?
No, but it usually is. Job inputs are resolved relative to the default filesystem, so if you've configured the default filesystem to be HDFS and you pass an input filename that isn't qualified with a filesystem, then that input needs to be in HDFS.
But inputs don't have to be in the default filesystem, nor must they be in HDFS. They only need to be in a filesystem that's accessible to all nodes. They could be in NFS, S3, or Ceph instead of HDFS. They could even be in a non-default HDFS instance.
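
To make the path resolution concrete, here is a minimal sketch using the classic org.apache.hadoop.mapred API; the class name, bucket, host, and paths are all illustrative:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class InputPaths {
        public static void main(String[] args) {
            JobConf job = new JobConf();
            // Unqualified path: resolved against the default filesystem
            // (HDFS, if that's what fs.default.name points at).
            FileInputFormat.addInputPath(job, new Path("/data/input"));
            // Qualified paths: read from the named filesystem,
            // regardless of the default.
            FileInputFormat.addInputPath(job, new Path("s3://my-bucket/data/input"));
            FileInputFormat.addInputPath(job, new Path("hdfs://other-nn:9000/data/input"));
        }
    }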
And if so, how do I get it there?
If HDFS is configured as your default filesystem:

    bin/hadoop fs -put localFileName nameInHdfs

Doug
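
The same copy can also be done from Java with the FileSystem API; a minimal sketch, assuming HDFS is the default filesystem (the class name and both paths are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PutFile {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();  // loads the Hadoop configuration
            FileSystem fs = FileSystem.get(conf);      // the default filesystem
            // Equivalent of "bin/hadoop fs -put localFileName nameInHdfs".
            fs.copyFromLocalFile(new Path("localFileName"),
                                 new Path("nameInHdfs"));
        }
    }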
