Neeraj Mahajan wrote:
But I do not want to create HDFS, as I already have the data available on
all the machines and I do not want to transfer the data again to a new
file system. Is it possible to skip HDFS but still use the MapReduce
functionality? Any idea what would have to be done?
Hadoop requires that input paths be universal across nodes. So if you
have data that is accessible from all nodes through the local filesystem
(either by copying it there or via NFS mounts) then, so long as it is
accessible through the same path on all nodes, Hadoop should work fine:
the data named by file:///my_data/foo/bar must be the same on all hosts.
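
As a rough sketch (the exact calls vary by Hadoop version, and the paths
and class names here are just placeholders), a job using the classic
org.apache.hadoop.mapred API can simply be pointed at fully-qualified
file:// URIs instead of HDFS paths:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class LocalInputJob {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(LocalInputJob.class);
    conf.setJobName("local-input-example");

    // Fully-qualified file:// URIs: the same path must be visible on
    // every node, either as identical local copies or as a shared NFS
    // mount point.
    FileInputFormat.setInputPaths(conf, new Path("file:///my_data/foo/bar"));
    FileOutputFormat.setOutputPath(conf, new Path("file:///my_data/foo/output"));

    // Mapper and reducer classes would be set here as usual; only the
    // input and output paths differ from an HDFS-backed job.
    JobClient.runJob(conf);
  }
}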
That said, accessing data over NFS will probably be slower than over
HDFS. If the data resides on only a small subset of your nodes, then
those nodes could become overloaded. As a general rule, if you're going
to touch the data more than once, and have room, it would probably be a
good idea to copy it into an HDFS filesystem.
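
If you do decide to load it into HDFS, you can do that once up front,
either with the "hadoop fs -put" shell command or programmatically via
the FileSystem API. A minimal sketch (the paths here are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyIntoHdfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Uses the default filesystem named in the cluster configuration
    // (normally HDFS).
    FileSystem fs = FileSystem.get(conf);

    // Copy the local data into HDFS once, then point jobs at the HDFS path.
    fs.copyFromLocalFile(new Path("file:///my_data/foo/bar"),
                         new Path("/my_data/foo/bar"));
  }
}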
Doug