Hi, I am trying to figure out whether Hadoop can be used for a piece of functionality I am trying to develop. I have large volumes of data already stored on disks that are locally/remotely mounted on many Linux machines, and I need to do some data analysis over that data. Since the data is huge, I would like to process it in parallel and then combine the results. The MapReduce functionality of Hadoop fits my scenario well. However, I do not want to set up HDFS, since the data is already available on all the machines and I do not want to copy it again into a new file system. Is it possible to skip HDFS but still use the MapReduce functionality? Any idea what would have to be done?
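To make my question concrete: I was wondering whether pointing Hadoop's default filesystem at the local disk would be enough. This is just a sketch of what I had in mind (the property name below is the pre-2.x one, `fs.default.name`; I believe newer releases call it `fs.defaultFS`) -- please correct me if this is not the right approach:

```xml
<!-- core-site.xml: use the local filesystem instead of HDFS -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>file:///</value>
  </property>
</configuration>
```

I assume the job input/output paths would then be ordinary local paths (or `file://` URIs), but I am not sure how the tasks would be scheduled so that each node reads only its own local data rather than pulling it over the mount.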
Thanks, Neeraj