You should be able to add fully qualified HDFS paths from N clusters into the same job via FileInputFormat.addInputPath(…) calls. Caveats may apply for secure environments, but for non-secure mode this should work just fine. Did you try this and did it not work?
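
For illustration, here is a minimal driver sketch along those lines. The NameNode hosts (nn1.example.com, nn2.example.com), the port 8020, the directory paths, and the class name are all hypothetical placeholders, not values from this thread. It assumes the new-API org.apache.hadoop.mapreduce classes, non-secure mode, and that the nodes running the tasks can reach both NameNodes and their DataNodes.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: one job reading input from two HDFS clusters.
public class MultiClusterInputDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "multi-cluster-input");
        job.setJarByClass(MultiClusterInputDriver.class);

        // Fully qualified paths: the scheme and authority name the cluster,
        // so each path resolves against its own NameNode rather than the
        // default filesystem. Hosts, port, and directories are placeholders.
        FileInputFormat.addInputPath(job,
                new Path("hdfs://nn1.example.com:8020/data/input"));
        FileInputFormat.addInputPath(job,
                new Path("hdfs://nn2.example.com:8020/data/input"));

        // No mapper or reducer is set, so the identity defaults apply and
        // the default TextInputFormat yields (byte offset, line) records.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        // Output goes to one cluster only; this path is also a placeholder.
        FileOutputFormat.setOutputPath(job,
                new Path("hdfs://nn1.example.com:8020/data/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The same calls work for any number of clusters; each additional input is just another fully qualified path added to the job.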

On Mon, Apr 8, 2013 at 9:56 PM, Pedro Sá da Costa <psdc1...@gmail.com> wrote:
> Hi,
>
> I want to combine the data that are in different HDFS filesystems, for them
> to be executed in one job. Is it possible to do this with MR, or there is
> another Apache tool that allows me to do this?
>
> Eg.
>
> Hdfs data in Cluster1 ----v
> Hdfs data in Cluster2 -> this job reads the data from Cluster1, 2
>
>
> Thanks,
> --
> Best regards,

--
Harsh J