I don't think you need a special input format. I think you just need to
specify your list of input files like this:
hdfs://HOST1/folder-name/file-name,hdfs://HOST2/folder-name/file-name, ...
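The reason this works is that each fully qualified path carries its own scheme and authority (host:port of the NameNode), so a single comma-separated input list can mix paths from different clusters. A quick, dependency-free illustration with `java.net.URI` (HOST1/HOST2 and port 54310 are placeholders; substitute your own NameNode addresses):

```java
import java.net.URI;

public class QualifiedPaths {
    public static void main(String[] args) {
        // Hypothetical NameNode addresses; adjust hosts/ports to your clusters.
        URI a = URI.create("hdfs://HOST1:54310/gutenberg/file1.txt");
        URI b = URI.create("hdfs://HOST2:54310/gutenberg/file2.txt");

        // Each URI names its own cluster via the authority (host:port)...
        System.out.println(a.getAuthority()); // HOST1:54310
        System.out.println(b.getAuthority()); // HOST2:54310

        // ...so one comma-separated input list can span both clusters.
        String inputList = a + "," + b;
        System.out.println(inputList);
    }
}
```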
HTH,
DR
On 04/09/2013 12:07 AM, Pedro Sá da Costa wrote:
Maybe there is some FileInputFormat class that lets me define input files
from different locations. What I would like to know is whether it's possible
to read input data from different HDFS filesystems, e.g., run the wordcount
with input files from the HDFS in HOST1 and the HDFS in HOST2 (the
filesystems in HOST1 and HOST2 are distinct). Any suggestion on which
InputFormat I should use?
On 9 April 2013 00:10, Pedro Sá da Costa <psdc1...@gmail.com> wrote:
I'm invoking the wordcount example on HOST1 with this command, but I get
an error.
HOST1:$ bin/hadoop jar hadoop-examples-1.0.4.jar wordcount
hdfs://HOST2:54310/gutenberg gutenberg-output
13/04/08 22:02:55 ERROR security.UserGroupInformation:
PriviledgedActionException as:ubuntu
cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input
path does not exist: hdfs://HOST2:54310/gutenberg
org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path
does not exist: hdfs://HOST2:54310/gutenberg
Can you be more specific about using FileInputFormat? I've configured
MapReduce and HDFS to work on HOST1, and I don't know how I can make a
wordcount that reads its input from the HDFS files on both HOST1 and
HOST2.
On 8 April 2013 19:34, Harsh J <ha...@cloudera.com> wrote:
You should be able to add fully qualified HDFS paths from N clusters
into the same job via FileInputFormat.addInputPath(…) calls. Caveats
may apply for secure environments, but for non-secure mode this should
work just fine. Did you try this and did it not work?
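To sketch the mechanism behind this (a dependency-free mock, not the real Hadoop API: `FakeJobConf` below is a stand-in for Hadoop's Job/Configuration, and its `addInputPath` mimics how FileInputFormat accumulates all input paths into one comma-separated job property):

```java
import java.util.ArrayList;
import java.util.List;

public class MultiClusterInputs {
    // Minimal stand-in for how FileInputFormat.addInputPath collects
    // input paths into a single comma-separated job property.
    static class FakeJobConf {
        private final List<String> inputDirs = new ArrayList<>();

        void addInputPath(String fullyQualifiedPath) {
            inputDirs.add(fullyQualifiedPath);
        }

        // Hadoop stores all inputs under one key, joined with commas.
        String getInputDirProperty() {
            return String.join(",", inputDirs);
        }
    }

    public static void main(String[] args) {
        FakeJobConf job = new FakeJobConf();
        // Fully qualified paths from two hypothetical clusters:
        job.addInputPath("hdfs://HOST1:54310/gutenberg");
        job.addInputPath("hdfs://HOST2:54310/gutenberg");
        System.out.println(job.getInputDirProperty());
        // -> hdfs://HOST1:54310/gutenberg,hdfs://HOST2:54310/gutenberg
    }
}
```

Because each path is fully qualified, the job's input splits resolve against the right NameNode per path, which is why this works across clusters in non-secure mode.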
On Mon, Apr 8, 2013 at 9:56 PM, Pedro Sá da Costa <psdc1...@gmail.com>
wrote:
Hi,
I want to combine data that live in different HDFS filesystems so that they
can be processed in one job. Is it possible to do this with MR, or is there
another Apache tool that allows me to do this?
E.g.:
HDFS data in Cluster1 ----v
HDFS data in Cluster2 ----> this job reads the data from Cluster1 and Cluster2
Thanks,
--
Best regards,
--
Harsh J
--
Best regards,