Deyaa Adranale wrote:
Hello,
I am developing a tool that performs analysis tasks using Hadoop
map/reduce on a cluster. The tool's user interface runs on a client
Windows machine and should submit the analysis tasks as map/reduce jobs
to a Hadoop cluster (configured by the user).
My question is how to run Hadoop jobs on the cluster from a client
machine (other than the master) from inside Java code.
I know I need a Hadoop installation on the client configured to point
to the cluster's master, but I am not sure how to do it.
You need the Hadoop JARs; your client can then talk directly to the
cluster, provided
* it is not too far away, network-wise
* the client's Hadoop configuration is in sync with the servers'
You just create a JobClient instance and submit the job through it; see
the sketch below.
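A minimal sketch of that with the classic mapred API might look like the
following. The identity mapper/reducer and the HDFS paths are only
placeholders, and the cluster addresses are assumed to come from the
hadoop-site.xml on the client's classpath:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class SubmitAnalysisJob {
      public static void main(String[] args) throws Exception {
        // The JobConf reads the site configuration from the client's
        // classpath; that is where it learns the addresses of the
        // JobTracker and the namenode.
        JobConf conf = new JobConf(SubmitAnalysisJob.class);
        conf.setJobName("analysis");

        // Identity classes keep this sketch self-contained; substitute
        // your own analysis mapper and reducer here.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        // Hypothetical HDFS paths; adjust to wherever your data lives.
        FileInputFormat.setInputPaths(conf, new Path("/user/analysis/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/analysis/output"));

        // Submits the job to the cluster and blocks until it finishes.
        JobClient.runJob(conf);
      }
    }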
Another requirement for my tool is to copy files from the local client
file system to HDFS on the cluster. I am also not sure whether I can
access the cluster's HDFS from a client machine using Java code.
Yes, look in the FsShell and FileUtils classes; a sketch of the
programmatic route follows below.
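Something along these lines should work; the local and HDFS paths are
made up, and the Configuration is assumed to find the cluster's namenode
through the site configuration on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToHdfs {
      public static void main(String[] args) throws Exception {
        // Loads core/site configuration from the classpath, so the
        // default filesystem should already point at the cluster's HDFS.
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);

        // Hypothetical paths; replace with your real local file and
        // HDFS target directory.
        Path localFile = new Path("C:/analysis/data.csv");
        Path hdfsDir = new Path("/user/analysis/input");

        // copyFromLocalFile(delSrc, src, dst): false keeps the local copy.
        hdfs.copyFromLocalFile(false, localFile, hdfsDir);
      }
    }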
* None of this stuff is documented outside the source+javadocs, so you
will need to rummage around the source to work out what to do.
* Pull log4j.properties and commons-logging.properties from the Hadoop
JARs if you want to route the Hadoop classes' logging through your own
chosen logger.
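If you do go that route, a client-side log4j.properties along these
lines is one possibility; the appender and levels are just an example,
not anything the Hadoop JARs require:

    log4j.rootLogger=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
    # Quieten the chattier Hadoop packages if desired
    log4j.logger.org.apache.hadoop=WARN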