Deyaa Adranale wrote:
Hello,
I am developing a tool that performs analysis tasks using Hadoop
map/reduce on a cluster. The tool's user interface runs on a client
Windows machine and should submit the analysis tasks as map/reduce jobs
to a Hadoop cluster (configured by the user).
My question is how to run Hadoop jobs on the cluster from a client
machine (other than the master) from inside Java code.
I know I need a Hadoop installation on the client configured to point
to the cluster's master, but I am not sure how to do it.
You need the Hadoop JARs; your client can then talk directly to the
cluster, provided
* it is not too far away, network-wise
* the client's Hadoop configuration is in sync with the servers'
You just create a JobClient instance and submit the job through it; see
the sketch below.
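A minimal sketch of that with the classic mapred API might look like the
following. The identity mapper/reducer and the HDFS paths are only
placeholders, and the cluster addresses are assumed to come from the
hadoop-site.xml on the client's classpath:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.IdentityMapper;
    import org.apache.hadoop.mapred.lib.IdentityReducer;

    public class SubmitAnalysisJob {
      public static void main(String[] args) throws Exception {
        // The JobConf reads the site configuration from the client's
        // classpath; that is where it learns the addresses of the
        // JobTracker and the namenode.
        JobConf conf = new JobConf(SubmitAnalysisJob.class);
        conf.setJobName("analysis");

        // Identity classes keep this sketch self-contained; substitute
        // your own analysis mapper and reducer here.
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        // Hypothetical HDFS paths; adjust to wherever your data lives.
        FileInputFormat.setInputPaths(conf, new Path("/user/analysis/input"));
        FileOutputFormat.setOutputPath(conf, new Path("/user/analysis/output"));

        // Submits the job to the cluster and blocks until it finishes.
        JobClient.runJob(conf);
      }
    }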
Another requirement for my tool is to copy files from the local client
file system to HDFS on the cluster. I am also not sure whether I can
access the cluster's HDFS from a client machine using Java code.
Yes, look in the FsShell and FileUtils classes; a sketch of the
programmatic route follows below.
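Something along these lines should work; the local and HDFS paths are
made up, and the Configuration is assumed to find the cluster's namenode
through the site configuration on the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToHdfs {
      public static void main(String[] args) throws Exception {
        // Loads core/site configuration from the classpath, so the
        // default filesystem should already point at the cluster's HDFS.
        Configuration conf = new Configuration();
        FileSystem hdfs = FileSystem.get(conf);

        // Hypothetical paths; replace with your real local file and
        // HDFS target directory.
        Path localFile = new Path("C:/analysis/data.csv");
        Path hdfsDir = new Path("/user/analysis/input");

        // copyFromLocalFile(delSrc, src, dst): false keeps the local copy.
        hdfs.copyFromLocalFile(false, localFile, hdfsDir);
      }
    }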
* None of this stuff is documented outside the source+javadocs, so you
will need to rummage around the source to work out what to do.
* Pull log4j.properties and commons-logging.properties from the Hadoop
JARs if you want to route the Hadoop classes' logging through your own
chosen logger.
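If you do go that route, a client-side log4j.properties along these
lines is one possibility; the appender and levels are just an example,
not anything the Hadoop JARs require:

    log4j.rootLogger=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
    # Quieten the chattier Hadoop packages if desired
    log4j.logger.org.apache.hadoop=WARN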