Narayanan,

On Fri, Jul 1, 2011 at 12:57 PM, Narayanan K <knarayana...@gmail.com> wrote:
> So the report will be run from a different machine outside the cluster, so
> we need a way to pass the parameters to the hadoop cluster (master) and
> initiate a mapreduce job dynamically. Similarly, the output of the mapreduce
> job needs to be tunneled back to the machine from where the report was run.
>
> Some more clarification I need is: does the machine (outside of the cluster)
> which ran the report require something like a client installation which
> will talk to the Hadoop master server via TCP? Or can it run a job on the
> hadoop server by using a passwordless scp to the master machine or
> something of the like?
The regular way is to let the client talk to your cluster's nodes over their TCP ports. This is what Hadoop's plain ol' job submitter process does for you.

Have you tried running a simple "hadoop jar <your jar>" from the remote client machine? If that works, invoking the same from your own code (with the appropriate configuration set) should work as well, since it's basically the same RunJar submission process either way. If it doesn't, you may need to open up the relevant ports (if there's a firewall in between).

Hadoop does not use SSH/SCP to move code around. Please give this a read if you're unsure about how SSH and Hadoop are (or are not) integrated:
http://wiki.apache.org/hadoop/FAQ#Does_Hadoop_require_SSH.3F
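Roughly, the in-code submission I mean looks like the sketch below (old-style mapred API). The NameNode/JobTracker host:port values and the identity mapper/reducer are only placeholders -- swap in your cluster's actual addresses and your own report logic:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class RemoteSubmit {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(RemoteSubmit.class);
    conf.setJobName("remote-report-job");

    // Point the client at the remote cluster instead of the local defaults.
    // Replace these example host:port values with your NameNode and JobTracker.
    conf.set("fs.default.name", "hdfs://namenode.example.com:8020");
    conf.set("mapred.job.tracker", "jobtracker.example.com:8021");

    // Identity mapper/reducer just to keep the sketch self-contained;
    // substitute your report's map/reduce classes here.
    conf.setMapperClass(IdentityMapper.class);
    conf.setReducerClass(IdentityReducer.class);
    conf.setOutputKeyClass(LongWritable.class);
    conf.setOutputValueClass(Text.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    // Submits the job over TCP to the JobTracker and waits for completion,
    // which is exactly what the "hadoop jar" runner does under the hood.
    JobClient.runJob(conf);
  }
}

Once the job completes, the results sit in the HDFS output path you gave it; the same client machine can pull them back with a plain "hadoop fs -get", which covers the "tunnel the output back" part of your question.

--
Harsh J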