Narayanan,

Regarding the client installation: make sure the client and the server use the same Hadoop version for submitting jobs and transferring data. If the client runs as a different user than the one that runs Hadoop jobs on the cluster, configure the Hadoop ugi property (sorry, I forget the exact name).
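To make that concrete, here is a minimal sketch (against the 0.20-era Java API) of submitting a job from a machine outside the cluster. The host names, ports, paths, and the commented-out ugi property are placeholders/assumptions, not values from this thread:

// Minimal client-side submission sketch; adapt addresses and paths.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class RemoteSubmit {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Point the client at the cluster's NameNode and JobTracker
        // (the client must run the same Hadoop version as the cluster).
        conf.set("fs.default.name", "hdfs://master.example.com:9000");
        conf.set("mapred.job.tracker", "master.example.com:9001");
        // If the client user differs from the cluster user, older
        // (pre-security) releases honour a user/group override property,
        // reportedly hadoop.job.ugi -- verify the name for your version.
        // conf.set("hadoop.job.ugi", "hadoop,hadoop");

        Job job = new Job(conf, "report-query");
        job.setJarByClass(RemoteSubmit.class); // job jar is shipped to the cluster
        // Identity map/reduce for brevity; a real report job would set its
        // own Mapper/Reducer classes here.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path("/reports/input"));
        FileOutputFormat.setOutputPath(job, new Path("/reports/output"));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}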
On Jul 1, 2011 3:28 PM, "Narayanan K" <knarayana...@gmail.com> wrote:
> Hi Harsh
>
> Thanks for the quick response...
>
> Have a few clarifications regarding the 1st point:
>
> Let me tell the background first.
>
> We have actually set up a Hadoop cluster with HBase installed. We are
> planning to load HBase with data, perform some computations with the data,
> and show the data in a report format. The report should be accessible from
> outside the cluster, and it accepts certain parameters to show data; these
> parameters will in turn be passed on to the Hadoop master server, where a
> MapReduce job will be run that queries HBase to retrieve the data.
>
> So the report will be run from a different machine outside the cluster, and
> we need a way to pass the parameters to the Hadoop cluster (master) and
> initiate a MapReduce job dynamically. Similarly, the output of the
> MapReduce job needs to be tunneled back to the machine from where the
> report was run.
>
> Some more clarification I need: does the machine (outside the cluster)
> which ran the report require something like a client installation that
> talks to the Hadoop master server via TCP? Or can it run a job on the
> Hadoop server by using passwordless scp to the master machine, or
> something of the like?
>
> Regards,
> Narayanan
>
>
> On Fri, Jul 1, 2011 at 11:41 AM, Harsh J <ha...@cloudera.com> wrote:
>> Narayanan,
>>
>> On Fri, Jul 1, 2011 at 11:28 AM, Narayanan K <knarayana...@gmail.com>
>> wrote:
>> > Hi all,
>> >
>> > We are basically working on a research project and I require some help
>> > regarding this.
>>
>> Always glad to see research work being done! What're you working on? :)
>>
>> > How do I submit a mapreduce job from outside the cluster, i.e. from a
>> > different machine outside the Hadoop cluster?
>>
>> If you use the Java APIs, use the Job#submit(…) method and/or the
>> JobClient.runJob(…) method.
>> Basically Hadoop will try to create a jar with all requisite classes
>> within and will push it out to the JobTracker's filesystem (HDFS, if
>> you run HDFS). From there on, it's like a regular operation.
>>
>> This even happens on the Hadoop nodes themselves, so doing so from an
>> external place, as long as that place has access to Hadoop's JT and
>> HDFS, should be no different at all.
>>
>> If you are packing custom libraries along, don't forget to use
>> DistributedCache. If you are packing custom MR Java code, don't forget
>> to use Job#setJarByClass / JobClient#setJarByClass and other appropriate
>> API methods.
>>
>> > If the above can be done, how can I schedule mapreduce jobs to run in
>> > Hadoop like crontab from a different machine?
>> > Are there any webservice APIs that I can leverage to access a Hadoop
>> > cluster from outside and submit jobs or read/write data from HDFS?
>>
>> For scheduling jobs, have a look at Oozie: http://yahoo.github.com/oozie/
>> It is well supported and is very useful in writing MR workflows (which
>> is a common requirement). You also get coordinator features and can
>> schedule similarly to crontab.
>>
>> For HDFS r/w over the web, I'm not sure of an existing web app
>> specifically for this purpose without limitations, but there is a
>> contrib/thriftfs you can leverage (if not writing your own webserver in
>> Java, in which case it's as simple as using the HDFS APIs).
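As a rough illustration of the DistributedCache / setJarByClass advice quoted above, a short sketch (0.20-era API; the HDFS jar path is only an example, and the library must already have been copied to HDFS):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithLibs {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "job-with-extra-libs");
        // Ship the jar that contains the job's own map/reduce classes.
        job.setJarByClass(SubmitWithLibs.class);
        // Put an extra library (already uploaded to HDFS; path is an example)
        // on every task's classpath via the DistributedCache.
        DistributedCache.addFileToClassPath(
                new Path("/libs/hbase.jar"), job.getConfiguration());
        // ...then set the Mapper/Reducer, input and output paths, and submit
        // exactly as in the earlier sketch.
    }
}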
>>
>> Also have a look at the pretty mature Hue project, which aims to
>> provide a great frontend that lets you design jobs, submit jobs,
>> monitor jobs, and upload files or browse the filesystem (among several
>> other things): http://cloudera.github.com/hue/
>>
>> --
>> Harsh J
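And for getting the job's output back to the report machine, reading HDFS directly with the FileSystem API works from outside the cluster too. A small sketch, with the NameNode address and output path as placeholders:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FetchReportOutput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://master.example.com:9000"); // example NameNode

        FileSystem fs = FileSystem.get(conf);
        // Read every part-* file the reducers produced (path is an example).
        FileStatus[] parts = fs.globStatus(new Path("/reports/output/part-*"));
        if (parts != null) {
            for (FileStatus part : parts) {
                BufferedReader reader = new BufferedReader(
                        new InputStreamReader(fs.open(part.getPath())));
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line); // feed these lines into the report instead
                }
                reader.close();
            }
        }
        fs.close();
    }
}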