Hi Praveen,

  in order to submit the job to the cluster, you just need to have a
core-site.xml on your classpath (or load it explicitly into your
Configuration object) that looks (at least) like this:

<configuration>
        <property>
                <name>fs.default.name</name>
                <value>hdfs://${name:port of namenode}</value>
        </property>

        <property>
                <name>mapred.job.tracker</name>
                <value>${name:port of jobtracker}</value>
        </property> 
</configuration>
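
If you don't want to rely on the classpath, you can also add the file
(or set the two properties) on the Configuration object directly. A
minimal sketch; the file path and the host:port values below are
placeholders you would replace with your own:

//----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// either add the file as an extra resource...
Configuration config = new Configuration();
config.addResource(new Path("/path/to/core-site.xml"));

// ...or set the two properties directly (values are placeholders)
config.set("fs.default.name", "hdfs://namenode:9000");
config.set("mapred.job.tracker", "jobtracker:9001");
//----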

If you want to wait for each job's completion, you can use
job.waitForCompletion(true) rather than job.submit().
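
Since your driver runs several jobs one after the other, a rough sketch
of such a chain (the class names and paths below are made up, not taken
from your code):

//----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

Configuration config = new Configuration();  // picks up core-site.xml from the classpath

Job first = new Job(config, "first job");
first.setJarByClass(MyDriver.class);         // MyDriver is a placeholder for your driver class
first.setMapperClass(FirstMapper.class);     // placeholder mapper
first.setReducerClass(FirstReducer.class);   // placeholder reducer
FileInputFormat.addInputPath(first, new Path("raw-input"));
FileOutputFormat.setOutputPath(first, new Path("intermediate"));

if (!first.waitForCompletion(true)) {
    System.exit(1);                          // stop the chain if the first job fails
}

// the next job reads the previous job's output; repeat for further jobs
Job second = new Job(config, "second job");
second.setJarByClass(MyDriver.class);
// ... set mapper/reducer and key/value classes as before ...
FileInputFormat.addInputPath(second, new Path("intermediate"));
FileOutputFormat.setOutputPath(second, new Path("final-output"));
System.exit(second.waitForCompletion(true) ? 0 : 1);
//----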

Good luck,
  henning


On Mon, 2010-11-22 at 23:40 +0100, praveen.pe...@nokia.com wrote:
> Hi, thanks for your reply. In my case I have a Driver that calls
> multiple jobs one after the other. I am using the following code to
> submit each job, but it uses the local Hadoop jar files that are on
> the classpath; it's not submitting the job to the Hadoop cluster. I
> thought I would need to specify where the master Hadoop is located on
> the remote machine. An example command I use from the command line is
> as follows, but I need to do it from my Java program.
>  
> $ hadoop-0.20.2/bin/hadoop jar \
>     /home/ppeddi/dev/Merchandising/RelevancyEngine/relevancy-core/dist/Relevancy4.jar \
>     -i raw-downloads-input-10K -o reco-patterns-output-10K-1S -k 100 \
>     -method mapreduce -g 500 -regex '[\ ]' -s 5
> 
> 
> I hope I made the question clear now.
>  
> Praveen
> 
> 
> ______________________________________________________________________
> From: ext Henning Blohm [mailto:henning.bl...@zfabrik.de] 
> Sent: Monday, November 22, 2010 5:07 PM
> To: mapreduce-user@hadoop.apache.org
> Subject: Re: Starting a Hadoop job programmatically
> 
> Hi Praveen,
> 
>   we do. We are using the "new" org.apache.hadoop.mapreduce.* API in
> Hadoop 0.20.2.
> 
>   Essentially the flow is:
> 
>   //----
>   // assuming all config is on the class path
>   Configuration config = new Configuration(); 
>   Job job = new Job(config, "some job name");
> 
>   // set in/out types
>   job.setInputFormatClass(...);
>   job.setOutputFormatClass(...);
>   job.setMapOutputKeyClass(...);
>   job.setMapOutputValueClass(...);
>   job.setOutputKeyClass(...);
>   job.setOutputValueClass(...);
> 
>   // set implementations as required
>   job.setMapperClass(<your mapper implementation class object>);
>   job.setCombinerClass(<your combiner implementation class object>);
>   job.setReducerClass(<your reducer implementation class object>);
> 
>   // set the jar... this is often the tricky part!
>   job.setJarByClass(<some class that is in the job jar and not
> elsewhere higher up on the class path>);
> 
>   job.submit();
>   //----
> 
> Hope I didn't forget anything.  
> 
> Note: You need to give Hadoop something it can launch in a JVM that
> has nothing on its classpath but the Hadoop jars and whatever else
> you configured statically in your hadoop-env.sh script.
> 
> Can you describe your scenario in more detail?
> 
> Henning
> 
> 
> On Monday, 22.11.2010 at 22:39 +0100, praveen.pe...@nokia.com wrote: 
> 
> > Hi all, 
> > I am trying to figure out how I can start a Hadoop job
> > programmatically from my Java application running in an app server.
> > I was able to run my map-reduce job using the hadoop command from
> > the Hadoop master machine, but my goal is to run the same job from
> > my Java program (running on a different machine than the master). I
> > googled and could not find a solution for this. All the examples I
> > have seen so far use the hadoop command line to start a job. 
> > 1. Has anyone invoked a Hadoop job from a Java application? 
> > 2. If so, could someone provide some sample code? 
> > Thanks 
> > Praveen 
> > 
> 
> Henning Blohm
> 
> ZFabrik Software KG
> 
> henning.bl...@zfabrik.de
> www.z2-environment.eu
> 
> 
> 

