Hi Praveen,

On Tue, 2010-11-23 at 17:18 +0100, praveen.pe...@nokia.com wrote:
> Hi Henning,
> Adding Hadoop's conf folder didn't help fix the issue, but when I added
> the two properties below I was able to access the file system. However, I
> cannot write anything because of the different user. I have the following
> questions based on my experiments.
Exactly. I didn't mean to add the whole folder, just the one file with
those props.

> 1. How can I access HDFS or submit jobs as a different user than the one
> my Java app is running as? For example, the Hadoop cluster is set up for
> the "hadoop" user and my Java app runs as a different user. In order to
> run the job correctly, I have to submit it as the "hadoop" user, correct?
> How do I achieve that programmatically?

We always run everything with the same user (now that you mention it), so
I didn't know that we would have a problem otherwise. I would have
suspected that the submitting user doesn't matter (setting the
corresponding system property would probably override that one anyway).
See the sketch in the PS below for something you could try.

> 2. A few of the jobs I am calling are provided by a library, which means
> I cannot add these two config properties myself. Is there any way around
> this other than replicating the library's job submission code locally?

Yes, I think creating a core-site.xml file as below, putting it into
<folder> (any folder you like will do) and adding <folder> to your
classpath when submitting should do the trick (as I tried to explain
before, if I am not mistaken).

> Thanks
> Praveen

Good luck,
Henning
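PS: Regarding your first question, one thing you could try (untested on my
side, so take it strictly as a sketch): on the pre-security releases such as
0.20.2 the submitting user is, as far as I know, taken from the
hadoop.job.ugi property, so setting it on the Configuration before submitting
might already be enough. The "hadoop,hadoop" user/group value below is only
your example cluster's user. On the later, security-enabled releases you
would instead wrap the submission in
UserGroupInformation.createRemoteUser("hadoop").doAs(...).

//----
// Untested sketch: submit as the "hadoop" user instead of the local user.
// Applies to pre-security releases such as 0.20.2; needs
// org.apache.hadoop.conf.Configuration and org.apache.hadoop.mapreduce.Job.
Configuration config = new Configuration(); // picks up core-site.xml from the classpath

// "user,group" -- both values are placeholders matching your example setup
config.set("hadoop.job.ugi", "hadoop,hadoop");

Job job = new Job(config, "some job name");
// ... set formats, types, mapper/reducer and the job jar as usual ...
job.submit();
//----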
> ______________________________________________________________________
> From: ext Henning Blohm [mailto:henning.bl...@zfabrik.de]
> Sent: Tuesday, November 23, 2010 3:24 AM
> To: mapreduce-user@hadoop.apache.org
> Subject: RE: Starting a Hadoop job programtically
>
> Hi Praveen,
>
> in order to submit it to the cluster, you just need to have a
> core-site.xml on your classpath (or load it explicitly into your
> configuration object) that looks (at least) like this:
>
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://${name:port of namenode}</value>
>   </property>
>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>${name:port of jobtracker}</value>
>   </property>
> </configuration>
>
> If you want to wait for each job's completion, you can use
> job.waitForCompletion(true) rather than job.submit().
>
> Good luck,
> henning
>
> On Mon, 2010-11-22 at 23:40 +0100, praveen.pe...@nokia.com wrote:
> > Hi, thanks for your reply. In my case I have a Driver that calls
> > multiple jobs one after the other. I am using the following code to
> > submit each job, but it uses the local Hadoop jar files that are on
> > the classpath. It is not submitting the job to the Hadoop cluster. I
> > thought I would need to specify where the master Hadoop is located on
> > the remote machine. An example command I use from the command line is
> > as follows, but I need to do the same from my Java program:
> >
> > $ hadoop-0.20.2/bin/hadoop jar
> > /home/ppeddi/dev/Merchandising/RelevancyEngine/relevancy-core/dist/Relevancy4.jar
> > -i raw-downloads-input-10K -o reco-patterns-output-10K-1S -k 100 -method
> > mapreduce -g 500 -regex '[\ ]' -s 5
> >
> > I hope I made the question clear now.
> > Praveen
> >
> > ____________________________________________________________________
> >
> > From: ext Henning Blohm [mailto:henning.bl...@zfabrik.de]
> > Sent: Monday, November 22, 2010 5:07 PM
> > To: mapreduce-user@hadoop.apache.org
> > Subject: Re: Starting a Hadoop job programtically
> >
> > Hi Praveen,
> >
> > we do. We are using the "new" org.apache.hadoop.mapreduce.* API in
> > Hadoop 0.20.2.
> >
> > Essentially the flow is:
> >
> > //----
> > // assuming all config is on the class path
> > Configuration config = new Configuration();
> > Job job = new Job(config, "some job name");
> >
> > // set in/out types
> > job.setInputFormatClass(...);
> > job.setOutputFormatClass(...);
> > job.setMapOutputKeyClass(...);
> > job.setMapOutputValueClass(...);
> > job.setOutputKeyClass(...);
> > job.setOutputValueClass(...);
> >
> > // set implementations as required
> > job.setMapperClass(<your mapper implementation class object>);
> > job.setCombinerClass(<your combiner implementation class object>);
> > job.setReducerClass(<your reducer implementation class object>);
> >
> > // set the jar... this is often the tricky part!
> > job.setJarByClass(<some class that is in the job jar and not
> > elsewhere higher up on the class path>);
> >
> > job.submit();
> > //----
> >
> > Hope I didn't forget anything.
> >
> > Note: you need to give Hadoop something it can launch in a JVM that
> > has no more than the Hadoop jars and whatever else you configured
> > statically in your hadoop-env.sh script.
> >
> > Can you describe your scenario in more detail?
> >
> > Henning
> >
> > On Monday, 22.11.2010 at 22:39 +0100, praveen.pe...@nokia.com wrote:
> > > Hi all,
> > > I am trying to figure out how I can start a Hadoop job
> > > programmatically from my Java application running in an app server.
> > > I was able to run my map reduce job using the hadoop command from
> > > the Hadoop master machine, but my goal is to run the same job from
> > > my Java program (running on a different machine than the master). I
> > > googled and could not find a solution for this. All the examples I
> > > have seen so far use hadoop from the command line to start a job.
> > > 1. Has anyone called Hadoop job invocation from a Java application?
> > > 2. If so, could someone provide some sample code?
> > > 3.
> > > Thanks
> > > Praveen
> >
> > Henning Blohm
> >
> > ZFabrik Software KG
> >
> > henning.bl...@zfabrik.de
> > www.z2-environment.eu
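For reference, here is the flow from the quoted mail above assembled into one
complete driver class. It is an untested sketch only: the word-count style
mapper/reducer, the class names, the job name, and the commented-out
host:port values are placeholders to be replaced with your own job's classes
and your cluster's addresses.

//----
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Untested sketch of a remote job driver using the 0.20.2 "new" API;
// the word-count mapper/reducer below are placeholders for your own job.
public class RemoteJobDriver {

    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // emit (token, 1) for every whitespace-separated token in the line
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // sum the counts per token
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml from the classpath; alternatively set the two
        // properties explicitly (host:port values are placeholders).
        Configuration config = new Configuration();
        // config.set("fs.default.name", "hdfs://namenode-host:8020");
        // config.set("mapred.job.tracker", "jobtracker-host:8021");

        Job job = new Job(config, "token count");

        // in/out formats and key/value types for this particular example
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // implementations
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);

        // the tricky part: reference a class that lives in the job jar itself,
        // not one found elsewhere higher up on the classpath
        job.setJarByClass(RemoteJobDriver.class);

        // input and output HDFS paths, passed on the command line
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // waitForCompletion(true) blocks and reports progress;
        // job.submit() would return immediately instead
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
//----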