Hi Praveen, if you look at the Job configuration you will find properties like user.name and other values that are created by substituting template values in core-default.xml and mapred-default.xml (both shipped inside the Hadoop jars). I suppose one of these (if not user.name itself) defines the submitting user. But I haven't tried it, and I am sure others know better.
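As an aside on the "which user submits" question: on later Hadoop releases (the secured 0.20.20x line and onward) the UserGroupInformation API lets a client explicitly act as a named user. The sketch below is only an illustration of that idea, not something from this thread; the user name and path are made up, and it only applies to clusters without Kerberos.

//----
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SubmitAsHadoopUser {
    public static void main(String[] args) throws Exception {
        // Act as the "hadoop" user regardless of which OS user the app server
        // runs as. createRemoteUser() exists only on later Hadoop releases; on
        // plain 0.20.x the rough (and insecure) equivalent would presumably be
        // conf.set("hadoop.job.ugi", "hadoop,supergroup") -- untested here.
        UserGroupInformation ugi = UserGroupInformation.createRemoteUser("hadoop");
        ugi.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                Configuration conf = new Configuration(); // core-site.xml on classpath
                FileSystem fs = FileSystem.get(conf);
                // everything in here (HDFS writes, job submission) runs as "hadoop"
                fs.mkdirs(new Path("/user/hadoop/example-output"));
                return null;
            }
        });
    }
}
//----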
Why is that actually important? Why not submit as the user you are?

About submitting multiple jars: AFAIK the standard way is to submit everything in one jar.

Henning

P.S.: We are developing something based on www.z2-environment.eu that will complement Hadoop with automatic on-demand updates on the task nodes. But it's not public yet.

On Wed, 2010-11-24 at 00:10 +0100, praveen.pe...@nokia.com wrote:
> Hi Henning,
> Putting core-site.xml on the classpath worked. Thanks for the help. I still need to figure out how to submit a job as a different user than the one Hadoop is configured for.
>
> I have one more question related to job submission. Did anyone face problems running a job that involves multiple jar files? I am running a map-reduce job that references multiple jar files, and when I run the job I always get a ClassNotFoundException on any class that is not in the same jar file as the job class.
>
> I am starting the jobs from a Java application and am getting this ClassNotFoundException:
>
> java.lang.RuntimeException: java.lang.ClassNotFoundException: com.nokia.relevancy.util.hadoop.ValueOnlyTextOutputFormat
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
>     at org.apache.hadoop.mapreduce.JobContext.getOutputFormatClass(JobContext.java:193)
>     at org.apache.hadoop.mapred.Task.initialize(Task.java:413)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:288)
>     at org.apache.hadoop.mapred.Child.main(Child.java:170)
> Caused by: java.lang.ClassNotFoundException: com.nokia.relevancy.util.hadoop.ValueOnlyTextOutputFormat
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:247)
>     at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
>     ... 4 more
>
> Praveen
>
> ______________________________________________________________________
> From: ext Henning Blohm [mailto:henning.bl...@zfabrik.de]
> Sent: Tuesday, November 23, 2010 11:37 AM
> To: mapreduce-user@hadoop.apache.org
> Subject: RE: Starting a Hadoop job programmatically
>
> Hi Praveen,
>
> On Tue, 2010-11-23 at 17:18 +0100, praveen.pe...@nokia.com wrote:
>
> > Hi Henning,
> > Adding Hadoop's conf folder didn't fix the issue, but when I added the two properties below I was able to access the file system. However, I cannot write anything because of the different user. I have the following questions based on these experiments:
>
> Exactly. I didn't mean to add the whole folder, just the one file with those properties.
>
> > 1. How can I access HDFS or submit jobs as a different user than the one my Java app runs as? For example, the Hadoop cluster is set up for the "hadoop" user while my Java app runs as a different user. In order to run the job correctly, I have to submit it as the "hadoop" user, correct? How do I achieve that programmatically?
>
> We always run everything as the same user (now that you mention it), so I didn't know there would be a problem otherwise. I would have suspected that the submitting user doesn't matter (setting the corresponding system property would probably override it anyway).
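Regarding the ClassNotFoundException with multiple jars quoted above: besides packing everything into one jar (dependency jars can also go into a lib/ directory inside the job jar, which Hadoop puts on the task classpath), extra jars can be shipped per job through the DistributedCache. A rough sketch of that approach; the helper method name is made up, and the jar has to be copied into HDFS first.

//----
import java.io.IOException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;

public class JarShipping {
    // Call on the driver side before job.submit() / job.waitForCompletion().
    static void addJarToTaskClasspath(Job job, String localJar, String hdfsJar)
            throws IOException {
        FileSystem fs = FileSystem.get(job.getConfiguration());
        // the jar has to sit in HDFS before it can be distributed to the tasks
        fs.copyFromLocalFile(new Path(localJar), new Path(hdfsJar));
        DistributedCache.addFileToClassPath(new Path(hdfsJar), job.getConfiguration());
    }
}
//----

For example (paths are hypothetical): addJarToTaskClasspath(job, "/home/ppeddi/lib/relevancy-util.jar", "/tmp/job-libs/relevancy-util.jar").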
>
> > 2. A few of the jobs I am calling are provided by a library, which means I cannot add these two config properties myself. Is there any way around this other than replicating the library's job submission code locally?
>
> Yes, I think creating a core-site.xml file as below, putting it into <folder> (any folder you like will do) and adding <folder> to your classpath when submitting should do the trick (as I tried to explain before, and if I am not mistaken).
>
> > Thanks
> > Praveen
>
> Good luck,
> Henning
>
> > ____________________________________________________________________
> >
> > From: ext Henning Blohm [mailto:henning.bl...@zfabrik.de]
> > Sent: Tuesday, November 23, 2010 3:24 AM
> > To: mapreduce-user@hadoop.apache.org
> > Subject: RE: Starting a Hadoop job programmatically
> >
> > Hi Praveen,
> >
> > In order to submit it to the cluster, you just need to have a core-site.xml on your classpath (or load it explicitly into your configuration object) that looks (at least) like this:
> >
> > <configuration>
> >   <property>
> >     <name>fs.default.name</name>
> >     <value>hdfs://${name:port of namenode}</value>
> >   </property>
> >
> >   <property>
> >     <name>mapred.job.tracker</name>
> >     <value>${name:port of jobtracker}</value>
> >   </property>
> > </configuration>
> >
> > If you want to wait for each job's completion, you can use job.waitForCompletion(true) rather than job.submit().
> >
> > Good luck,
> > Henning
> >
> > On Mon, 2010-11-22 at 23:40 +0100, praveen.pe...@nokia.com wrote:
> >
> > > Hi, thanks for your reply. In my case I have a driver that calls multiple jobs one after the other. I am using the following code to submit each job, but it uses the local Hadoop jar files on the classpath; it is not submitting the job to the Hadoop cluster. I thought I would need to specify where the Hadoop master is located on the remote machine. An example command I use from the command line is below, but I need to do the same thing from my Java program.
> > >
> > > $ hadoop-0.20.2/bin/hadoop jar /home/ppeddi/dev/Merchandising/RelevancyEngine/relevancy-core/dist/Relevancy4.jar -i raw-downloads-input-10K -o reco-patterns-output-10K-1S -k 100 -method mapreduce -g 500 -regex '[\ ]' -s 5
> > >
> > > I hope I made the question clear now.
> > > Praveen
> > >
> > > __________________________________________________________________
> > >
> > > From: ext Henning Blohm [mailto:henning.bl...@zfabrik.de]
> > > Sent: Monday, November 22, 2010 5:07 PM
> > > To: mapreduce-user@hadoop.apache.org
> > > Subject: Re: Starting a Hadoop job programmatically
> > >
> > > Hi Praveen,
> > >
> > > we do. We are using the "new" org.apache.hadoop.mapreduce.* API in Hadoop 0.20.2.
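As a side note on the core-site.xml snippet quoted above: if putting the file on the classpath is not an option, the same two properties can be set directly on the Configuration object instead. A minimal fragment, with placeholder host names and ports:

//----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// host:port values are placeholders for the actual namenode / jobtracker
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://namenode-host:9000");
conf.set("mapred.job.tracker", "jobtracker-host:9001");
Job job = new Job(conf, "some job name");
//----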
> > >
> > > Essentially the flow is:
> > >
> > > //----
> > > // assuming all config is on the class path
> > > Configuration config = new Configuration();
> > > Job job = new Job(config, "some job name");
> > >
> > > // set in/out types
> > > job.setInputFormatClass(...);
> > > job.setOutputFormatClass(...);
> > > job.setMapOutputKeyClass(...);
> > > job.setMapOutputValueClass(...);
> > > job.setOutputKeyClass(...);
> > > job.setOutputValueClass(...);
> > >
> > > // set implementations as required
> > > job.setMapperClass(<your mapper implementation class object>);
> > > job.setCombinerClass(<your combiner implementation class object>);
> > > job.setReducerClass(<your reducer implementation class object>);
> > >
> > > // set the jar... this is often the tricky part!
> > > job.setJarByClass(<some class that is in the job jar and not elsewhere higher up on the class path>);
> > >
> > > job.submit();
> > > //----
> > >
> > > Hope I didn't forget anything.
> > >
> > > Note: You need to give Hadoop something it can launch in a JVM that has no more than the Hadoop jars and whatever else you configured statically in your hadoop-env.sh script.
> > >
> > > Can you describe your scenario in more detail?
> > >
> > > Henning
> > >
> > > On Monday, 2010-11-22 at 22:39 +0100, praveen.pe...@nokia.com wrote:
> > >
> > > > Hi all,
> > > > I am trying to figure out how I can start a Hadoop job programmatically from my Java application running in an app server. I was able to run my map-reduce job using the hadoop command from the Hadoop master machine, but my goal is to run the same job from my Java program (running on a different machine than the master). I googled and could not find a solution for this. All the examples I have seen so far use Hadoop from the command line to start a job.
> > > > 1. Has anyone invoked a Hadoop job from a Java application?
> > > > 2. If so, could someone provide some sample code?
> > > >
> > > > Thanks
> > > > Praveen
> > >
> > > Henning Blohm
> > >
> > > ZFabrik Software KG
> > >
> > > henning.bl...@zfabrik.de
> > > www.z2-environment.eu
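Pulling the pieces of this thread together, a self-contained driver might look roughly like the sketch below. It simply follows the flow Henning outlined; the mapper, reducer, host names and input/output paths are placeholders rather than anything from Praveen's actual job.

//----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class RemoteSubmitDriver {

    // Trivial mapper/reducer so the example compiles and runs; replace with
    // real implementations.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, Text, Text> {
        protected void map(LongWritable key, Text value, Context ctx)
                throws java.io.IOException, InterruptedException {
            ctx.write(new Text(Long.toString(key.get())), value);
        }
    }

    public static class PassThroughReducer
            extends Reducer<Text, Text, Text, Text> {
        protected void reduce(Text key, Iterable<Text> values, Context ctx)
                throws java.io.IOException, InterruptedException {
            for (Text v : values) {
                ctx.write(key, v);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // Cluster addresses are placeholders; alternatively keep them in a
        // core-site.xml on the classpath as described earlier in the thread.
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://namenode-host:9000");
        conf.set("mapred.job.tracker", "jobtracker-host:9001");

        Job job = new Job(conf, "remote submit example");

        // in/out types
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // implementations
        job.setMapperClass(PassThroughMapper.class);
        job.setReducerClass(PassThroughReducer.class);

        // the jar containing this class is what gets shipped to the cluster
        job.setJarByClass(RemoteSubmitDriver.class);

        FileInputFormat.addInputPath(job, new Path("example-input"));
        FileOutputFormat.setOutputPath(job, new Path("example-output"));

        // blocks until the job finishes; use job.submit() to return immediately
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
//----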