Hi Praveen,

On Tue, 2010-11-23 at 17:18 +0100, praveen.pe...@nokia.com wrote:
> Hi Henning,
> Adding Hadoop's conf folder didn't help fix the issue, but when I added
> the two properties below I was able to access the file system. However, I
> cannot write anything because of the different user. I have the following
> questions based on my experiments.
Exactly. I didn't mean to add the whole folder, just the one file with
those props.

> 1. How can I access HDFS or submit jobs as a different user than the one
> my Java app is running as? For example, the Hadoop cluster is set up for
> the "hadoop" user and my Java app runs as a different user. In order to
> run the job correctly, I have to submit it as the "hadoop" user, correct?
> How do I achieve that programmatically?

We always run everything with the same user (now that you mention it), so
I didn't know that we would have a problem otherwise. I would have
suspected that the submitting user doesn't matter (setting the
corresponding system property would probably override that one anyway).
See the sketch in the PS below for something you could try.

> 2. A few of the jobs I am calling are provided by a library, which means
> I cannot add these two config properties myself. Is there any way around
> this other than replicating the library's job submission code locally?

Yes, I think creating a core-site.xml file as below, putting it into
<folder> (any folder you like will do) and adding <folder> to your
classpath when submitting should do the trick (as I tried to explain
before, if I am not mistaken).

> Thanks
> Praveen

Good luck,
Henning
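PS: Regarding your first question, one thing you could try (untested on my
side, so take it strictly as a sketch): on the pre-security releases such as
0.20.2 the submitting user is, as far as I know, taken from the
hadoop.job.ugi property, so setting it on the Configuration before submitting
might already be enough. The "hadoop,hadoop" user/group value below is only
your example cluster's user. On the later, security-enabled releases you
would instead wrap the submission in
UserGroupInformation.createRemoteUser("hadoop").doAs(...).

//----
// Untested sketch: submit as the "hadoop" user instead of the local user.
// Applies to pre-security releases such as 0.20.2; needs
// org.apache.hadoop.conf.Configuration and org.apache.hadoop.mapreduce.Job.
Configuration config = new Configuration(); // picks up core-site.xml from the classpath

// "user,group" -- both values are placeholders matching your example setup
config.set("hadoop.job.ugi", "hadoop,hadoop");

Job job = new Job(config, "some job name");
// ... set formats, types, mapper/reducer and the job jar as usual ...
job.submit();
//----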
> ______________________________________________________________________
> From: ext Henning Blohm [mailto:henning.bl...@zfabrik.de]
> Sent: Tuesday, November 23, 2010 3:24 AM
> To: mapreduce-user@hadoop.apache.org
> Subject: RE: Starting a Hadoop job programtically
>
> Hi Praveen,
>
> in order to submit it to the cluster, you just need to have a
> core-site.xml on your classpath (or load it explicitly into your
> configuration object) that looks (at least) like this:
>
> <configuration>
>   <property>
>     <name>fs.default.name</name>
>     <value>hdfs://${name:port of namenode}</value>
>   </property>
>
>   <property>
>     <name>mapred.job.tracker</name>
>     <value>${name:port of jobtracker}</value>
>   </property>
> </configuration>
>
> If you want to wait for each job's completion, you can use
> job.waitForCompletion(true) rather than job.submit().
>
> Good luck,
> henning
>
> On Mon, 2010-11-22 at 23:40 +0100, praveen.pe...@nokia.com wrote:
> > Hi, thanks for your reply. In my case I have a Driver that calls
> > multiple jobs one after the other. I am using the following code to
> > submit each job, but it uses the local Hadoop jar files that are on
> > the classpath. It is not submitting the job to the Hadoop cluster. I
> > thought I would need to specify where the master Hadoop is located on
> > the remote machine. An example command I use from the command line is
> > as follows, but I need to do the same from my Java program:
> >
> > $ hadoop-0.20.2/bin/hadoop jar
> > /home/ppeddi/dev/Merchandising/RelevancyEngine/relevancy-core/dist/Relevancy4.jar
> > -i raw-downloads-input-10K -o reco-patterns-output-10K-1S -k 100 -method
> > mapreduce -g 500 -regex '[\ ]' -s 5
> >
> > I hope I made the question clear now.
> > Praveen
> >
> > ____________________________________________________________________
> >
> > From: ext Henning Blohm [mailto:henning.bl...@zfabrik.de]
> > Sent: Monday, November 22, 2010 5:07 PM
> > To: mapreduce-user@hadoop.apache.org
> > Subject: Re: Starting a Hadoop job programtically
> >
> > Hi Praveen,
> >
> > we do. We are using the "new" org.apache.hadoop.mapreduce.* API in
> > Hadoop 0.20.2.
> >
> > Essentially the flow is:
> >
> > //----
> > // assuming all config is on the class path
> > Configuration config = new Configuration();
> > Job job = new Job(config, "some job name");
> >
> > // set in/out types
> > job.setInputFormatClass(...);
> > job.setOutputFormatClass(...);
> > job.setMapOutputKeyClass(...);
> > job.setMapOutputValueClass(...);
> > job.setOutputKeyClass(...);
> > job.setOutputValueClass(...);
> >
> > // set implementations as required
> > job.setMapperClass(<your mapper implementation class object>);
> > job.setCombinerClass(<your combiner implementation class object>);
> > job.setReducerClass(<your reducer implementation class object>);
> >
> > // set the jar... this is often the tricky part!
> > job.setJarByClass(<some class that is in the job jar and not
> > elsewhere higher up on the class path>);
> >
> > job.submit();
> > //----
> >
> > Hope I didn't forget anything.
> >
> > Note: you need to give Hadoop something it can launch in a JVM that
> > has no more than the Hadoop jars and whatever else you configured
> > statically in your hadoop-env.sh script.
> >
> > Can you describe your scenario in more detail?
> >
> > Henning
> >
> > On Monday, 22.11.2010 at 22:39 +0100, praveen.pe...@nokia.com wrote:
> > > Hi all,
> > > I am trying to figure out how I can start a Hadoop job
> > > programmatically from my Java application running in an app server.
> > > I was able to run my map reduce job using the hadoop command from
> > > the Hadoop master machine, but my goal is to run the same job from
> > > my Java program (running on a different machine than the master). I
> > > googled and could not find a solution for this. All the examples I
> > > have seen so far use hadoop from the command line to start a job.
> > > 1. Has anyone called Hadoop job invocation from a Java application?
> > > 2. If so, could someone provide some sample code?
> > > 3.
> > > Thanks
> > > Praveen
> >
> > Henning Blohm
> >
> > ZFabrik Software KG
> >
> > henning.bl...@zfabrik.de
> > www.z2-environment.eu
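For reference, here is the flow from the quoted mail above assembled into one
complete driver class. It is an untested sketch only: the word-count style
mapper/reducer, the class names, the job name, and the commented-out
host:port values are placeholders to be replaced with your own job's classes
and your cluster's addresses.

//----
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Untested sketch of a remote job driver using the 0.20.2 "new" API;
// the word-count mapper/reducer below are placeholders for your own job.
public class RemoteJobDriver {

    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // emit (token, 1) for every whitespace-separated token in the line
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // sum the counts per token
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml from the classpath; alternatively set the two
        // properties explicitly (host:port values are placeholders).
        Configuration config = new Configuration();
        // config.set("fs.default.name", "hdfs://namenode-host:8020");
        // config.set("mapred.job.tracker", "jobtracker-host:8021");

        Job job = new Job(config, "token count");

        // in/out formats and key/value types for this particular example
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // implementations
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);

        // the tricky part: reference a class that lives in the job jar itself,
        // not one found elsewhere higher up on the classpath
        job.setJarByClass(RemoteJobDriver.class);

        // input and output HDFS paths, passed on the command line
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // waitForCompletion(true) blocks and reports progress;
        // job.submit() would return immediately instead
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
//----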