Hello David,
Thanks for your suggestions. I fail to see how your approach differs
from the one used in the tutorial. The -libjars option is a command-line
option of the Hadoop executable, and I do not want to call that
executable. Maybe I am missing the point. My implementation is basically
the same as your template, and using the Hadoop executable with my main
jar and the additional jars loaded via -libjars works fine.
Regards,
Martin
On 24.09.2010 17:29, David Rosenstrauch wrote:
On 09/24/2010 11:12 AM, Martin Becker wrote:
Hi James,
I am trying to avoid calling any command-line command. I want to submit
a job from within a Java application, if possible without packing any
jar file at all, though I guess that will be necessary to allow Hadoop
to load the specific classes. The tutorial definitely does not contain
any explicit Java code showing how to do this. Sorry for not stating my
problem clearly:
Right now I want to use Eclipse to submit my job using the "Run as..."
dialog. Later I want to embed that part in a Java application that
submits configured jobs to a remote Hadoop system/cluster.
Regards,
Martin
This is very do-able. (I do this now.)
Here is a skeleton for how it can be done:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class JobSubmitter implements Tool {

    // Holds the configuration handed in by ToolRunner via setConf().
    private Configuration appConf;

    public static void main(String[] args) throws Exception {
        ToolRunner.run(new Configuration(), new JobSubmitter(), args);
    }

    public JobSubmitter() {
        <your code here>
    }

    public Configuration getConf() {
        return appConf;
    }

    public void setConf(Configuration conf) {
        this.appConf = conf;
    }

    public int run(String[] args) throws Exception {
        Job job = new Job(appConf);
        Configuration jobConf = job.getConfiguration();
        jobConf.set(<your code here>);
        <your code here>
        job.submit();
        return 0;
    }
}
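For illustration, here is a hedged sketch of how the placeholder parts of
run() might be filled in. The mapper, reducer, key/value types, and the
use of args[0]/args[1] as input/output paths are hypothetical choices,
not part of the original skeleton, and need the extra imports noted in
the comments:

    // Additional imports needed at the top of JobSubmitter.java:
    // import org.apache.hadoop.fs.Path;
    // import org.apache.hadoop.io.IntWritable;
    // import org.apache.hadoop.io.Text;
    // import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    // import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public int run(String[] args) throws Exception {
        Job job = new Job(appConf);
        job.setJarByClass(JobSubmitter.class);
        job.setMapperClass(MyMapper.class);      // hypothetical mapper
        job.setReducerClass(MyReducer.class);    // hypothetical reducer
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.submit();                            // submit without blocking
        return 0;
    }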
re: "without packing any jar file at all":
If you use Tool/ToolRunner (as we are doing above), that lets your
Hadoop app automatically handle some key command-line args. One of them
that you will use here is the -libjars argument. If you use -libjars
and specify a list of jars that contain your code, then ToolRunner
will automatically take those jars and put them in the Distributed
Cache on each task node, where they will get added to the classpath of
every map/reduce task.
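As a hedged sketch of that point: because ToolRunner passes the args
through GenericOptionsParser, -libjars can also be supplied
programmatically from plain Java (e.g. from Eclipse), without invoking
the hadoop executable. The class name, jar paths, and HDFS URIs below
are hypothetical placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.util.ToolRunner;

    public class SubmitFromIde {
        public static void main(String[] args) throws Exception {
            // Hypothetical local jar paths and HDFS input/output URIs.
            String[] jobArgs = {
                "-libjars", "/local/path/dep1.jar,/local/path/dep2.jar",
                "hdfs://namenode:9000/input", "hdfs://namenode:9000/output"
            };
            // ToolRunner/GenericOptionsParser consume -libjars and arrange
            // for the listed jars to go to the Distributed Cache at submit
            // time; the remaining args are handed to JobSubmitter.run().
            int exitCode = ToolRunner.run(new Configuration(),
                    new JobSubmitter(), jobArgs);
            System.exit(exitCode);
        }
    }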
HTH,
DR