Thanks for the response. I need to submit this job programmatically, instead of using the command line. Shouldn't the DistributedCache class methods handle the classpath setup for the job? If not, is there some other setup missing from my driver class?
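Here is roughly what I thought the programmatic equivalent should look like (untested sketch, written against the 0.20.x old mapred API; the jar paths are just examples from my environment). My understanding is that for plain jars addFileToClassPath is the right call rather than addArchiveToClassPath, and that the job jar has to be set explicitly because setJarByClass can't find a jar when the classes are loaded from Eclipse's output directory:

```java
// Untested sketch for Hadoop 0.20.x (old mapred API); paths are examples only.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class ClasspathSetupSketch {
    static JobConf configureJob(Configuration conf) throws IOException {
        // addFileToClassPath puts the jar itself on the task classpath;
        // addArchiveToClassPath is intended for archives that are unpacked
        // on the task nodes before being added.
        DistributedCache.addFileToClassPath(
                new Path("/myfolder/libs/mysql-connector-java-5.1.17.jar"), conf);

        JobConf job = new JobConf(conf);
        // setJarByClass only works when the class was loaded from a jar; when
        // running from an IDE the classes come from a directory, so set the
        // jar explicitly to avoid the "No job jar file set" warning.
        job.setJar("C:/Users/me/target/myproject-0.1-SNAPSHOT-hadoop.jar");
        return job;
    }
}
```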
I also looked into Sqoop, but I wanted to get this working for a particular case which I think isn't a good fit for it, though I may be wrong. Plus, I wanted to use this use case to get more experience with creating and running jobs remotely. Thanks

On Oct 28, 2011 1:38 PM, "Brock Noland" <[email protected]> wrote:
> Hi,
>
> I always find that using the -libjars command line option is the
> easiest way to push jars to the cluster.
>
> Also, you may want to check out Apache Sqoop:
> http://incubator.apache.org/projects/sqoop.html
>
> Brock
>
> On Fri, Oct 28, 2011 at 12:17 PM, Jamal x <[email protected]> wrote:
> > Hi,
> >
> > I wrote a small test program to perform a simple database extraction of
> > information from a simple table on a remote cluster. However, it fails
> > to execute successfully when I run it from Eclipse, with the following
> > exception:
> >
> > 12:36:08,993 WARN main mapred.JobClient:659 - Use GenericOptionsParser
> > for parsing the arguments. Applications should implement Tool for the same.
> > 12:36:09,567 WARN main mapred.JobClient:776 - No job jar file set. User
> > classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> > java.lang.RuntimeException: Error in configuring object
> >     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
> >     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
> >     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
> >     at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:575)
> >     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
> >     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
> >     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at javax.security.auth.Subject.doAs(Subject.java:396)
> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> >     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> > Caused by: java.lang.reflect.InvocationTargetException
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
> >     ... 11 more
> > Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
> >     at org.apache.hadoop.mapred.lib.db.DBInputFormat.configure(DBInputFormat.java:271)
> >     ... 16 more
> > Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
> >     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
> >     at java.lang.Class.forName0(Native Method)
> >     at java.lang.Class.forName(Class.java:169)
> >     at org.apache.hadoop.mapred.lib.db.DBConfiguration.getConnection(DBConfiguration.java:123)
> >     at org.apache.hadoop.mapred.lib.db.DBInputFormat.configure(DBInputFormat.java:266)
> >     ... 16 more
> >
> > I do have the mysql-connector jar under the $HADOOP_HOME/lib folder on all
> > servers in the cluster, and I even tried using the
> > DistributedCache.addArchiveToClassPath method, with no success. Can someone
> > please help me figure out what is going on here?
> >
> > Here is my simple main which performs the remote submission of the job:
> >
> > public int run(String[] arg0) throws Exception {
> >
> >     System.out.println("Setting up job configuration....");
> >     Configuration conf = new Configuration();
> >     conf.set("mapred.job.tracker", "jobtracker.hostname:8021");
> >     conf.set("fs.default.name", "hdfs://namenode.hostname:9000");
> >     conf.set("keep.failed.task.files", "true");
> >     conf.set("mapred.child.java.opts", "-Xmx1024m");
> >
> >     FileSystem fs = FileSystem.get(conf);
> >     fs.delete(new Path("/myfolder/dump_output/"), true);
> >     fs.mkdirs(new Path("/myfolder/libs/"));
> >
> >     fs.copyFromLocalFile(
> >         new Path("C:/Users/me/.m2/repository/org/mylib/0.1-SNAPSHOT/myproject-0.1-SNAPSHOT-hadoop.jar"),
> >         new Path("/myfolder/libs/myproject-0.1-SNAPSHOT-hadoop.jar"));
> >
> >     fs.copyFromLocalFile(
> >         new Path("C:/Users/me/.m2/repository/mysql/mysql-connector-java/5.1.17/mysql-connector-java-5.1.17.jar"),
> >         new Path("/myfolder/libs/mysql-connector-java-5.1.17.jar"));
> >
> >     DistributedCache.addArchiveToClassPath(new Path(
> >         "/myfolder/libs/myproject-0.1-SNAPSHOT-hadoop.jar"), conf, fs);
> >
> >     DistributedCache.addArchiveToClassPath(new Path(
> >         "/myfolder/libs/mysql-connector-java-5.1.17.jar"), conf, fs);
> >
> >     JobConf job = new JobConf(conf);
> >
> >     job.setJobName("Exporting Job");
> >     job.setJarByClass(MyMapper.class);
> >     job.setMapperClass(MyMapper.class);
> >     Class claz = Class.forName("com.mysql.jdbc.Driver");
> >     if (claz == null) {
> >         throw new RuntimeException("wow...");
> >     }
> >
> >     Configuration.dumpConfiguration(conf, new PrintWriter(System.out));
> >
> >     DBConfiguration.configureDB(job,
> >         "com.mysql.jdbc.Driver",
> >         "jdbc:mysql://mydbserver:3306/test?autoReconnect=true",
> >         "user", "password");
> >
> >     String[] fields = { "employee_id", "name" };
> >     DBInputFormat.setInput(job, MyRecord.class, "employees", null,
> >         "employee_id", fields);
> >
> >     FileOutputFormat.setOutputPath(job, new Path("/myfolder/dump_output/"));
> >
> >     System.out.println("Submitting job....");
> >
> >     JobClient.runJob(job);
> >
> >     System.out.println("job info: " + job.getNumMapTasks());
> >
> >     return 0;
> > }
> >
> > public static void main(String[] args) throws Exception {
> >     int exitCode = ToolRunner.run(new SimpleDriver(), args);
> >     System.out.println("Completed.");
> >     System.exit(exitCode);
> > }
> >
> > I'm using the hadoop-core version 0.20.205.0 maven dependency to build and
> > run my program via Eclipse. The myproject-0.1-SNAPSHOT-hadoop.jar jar has my
> > classes, and its dependencies included under the /lib folder.
> >
> > Any help would be greatly appreciated.
> >
> > Thanks
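P.S. One more thing I'm considering trying: since the driver already implements Tool, the -libjars suggestion can apparently be fed in programmatically through ToolRunner rather than the command line. This is an untested sketch (the jar path is an example), and my understanding is that it only takes effect if run() builds its configuration from getConf() instead of creating a fresh Configuration, since GenericOptionsParser stores the parsed options in the conf it hands to the tool:

```java
// Untested sketch: pass -libjars through ToolRunner so GenericOptionsParser
// ships the jar to the cluster and adds it to the task classpath.
import org.apache.hadoop.util.ToolRunner;

public class LibJarsLauncher {
    public static void main(String[] args) throws Exception {
        // Jar path below is an example from my local Maven repository.
        String[] withLibJars = {
            "-libjars",
            "C:/Users/me/.m2/repository/mysql/mysql-connector-java/5.1.17/mysql-connector-java-5.1.17.jar"
        };
        // Note: SimpleDriver.run() must use getConf() rather than
        // "new Configuration()", or the parsed options are silently dropped.
        int exitCode = ToolRunner.run(new SimpleDriver(), withLibJars);
        System.exit(exitCode);
    }
}
```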
