Hi, I always find that using the -libjars command line option is the easiest way to push jars to the cluster.
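A minimal sketch of what such an invocation looks like, assuming the driver is packaged in a jar and launched with `hadoop jar`. `-libjars` is a generic option parsed by `GenericOptionsParser`, which `ToolRunner` runs for you. It takes a single comma-separated list of local jar paths, and it has to come after the main class but before the job's own arguments, because generic-option parsing stops at the first non-option argument. The `LibJarsCommand` helper below only assembles that argument list. The helper name, the trailing job argument and the file paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: assembles the argument vector for
//   hadoop jar <appJar> <mainClass> -libjars a.jar,b.jar <appArgs...>
final class LibJarsCommand {
    private LibJarsCommand() {}

    static List<String> build(String appJar, String mainClass,
                              List<String> libJars, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add("jar");
        cmd.add(appJar);
        cmd.add(mainClass);
        if (!libJars.isEmpty()) {
            // Generic options go right after the main class: parsing of
            // generic options stops at the first non-option argument.
            cmd.add("-libjars");
            // One comma-separated value, not one flag per jar.
            cmd.add(String.join(",", libJars));
        }
        cmd.addAll(appArgs); // job-specific arguments come last
        return cmd;
    }

    public static void main(String[] args) {
        // "/myfolder/dump_output/" is a hypothetical job argument.
        List<String> cmd = build("myproject-0.1-SNAPSHOT-hadoop.jar", "SimpleDriver",
                Arrays.asList("mysql-connector-java-5.1.17.jar"),
                Arrays.asList("/myfolder/dump_output/"));
        System.out.println(String.join(" ", cmd));
    }
}
```

Running `main` prints `hadoop jar myproject-0.1-SNAPSHOT-hadoop.jar SimpleDriver -libjars mysql-connector-java-5.1.17.jar /myfolder/dump_output/`. For this to take effect, the driver's `run()` should build its `JobConf` from `getConf()` rather than from a fresh `new Configuration()`. `ToolRunner` records the parsed `-libjars` list in the `Configuration` it hands to the tool, and a new `Configuration` starts without it.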
Also, you may want to check out Apache Sqoop: http://incubator.apache.org/projects/sqoop.html

Brock

On Fri, Oct 28, 2011 at 12:17 PM, Jamal x wrote:
> Hi,
>
> I wrote a small test program to perform a simple database extraction of
> information from a simple table on a remote cluster. However, it fails to
> execute successfully when I run it from Eclipse, with the following
> exception:
>
> 12:36:08,993 WARN main mapred.JobClient:659 - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 12:36:09,567 WARN main mapred.JobClient:776 - No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
> java.lang.RuntimeException: Error in configuring object
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
>     at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
>     at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
>     at org.apache.hadoop.mapred.JobConf.getInputFormat(JobConf.java:575)
>     at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.<init>(MapTask.java:197)
>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
> Caused by: java.lang.reflect.InvocationTargetException
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
>     ... 11 more
> Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
>     at org.apache.hadoop.mapred.lib.db.DBInputFormat.configure(DBInputFormat.java:271)
>     ... 16 more
> Caused by: java.lang.ClassNotFoundException: com.mysql.jdbc.Driver
>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
>     at java.lang.Class.forName0(Native Method)
>     at java.lang.Class.forName(Class.java:169)
>     at org.apache.hadoop.mapred.lib.db.DBConfiguration.getConnection(DBConfiguration.java:123)
>     at org.apache.hadoop.mapred.lib.db.DBInputFormat.configure(DBInputFormat.java:266)
>     ... 16 more
>
> I do have the mysql-connector jar under the $HADOOP_HOME/lib folder on all
> servers in the cluster, and even tried using the
> DistributedCache.addArchiveToClassPath method, with no success. Can someone
> please help me figure out what is going on here?
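Note where the trace comes from: the outer exception bottoms out in `org.apache.hadoop.mapred.Child.main`, i.e. inside a task JVM on the cluster, where `DBInputFormat.configure` calls `Class.forName` on the JDBC driver class. The `Class.forName("com.mysql.jdbc.Driver")` call in the driver code quoted further down therefore only shows that the jar is on the submitting machine's classpath. Its `null` check can also never fire, since `Class.forName` either returns a `Class` or throws `ClassNotFoundException`. A minimal sketch of a non-throwing probe (the `ClassProbe` name is hypothetical) that answers the question for whatever JVM and class loader it runs in:

```java
// Hypothetical diagnostic helper: reports whether a class can be resolved by
// a given class loader, without running its static initializer.
final class ClassProbe {
    private ClassProbe() {}

    static boolean isVisible(String className, ClassLoader loader) {
        try {
            // initialize = false: only locate and load the class.
            // Class.forName never returns null; failure is always an exception.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The answer only describes the JVM (and loader) this code runs in.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : new String[] {"java.lang.String", "com.mysql.jdbc.Driver"}) {
            System.out.println(name + " visible: " + isVisible(name, loader));
        }
    }
}
```

Run on the client, this would report the connector as visible whenever it is on the IDE's classpath, which says nothing about the task JVMs.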
>
> Here is my simple main which performs the remote submission of the job:
>
>     public int run(String[] arg0) throws Exception {
>
>         System.out.println("Setting up job configuration....");
>         Configuration conf = new Configuration();
>         conf.set("mapred.job.tracker", "jobtracker.hostname:8021");
>         conf.set("fs.default.name", "hdfs://namenode.hostname:9000");
>         conf.set("keep.failed.task.files", "true");
>         conf.set("mapred.child.java.opts", "-Xmx1024m");
>
>         FileSystem fs = FileSystem.get(conf);
>         fs.delete(new Path("/myfolder/dump_output/"), true);
>         fs.mkdirs(new Path("/myfolder/libs/"));
>
>         fs.copyFromLocalFile(
>                 new Path("C:/Users/me/.m2/repository/org/mylib/0.1-SNAPSHOT/myproject-0.1-SNAPSHOT-hadoop.jar"),
>                 new Path("/myfolder/libs/myproject-0.1-SNAPSHOT-hadoop.jar"));
>
>         fs.copyFromLocalFile(
>                 new Path("C:/Users/me/.m2/repository/mysql/mysql-connector-java/5.1.17/mysql-connector-java-5.1.17.jar"),
>                 new Path("/myfolder/libs/mysql-connector-java-5.1.17.jar"));
>
>         DistributedCache.addArchiveToClassPath(new Path(
>                 "/myfolder/libs/myproject-0.1-SNAPSHOT-hadoop.jar"), conf, fs);
>
>         DistributedCache.addArchiveToClassPath(new Path(
>                 "/myfolder/libs/mysql-connector-java-5.1.17.jar"), conf, fs);
>
>         JobConf job = new JobConf(conf);
>
>         job.setJobName("Exporting Job");
>         job.setJarByClass(MyMapper.class);
>         job.setMapperClass(MyMapper.class);
>         Class claz = Class.forName("com.mysql.jdbc.Driver");
>         if (claz == null) {
>             throw new RuntimeException("wow...");
>         }
>
>         Configuration.dumpConfiguration(conf, new PrintWriter(System.out));
>
>         DBConfiguration.configureDB(
>                 job,
>                 "com.mysql.jdbc.Driver",
>                 "jdbc:mysql://mydbserver:3306/test?autoReconnect=true",
>                 "user", "password");
>
>         String[] fields = { "employee_id", "name" };
>         DBInputFormat.setInput(job, MyRecord.class, "employees", null,
>                 "employee_id", fields);
>
>         FileOutputFormat.setOutputPath(job, new Path(
>                 "/myfolder/dump_output/"));
>
>         System.out.println("Submitting job....");
>
>         JobClient.runJob(job);
>
>         System.out.println("job info: " + job.getNumMapTasks());
>
>         return 0;
>     }
>
>     public static void main(String[] args) throws Exception {
>         int exitCode = ToolRunner.run(new SimpleDriver(), args);
>         System.out.println("Completed.");
>         System.exit(exitCode);
>     }
>
> I'm using the hadoop-core version 0.20.205.0 Maven dependency to build and
> run my program via Eclipse. The myproject-0.1-SNAPSHOT-hadoop.jar jar contains
> my classes, with its dependencies included under the /lib folder.
>
> Any help would be greatly appreciated.
>
> Thanks
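The second warning in the log, `No job jar file set`, is also worth a look. `setJarByClass` can only register a job jar if the given class was actually loaded from a jar. When the driver runs from Eclipse, `MyMapper` typically comes from the project's compiled-classes directory, so no job jar is submitted, and the `lib/` folder inside myproject-0.1-SNAPSHOT-hadoop.jar is not picked up through that route. The sketch below approximates that lookup with plain JDK calls; it is not Hadoop's source, and `JarLocator` is a hypothetical name. It asks the class loader for the `.class` resource and only reports a path when that resource lives inside a jar.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URL;
import java.util.Enumeration;

// Hypothetical helper approximating the "which jar holds this class?" lookup.
// Returns null when the class file is only found in a directory (the usual
// case when running from an IDE) or not found at all.
final class JarLocator {
    private JarLocator() {}

    static String findContainingJar(ClassLoader loader, String className) {
        String classFile = className.replace('.', '/') + ".class";
        try {
            Enumeration<URL> urls = loader.getResources(classFile);
            while (urls.hasMoreElements()) {
                URL url = urls.nextElement();
                if (!"jar".equals(url.getProtocol())) {
                    continue; // e.g. a file: URL pointing into a classes directory
                }
                // For jar URLs, getPath() looks like "file:/path/app.jar!/pkg/Cls.class"
                String spec = url.getPath();
                int sep = spec.indexOf("!/");
                String jarUri = sep >= 0 ? spec.substring(0, sep) : spec;
                if (jarUri.startsWith("file:")) {
                    // URI handles %-decoding; File turns it into a local path
                    return new File(URI.create(jarUri)).getPath();
                }
                return jarUri;
            }
            return null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = JarLocator.class.getClassLoader();
        System.out.println(findContainingJar(loader, JarLocator.class.getName()));
    }
}
```

Launched from an IDE, `main` would typically print `null`; packaged into a jar and run with that jar on the classpath, it prints the jar's path instead.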
