[ https://issues.apache.org/jira/browse/HBASE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440026#comment-16440026 ]
Sean Busbey commented on HBASE-20332:
-------------------------------------

Okay I've gone through most of the built in MR jobs:

{code}
An example program must be given as the first argument.
Valid program names are:
  CellCounter: Count cells in HBase table.
  WALPlayer: Replay WAL files.
  completebulkload: Complete a bulk data load.
  copytable: Export a table from local cluster to peer cluster.
  export: Write table data to HDFS.
  exportsnapshot: Export the specific snapshot to a given FileSystem.
  import: Import data written by Export.
  importtsv: Import data in TSV format.
  rowcounter: Count rows in HBase table.
  verifyrep: Compare data from tables in two different clusters. It doesn't work for incrementColumnValues'd cells since timestamp is changed after appending to WAL.
{code}

These all worked fine (see note at end though):

* CellCounter
* copytable (with and without bulkload)
* export
* import
* importtsv (with and without bulkload)
* completebulkload
* rowcounter

I don't have stuff set up ATM to do {{WALPlayer}} or {{verifyrep}}.

When running {{exportsnapshot}} I got the following failure, which I haven't dug into yet:

{code}
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:63)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.snapshot.ExportSnapshot.addRequiredOption(Lorg/apache/hbase/thirdparty/org/apache/commons/cli/Option;)V
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot.addOptions(ExportSnapshot.java:1094)
    at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:132)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.util.AbstractHBaseTool.doStaticMain(AbstractHBaseTool.java:270)
    at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:1109)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
    ... 11 more
{code}

Note at the end: when invoking via yarn I expect things to look like

{code}
HADOOP_CLASSPATH=/etc/hbase/conf yarn jar hbase-shaded-mapreduce-3.0.0-SNAPSHOT.jar <command> <options>
{code}

But this fails because our dependency adder can't find a class that we can manually see is present:

{code}
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:63)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.mapreduce.TableInputFormat not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2479)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:175)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:864)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:213)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:169)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:292)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:93)
    at org.apache.hadoop.hbase.mapreduce.CopyTable.createSubmittableJob(CopyTable.java:167)
    at org.apache.hadoop.hbase.mapreduce.CopyTable.run(CopyTable.java:362)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.hbase.mapreduce.CopyTable.main(CopyTable.java:356)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
    ... 11 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.mapreduce.TableInputFormat not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2383)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2477)
    ... 28 more
{code}

So instead I end up having to double up via HADOOP_CLASSPATH:

{code}
HADOOP_CLASSPATH=hbase-shaded-mapreduce-3.0.0-SNAPSHOT.jar:/etc/hbase/conf/ yarn jar hbase-shaded-mapreduce-3.0.0-SNAPSHOT.jar <command> <options>
{code}

> shaded mapreduce module shouldn't include hadoop
> ------------------------------------------------
>
>                 Key: HBASE-20332
>                 URL: https://issues.apache.org/jira/browse/HBASE-20332
>             Project: HBase
>          Issue Type: Sub-task
>          Components: mapreduce, shading
>    Affects Versions: 2.0.0
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-20332.0.patch
>
>
> AFAICT, we should just entirely skip including hadoop in our shaded mapreduce
> module
> 1) Folks expect to run yarn / mr apps via {{hadoop jar}} / {{yarn jar}}
> 2) those commands include all the needed Hadoop jars in your classpath by
> default (both client side and in the containers)
> 3) If you try to use "user classpath first" for your job as a workaround
> (e.g. for some library your application needs that hadoop provides) then our
> inclusion of *some but not all* hadoop classes then causes everything to fall
> over because of mixing rewritten and non-rewritten hadoop classes
> 4) if you don't use "user classpath first" then all of our
> non-relocated-but-still-shaded hadoop classes are ignored anyways so we're
> just wasting space

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
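
For anyone reproducing the "class that we can manually see is present" check from the comment above, here is a minimal sketch of one way to do it. It is not from the original comment: the helper name {{jar_contains_class}} is hypothetical, and it only confirms that a {{.class}} entry exists inside the jar. It says nothing about why the class loader used by {{yarn jar}} fails to find it.

```python
import zipfile


def jar_contains_class(jar_path, class_name):
    """Return True if the jar holds a .class entry for the fully qualified class_name.

    Hypothetical helper for illustration; a jar is a zip archive, and a class
    a.b.C is stored in it under the entry name a/b/C.class.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar_path) as jar:
        return entry in jar.namelist()
```

Assuming the jar is in the current directory, calling {{jar_contains_class("hbase-shaded-mapreduce-3.0.0-SNAPSHOT.jar", "org.apache.hadoop.hbase.mapreduce.TableInputFormat")}} would be expected to return True if the entry is there, which matches the situation described: the class is in the jar, but is not visible on the classpath unless the jar is also listed in HADOOP_CLASSPATH.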