[ 
https://issues.apache.org/jira/browse/HBASE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440026#comment-16440026
 ] 

Sean Busbey commented on HBASE-20332:
-------------------------------------

Okay, I've gone through most of the built-in MR jobs:

{code}
An example program must be given as the first argument.
Valid program names are:
  CellCounter: Count cells in HBase table.
  WALPlayer: Replay WAL files.
  completebulkload: Complete a bulk data load.
  copytable: Export a table from local cluster to peer cluster.
  export: Write table data to HDFS.
  exportsnapshot: Export the specific snapshot to a given FileSystem.
  import: Import data written by Export.
  importtsv: Import data in TSV format.
  rowcounter: Count rows in HBase table.
  verifyrep: Compare data from tables in two different clusters. It doesn't work for incrementColumnValues'd cells since timestamp is changed after appending to WAL.
{code}

These all worked fine (see note at end though):

* CellCounter
* copytable (with and without bulkload)
* export
* import
* importtsv (with and without bulkload)
* completebulkload
* rowcounter

I don't have stuff set up ATM to do {{WALPlayer}} or {{verifyrep}}.

When running {{exportsnapshot}} I got the following failure, which I haven't 
dug into yet:

{code}
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:63)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hbase.snapshot.ExportSnapshot.addRequiredOption(Lorg/apache/hbase/thirdparty/org/apache/commons/cli/Option;)V
        at org.apache.hadoop.hbase.snapshot.ExportSnapshot.addOptions(ExportSnapshot.java:1094)
        at org.apache.hadoop.hbase.util.AbstractHBaseTool.run(AbstractHBaseTool.java:132)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.hbase.util.AbstractHBaseTool.doStaticMain(AbstractHBaseTool.java:270)
        at org.apache.hadoop.hbase.snapshot.ExportSnapshot.main(ExportSnapshot.java:1109)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
        ... 11 more
{code}
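
One possible way to start digging into the {{NoSuchMethodError}}, offered as a sketch rather than a diagnosis: the descriptor in the error names a relocated commons-cli {{Option}} type, so it may help to compare it against the descriptors actually compiled into the jar. Both helper names below are hypothetical, and the second function assumes a JDK's {{javap}} is on the PATH.

```shell
# Hypothetical helpers for illustration; not part of HBase or Hadoop.

# Convert a JVM type descriptor such as "Lorg/example/Foo;" into a dotted class name.
descriptor_to_class() {
  printf '%s\n' "$1" | sed -e 's/^L//' -e 's/;$//' -e 's#/#.#g'
}

# Print the declaration and descriptor of methods matching name $3 in class $2,
# looked up in jar $1. Assumes javap (shipped with a JDK) is available.
show_method_descriptors() {
  javap -s -p -cp "$1" "$2" | grep -A1 "$3"
}
```

For example, {{show_method_descriptors hbase-shaded-mapreduce-3.0.0-SNAPSHOT.jar org.apache.hadoop.hbase.util.AbstractHBaseTool addRequiredOption}} would show which {{Option}} type the method takes in the shaded jar, assuming the method is inherited from {{AbstractHBaseTool}} (which appears in the trace). A mismatch with the descriptor in the error would point at a relocation problem, but that is a guess until checked.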

Note at the end:

When invoking via yarn, I'd expect things to look like:

{code}
HADOOP_CLASSPATH=/etc/hbase/conf yarn jar hbase-shaded-mapreduce-3.0.0-SNAPSHOT.jar <command> <options>
{code}

But this fails because our dependency adder ({{TableMapReduceUtil.addDependencyJars}}) can't find a class that we can manually see is present:

{code}
Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.hbase.mapreduce.Driver.main(Driver.java:63)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:308)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:222)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.mapreduce.TableInputFormat not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2479)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getInputFormatClass(JobContextImpl.java:175)
        at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:864)
        at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:213)
        at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:169)
        at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:292)
        at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableMapperJob(TableMapReduceUtil.java:93)
        at org.apache.hadoop.hbase.mapreduce.CopyTable.createSubmittableJob(CopyTable.java:167)
        at org.apache.hadoop.hbase.mapreduce.CopyTable.run(CopyTable.java:362)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.hadoop.hbase.mapreduce.CopyTable.main(CopyTable.java:356)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
        at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
        at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:152)
        ... 11 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hbase.mapreduce.TableInputFormat not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2383)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2477)
        ... 28 more
{code}
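
For anyone wanting to repeat the "manually see is present" check, here is a minimal sketch that looks for a class entry in a jar. The function names are hypothetical, and it assumes {{unzip}} is installed ({{jar tf}} from a JDK would work similarly).

```shell
# Hypothetical helpers for illustration; not part of HBase or Hadoop.

# Map a fully qualified class name to the entry path it would have inside a jar.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Succeed if the listing of jar $1 mentions the entry for class $2.
# This is a plain substring match, so a relocated copy under a longer
# path prefix would also match.
jar_has_class() {
  unzip -l "$1" | grep -qF "$(class_to_entry "$2")"
}
```

Usage would be along the lines of {{jar_has_class hbase-shaded-mapreduce-3.0.0-SNAPSHOT.jar org.apache.hadoop.hbase.mapreduce.TableInputFormat && echo present}}. Note this only confirms the class is in the jar file; it says nothing about which classloader {{Configuration.getClass}} consults at runtime.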

So instead I end up having to double up by also listing the jar in {{HADOOP_CLASSPATH}}:

{code}
HADOOP_CLASSPATH=hbase-shaded-mapreduce-3.0.0-SNAPSHOT.jar:/etc/hbase/conf/ yarn jar hbase-shaded-mapreduce-3.0.0-SNAPSHOT.jar <command> <options>
{code}
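
Until that is sorted out, the doubled-up invocation could be wrapped so the jar path is only typed once. This is just a sketch of the workaround above; the function names are hypothetical and nothing like this ships with HBase.

```shell
# Hypothetical wrapper for illustration; not part of HBase or Hadoop.

# Build the HADOOP_CLASSPATH value: the shaded jar first, then the HBase conf dir.
hbase_mr_classpath() {
  printf '%s:%s\n' "$1" "$2"
}

# Run a built-in MR job from the shaded jar, with the same jar also placed
# on HADOOP_CLASSPATH so the client side can load classes from it.
# Usage: run_hbase_mr <jar> <conf-dir> <command> [options...]
run_hbase_mr() {
  jar=$1
  conf=$2
  shift 2
  HADOOP_CLASSPATH=$(hbase_mr_classpath "$jar" "$conf") yarn jar "$jar" "$@"
}
```

For example, {{run_hbase_mr hbase-shaded-mapreduce-3.0.0-SNAPSHOT.jar /etc/hbase/conf/ rowcounter <options>}} expands to the same command line as the one shown above.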

> shaded mapreduce module shouldn't include hadoop
> ------------------------------------------------
>
>                 Key: HBASE-20332
>                 URL: https://issues.apache.org/jira/browse/HBASE-20332
>             Project: HBase
>          Issue Type: Sub-task
>          Components: mapreduce, shading
>    Affects Versions: 2.0.0
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-20332.0.patch
>
>
> AFAICT, we should just entirely skip including hadoop in our shaded mapreduce 
> module
> 1) Folks expect to run yarn / mr apps via {{hadoop jar}} / {{yarn jar}}
> 2) those commands include all the needed Hadoop jars in your classpath by 
> default (both client side and in the containers)
> 3) If you try to use "user classpath first" for your job as a workaround 
> (e.g. for some library your application needs that hadoop provides) then our 
> inclusion of *some but not all* hadoop classes then causes everything to fall 
> over because of mixing rewritten and non-rewritten hadoop classes
> 4) if you don't use "user classpath first" then all of our 
> non-relocated-but-still-shaded hadoop classes are ignored anyways so we're 
> just wasting space



