[ 
https://issues.apache.org/jira/browse/HBASE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440898#comment-16440898
 ] 

Sean Busbey commented on HBASE-20332:
-------------------------------------

{quote}
How's this work relate to what we have in guide since time immemorial? 
http://hbase.apache.org/book.html#hbase.mapreduce.classpath Especially your #1 
and #2 above.
{quote}

Leaving aside the {{hadoop jar}} vs {{yarn jar}} Hadoop details, I think those 
instructions just tell you how to submit jobs using the non-shaded MR jars. 
Since we have a goal of "push folks to the shaded bits", I'll probably need to 
rewrite that section once I'm convinced the shaded MR jar works. The essentials 
still look the same: {{HADOOP_CLASSPATH}} and {{-libjars}} just need to point 
at the shaded MR jar instead.
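
As a rough illustration of "point both at the shaded MR jar", here is a small Python sketch that only assembles the environment and argument list for a {{hadoop jar}} submission; it does not run anything. The function name, the jar path, and the table name are hypothetical, and placing {{-libjars}} directly after the program name assumes the program parses Hadoop generic options.

```python
import os


def build_submission(shaded_mr_jar, program, args, base_env=None):
    """Return (env, argv) for a `hadoop jar` run that uses the shaded MR jar.

    Hypothetical helper for illustration only; nothing is executed here.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("HADOOP_CLASSPATH")
    # Client side: put the shaded MR jar on HADOOP_CLASSPATH,
    # ahead of whatever was already there.
    env["HADOOP_CLASSPATH"] = (
        shaded_mr_jar if not existing else shaded_mr_jar + os.pathsep + existing
    )
    # Container side: ship the same jar with the job via -libjars.
    argv = ["hadoop", "jar", shaded_mr_jar, program,
            "-libjars", shaded_mr_jar, *args]
    return env, argv


# Example with a placeholder jar path and placeholder arguments.
env, argv = build_submission(
    "/path/to/hbase-shaded-mapreduce.jar", "exportsnapshot", ["--help"], base_env={}
)
print(env["HADOOP_CLASSPATH"])
print(" ".join(argv))
```

The same jar appears in three places (the jar being run, {{HADOOP_CLASSPATH}}, and {{-libjars}}), which mirrors the structure of the existing ref guide instructions with the shaded jar swapped in.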

(Also, the ref guide is still using fully qualified class names instead of 
short names? Ugh.)

{quote}
The workarounds are for the doubled mention of the shaded hbase mr jar?

Skimmed the patch.

Is this right?

    <artifactId>hbase-server</artifactId>
    <scope>provided</scope>

How is hbase-server provided at runtime if not in the hbase-shaded-mapreduce 
jar?
{quote}

The workaround of {{HADOOP_USER_CLASSPATH_FIRST}} was for my attempt to use 
{{exportsnapshot}} failing because my local YARN install has its own set of 
HBase jars that it is putting into my application's classpath.

But I think I have a deeper problem than that. As you mention, surely we are 
using _some_ of the hbase-server classes, and having it at {{provided}} scope 
in hbase-mapreduce would mean none of it shows up in the shaded jar. I suspect 
my polluted YARN classpath is providing some of those classes. I'm going to do 
a quick verification and then, if necessary, run through my tests again.

{quote}
Say more why the new module hbase-shaded-with-hadoop-check-invariants?
{quote}

The way the "check invariants" modules work is they run against the set of 
dependencies listed for the module. To date I've tried not to have logic in the 
jar verification script about specific jars to check. Since we need to treat 
the contents of {{hbase-shaded-client}} differently from 
{{hbase-shaded-mapreduce}}, I added the flag to the script for "can Hadoop be 
in here?" but I need a way to determine if that flag is passed to the 
invocation. If we want to be able to just do accounting of the jars to check 
via dependencies to the check-invariants pom(s), then we need two modules so 
that one can pass the flag and the other not.
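
To make the "can Hadoop be in here?" flag concrete, here is a minimal Python sketch of a jar-contents check with such a switch. It is not the project's verification script: the allowed prefixes, the function names, and the in-memory demo jars are all hypothetical, chosen only to show how one flag changes the verdict for the same contents.

```python
import io
import zipfile

# Hypothetical allow-list; the real script's rules are not shown in this comment.
ALLOWED_PREFIXES = ("org/apache/hadoop/hbase/", "META-INF/")
HADOOP_PREFIX = "org/apache/hadoop/"


def unexpected_entries(jar, allow_hadoop=False):
    """List file entries in the jar that fall outside the allowed prefixes."""
    allowed = ALLOWED_PREFIXES + ((HADOOP_PREFIX,) if allow_hadoop else ())
    with zipfile.ZipFile(jar) as zf:
        return [name for name in zf.namelist()
                if not name.endswith("/")          # skip directory entries
                and not name.startswith(allowed)]  # str.startswith takes a tuple


def make_jar(names):
    """Build a tiny in-memory jar with empty entries, for demonstration."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    buf.seek(0)
    return buf


contents = [
    "META-INF/MANIFEST.MF",
    "org/apache/hadoop/hbase/client/Table.class",
    "org/apache/hadoop/conf/Configuration.class",
]
print(unexpected_entries(make_jar(contents)))                     # flag off
print(unexpected_entries(make_jar(contents), allow_hadoop=True))  # flag on
```

With two check-invariants modules, each module would correspond to one of the two calls above: the jars to check come from that module's dependencies, and the module decides whether the flag is passed.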

An alternative would be to keep a single check invariants module, but to stop 
automatically enumerating the dependencies to pass to the jar checking 
invocation and instead build two executions of the jar checking script where we 
list each jar it should check. We could maybe do this in beanshell and just do 
a whitelist regex for "can have hadoop classes".
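
The single-module alternative could look roughly like the following, sketched in Python rather than beanshell. The regex and the jar file names are assumptions for illustration; in particular, it assumes (per the issue title) that {{hbase-shaded-mapreduce}} is the jar that must not contain Hadoop classes while {{hbase-shaded-client}} may.

```python
import re

# Hypothetical whitelist: artifact file names that may contain Hadoop classes.
MAY_CONTAIN_HADOOP = re.compile(r"^hbase-shaded-client-\d[\w.\-]*\.jar$")


def plan_checks(jar_names):
    """Split jars into two checker invocations, with and without the flag."""
    plan = {"allow_hadoop": [], "no_hadoop": []}
    for name in jar_names:
        key = "allow_hadoop" if MAY_CONTAIN_HADOOP.match(name) else "no_hadoop"
        plan[key].append(name)
    return plan


# Example with made-up version numbers.
print(plan_checks([
    "hbase-shaded-client-2.0.0.jar",
    "hbase-shaded-mapreduce-2.0.0.jar",
]))
```

The trade-off is that the knowledge of which jar gets which treatment moves out of the module layout and into a pattern that has to be kept in sync with artifact names.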

> shaded mapreduce module shouldn't include hadoop
> ------------------------------------------------
>
>                 Key: HBASE-20332
>                 URL: https://issues.apache.org/jira/browse/HBASE-20332
>             Project: HBase
>          Issue Type: Sub-task
>          Components: mapreduce, shading
>    Affects Versions: 2.0.0
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: HBASE-20332.0.patch
>
>
> AFAICT, we should just entirely skip including hadoop in our shaded mapreduce 
> module
> 1) Folks expect to run yarn / mr apps via {{hadoop jar}} / {{yarn jar}}
> 2) those commands include all the needed Hadoop jars in your classpath by 
> default (both client side and in the containers)
> 3) If you try to use "user classpath first" for your job as a workaround 
> (e.g. for some library your application needs that hadoop provides) then our 
> inclusion of *some but not all* hadoop classes then causes everything to fall 
> over because of mixing rewritten and non-rewritten hadoop classes
> 4) if you don't use "user classpath first" then all of our 
> non-relocated-but-still-shaded hadoop classes are ignored anyways so we're 
> just wasting space



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
