[
https://issues.apache.org/jira/browse/HBASE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440898#comment-16440898
]
Sean Busbey commented on HBASE-20332:
-------------------------------------
{quote}
How's this work relate to what we have in guide since time immemorial?
http://hbase.apache.org/book.html#hbase.mapreduce.classpath Especially your #1
and #2 above.
{quote}
Leaving aside the {{hadoop jar}} vs {{yarn jar}} details, I think those
instructions just tell you how to submit jobs using the non-shaded MR
artifacts. Since we have a goal of "push folks to the shaded bits", I'll
probably need to rewrite that section once I'm convinced the shaded MR jar
works. The essentials still look the same: {{HADOOP_CLASSPATH}} and
{{-libjars}} just need to point at the shaded MR jar instead.
(also the ref guide is still naming fully qualified classnames instead of the
short names? ugh.)
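Roughly something like the sketch below, assuming the shaded MR jar keeps the
same driver entry point as the current hbase-mapreduce jar; the jar path,
version, and table name are just placeholders:
{code:bash}
# Sketch only; jar location/version and table name are placeholders.
SHADED_MR_JAR=/path/to/hbase-shaded-mapreduce-VERSION.jar

# Put the shaded MR jar on the submitting JVM's classpath ...
export HADOOP_CLASSPATH="${SHADED_MR_JAR}"

# ... and ship the same jar to the task containers via -libjars.
yarn jar "${SHADED_MR_JAR}" rowcounter \
    -libjars "${SHADED_MR_JAR}" \
    my_table
{code}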
{quote}
The workarounds are for the doubled mention of the shaded hbase mr jar?
Skimmed the patch.
Is this right?
199 <artifactId>hbase-server</artifactId>
200 <scope>provided</scope>
How is hbase-server provided at runtime if not in the hbase-shaded-mapreduce
jar?
{quote}
The workaround of {{HADOOP_USER_CLASSPATH_FIRST}} was for my attempt to use
{{exportsnapshot}}, which failed because my local YARN install has its own set
of HBase jars that it puts into my application's classpath.
But I think I have a deeper problem than that. As you mention, surely we are
using _some_ of the hbase-server classes, and marking it {{provided}} in
hbase-mapreduce would mean none of them show up in the shaded jar. I suspect my
polluted YARN classpath is providing some of those classes. I'm going to do a
quick verification and then, if necessary, run through my tests again.
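(For the record, that attempt looked roughly like the sketch below; the jar
path, snapshot name, and destination are placeholders from my test setup.)
{code:bash}
# Sketch of the attempt plus workaround; names and paths are placeholders.
SHADED_MR_JAR=/path/to/hbase-shaded-mapreduce-VERSION.jar
export HADOOP_CLASSPATH="${SHADED_MR_JAR}"

# Workaround: prefer my jars over the HBase jars my YARN install injects.
export HADOOP_USER_CLASSPATH_FIRST=true

yarn jar "${SHADED_MR_JAR}" exportsnapshot \
    -libjars "${SHADED_MR_JAR}" \
    -snapshot my_snapshot \
    -copy-to hdfs://backup-cluster/hbase
{code}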
{quote}
Say more why the new module hbase-shaded-with-hadoop-check-invariants?
{quote}
The way the "check invariants" modules work is that they run against the set of
dependencies listed for the module. So far I've tried to keep logic about which
specific jars to check out of the jar verification script. Since we need to
treat the contents of {{hbase-shaded-client}} differently from
{{hbase-shaded-mapreduce}}, I added a flag to the script for "can Hadoop be in
here?", but then I need a way to decide which invocations get that flag. If we
want to keep doing the accounting of which jars to check via the dependencies
of the check-invariants pom(s), then we need two modules so that one can pass
the flag and the other not.
An alternative would be to keep a single check-invariants module, stop
automatically enumerating the dependencies to pass to the jar-checking
invocation, and instead configure two executions of the jar-checking script
where we list each jar it should check. We could maybe do this in beanshell and
just use a whitelist regex for "can have hadoop classes".
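Either way, the gist of the check is something like the sketch below; the
script name, the flag, and the allowed prefixes are illustrative, not the
actual interface in the patch:
{code:bash}
#!/usr/bin/env bash
# Illustrative sketch of the invariant check, not the real script.
# Usage: check-jar-contents.sh [--allow-hadoop] jar [jar ...]
# (the flag name and allowed prefixes here are made up for the sketch)
allow_hadoop=""
if [ "$1" = "--allow-hadoop" ]; then
  allow_hadoop="yes"
  shift
fi
for jar in "$@"; do
  # Classes in the shaded jars should live under our own / relocated prefixes;
  # Hadoop classes are only tolerated when the flag says so.
  bad=$(jar tf "$jar" | grep '\.class$' \
        | grep -v '^org/apache/hadoop/hbase/' \
        | if [ -n "$allow_hadoop" ]; then grep -v '^org/apache/hadoop/'; else cat; fi)
  if [ -n "$bad" ]; then
    echo "Unexpected classes in ${jar}:" >&2
    echo "${bad}" >&2
    exit 1
  fi
done
{code}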
> shaded mapreduce module shouldn't include hadoop
> ------------------------------------------------
>
> Key: HBASE-20332
> URL: https://issues.apache.org/jira/browse/HBASE-20332
> Project: HBase
> Issue Type: Sub-task
> Components: mapreduce, shading
> Affects Versions: 2.0.0
> Reporter: Sean Busbey
> Assignee: Sean Busbey
> Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-20332.0.patch
>
>
> AFAICT, we should just entirely skip including hadoop in our shaded mapreduce
> module
> 1) Folks expect to run yarn / mr apps via {{hadoop jar}} / {{yarn jar}}
> 2) those commands include all the needed Hadoop jars in your classpath by
> default (both client side and in the containers)
> 3) If you try to use "user classpath first" for your job as a workaround
> (e.g. for some library your application needs that hadoop provides), then our
> inclusion of *some but not all* hadoop classes causes everything to fall over
> because of mixing rewritten and non-rewritten hadoop classes
> 4) if you don't use "user classpath first" then all of our
> non-relocated-but-still-shaded hadoop classes are ignored anyways so we're
> just wasting space