[ https://issues.apache.org/jira/browse/HBASE-20332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16440898#comment-16440898 ]
Sean Busbey commented on HBASE-20332:
-------------------------------------

{quote}
How's this work relate to what we have in the guide since time immemorial? http://hbase.apache.org/book.html#hbase.mapreduce.classpath Especially your #1 and #2 above.
{quote}

Leaving aside the {{hadoop jar}} vs {{yarn jar}} Hadoop details, I think those instructions just tell you how to submit stuff using the non-shaded MR artifacts. Since we have a goal of "push folks to the shaded bits", I'll probably need to rewrite that section once I'm convinced the shaded MR jar works. The essentials still look the same: {{HADOOP_CLASSPATH}} and {{-libjars}} just need to point at the shaded MR jar instead. (Also, is the ref guide still naming fully qualified classnames instead of short names? Ugh.)

{quote}
The workarounds are for the doubled mention of the shaded hbase mr jar? Skimmed the patch. Is this right?

199 <artifactId>hbase-server</artifactId>
200 <scope>provided</scope>

How is hbase-server provided at runtime if not in the hbase-shaded-mapreduce jar?
{quote}

The workaround of {{HADOOP_USER_CLASSPATH_FIRST}} was for my attempt to use {{ExportSnapshot}} failing, because my local YARN install has its own set of HBase jars that it puts into my application's classpath. But I think I have a deeper problem than that. As you mention, surely we are using _some_ of the hbase-server classes, and having it at provided scope in hbase-mapreduce would mean none of it shows up in the shaded jar. I suspect my polluted YARN classpath is providing some of those classes. I'm going to do a quick verification and then, if necessary, run through my tests again.

{quote}
Say more why the new module hbase-shaded-with-hadoop-check-invariants?
{quote}

The way the "check invariants" modules work is that they run against the set of dependencies listed for the module. To date I've tried not to have logic in the jar verification script about specific jars to check.
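To make the "can Hadoop be in here?" flag concrete, here is a minimal sketch of the kind of scan such a verification script performs. This is illustrative only, not the actual HBase jar-check script; the function names are hypothetical, and it assumes relocated HBase code stays under {{org/apache/hadoop/hbase/}} while unrelocated Hadoop classes sit elsewhere under {{org/apache/hadoop/}}.

```python
# Hypothetical sketch of a "can Hadoop be in here?" jar check.
# The real enforcement lives in HBase's jar verification script and the
# check-invariants Maven modules; names here are illustrative only.
import zipfile


def find_hadoop_classes(jar_path):
    """Return jar entries that look like unrelocated Hadoop classes.

    HBase's own packages live under org.apache.hadoop.hbase, so those
    entries are excluded from the offender list.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return [name for name in jar.namelist()
                if name.startswith("org/apache/hadoop/")
                and not name.startswith("org/apache/hadoop/hbase/")]


def check_jar(jar_path, allow_hadoop=False):
    """Fail unless the jar's Hadoop content matches the module's policy.

    allow_hadoop corresponds to the flag discussed above: true for a
    client-style jar that may bundle Hadoop, false for the mapreduce jar.
    """
    offenders = find_hadoop_classes(jar_path)
    if offenders and not allow_hadoop:
        raise AssertionError(
            "unrelocated Hadoop classes found: %s" % offenders[:5])
    return True
```

With per-module flag values, one check-invariants module would call {{check_jar(path, allow_hadoop=True)}} for {{hbase-shaded-client}} while the other calls it with the default for {{hbase-shaded-mapreduce}}, which is exactly the two-module split described below.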
Since we need to treat the contents of {{hbase-shaded-client}} differently from {{hbase-shaded-mapreduce}}, I added a flag to the script for "can Hadoop be in here?", but I need a way to determine whether that flag is passed to the invocation. If we want to keep doing the accounting of which jars to check via the dependencies of the check-invariants pom(s), then we need two modules so that one can pass the flag and the other cannot. An alternative would be to keep a single check-invariants module, but stop automatically enumerating the dependencies to pass to the jar-checking invocation and instead build two executions of the jar-checking script where we list each jar it should check. We could maybe do this in beanshell and just use a whitelist regex for "can have hadoop classes".

> shaded mapreduce module shouldn't include hadoop
> ------------------------------------------------
>
>                 Key: HBASE-20332
>                 URL: https://issues.apache.org/jira/browse/HBASE-20332
>             Project: HBase
>          Issue Type: Sub-task
>          Components: mapreduce, shading
>    Affects Versions: 2.0.0
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Critical
>             Fix For: 2.0.0
>
>      Attachments: HBASE-20332.0.patch
>
>
> AFAICT, we should just entirely skip including hadoop in our shaded mapreduce module
> 1) Folks expect to run yarn / mr apps via {{hadoop jar}} / {{yarn jar}}
> 2) those commands include all the needed Hadoop jars in your classpath by default (both client side and in the containers)
> 3) If you try to use "user classpath first" for your job as a workaround (e.g. for some library your application needs that hadoop provides), then our inclusion of *some but not all* hadoop classes causes everything to fall over because of mixing rewritten and non-rewritten hadoop classes
> 4) if you don't use "user classpath first", then all of our non-relocated-but-still-shaded hadoop classes are ignored anyway, so we're just wasting space

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)