Sean Busbey commented on HBASE-20332:

Removing hadoop from the shaded mapreduce module means removing the htrace v3 
version that came in as one of its transitive dependencies (we only need v4). 
Doing that properly required cleaning up how we handle htrace v3 coming in as 
a transitive dependency of hadoop in other places. A big part of that is 
removing exclusions so that we get it when Hadoop needs it, e.g. for tests 
(rather than, for example, expressly listing it as a compile scope dependency 
of hbase-server that we don't actually need).

Since we're removing the exclusion in a bunch of places, we need to make sure 
folks don't inadvertently rely on it as an unlisted dependency. We want to 
make sure that when folks reference HTrace they're referencing HTrace v4 
classes and not earlier classes, so for now we tell checkstyle to flag 
{{org.htrace}} and {{org.apache.htrace}} (I don't know of a way to tell it 
that use of {{org.apache.htrace.core}} specifically is fine).
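
To make the intended rule concrete, here is a minimal standalone sketch of the 
check I'd ideally want: flag {{org.htrace}} and {{org.apache.htrace}} but let 
{{org.apache.htrace.core}} through. This is not the checkstyle configuration in 
the patch, and it doesn't establish whether checkstyle can be configured this 
way; the class name {{HtraceImportCheck}}, the method {{isForbidden}}, and the 
sample class names passed to it are hypothetical and only there for 
illustration.

```java
import java.util.regex.Pattern;

public class HtraceImportCheck {
    // Matches names under org.htrace or org.apache.htrace, except anything
    // under org.apache.htrace.core (the negative lookahead carves that out).
    private static final Pattern FORBIDDEN = Pattern.compile(
        "^(org\\.htrace|org\\.apache\\.htrace(?!\\.core(\\.|$)))(\\.|$).*");

    // Returns true if the fully qualified import should be flagged.
    public static boolean isForbidden(String importedName) {
        return FORBIDDEN.matcher(importedName).matches();
    }

    public static void main(String[] args) {
        // Sample names only; they stand in for "some v3 class" and "some v4 class".
        String[] samples = {
            "org.htrace.Trace",
            "org.apache.htrace.Trace",
            "org.apache.htrace.core.Tracer",
            "org.apache.hadoop.conf.Configuration"
        };
        for (String name : samples) {
            System.out.println(name + " -> " + (isForbidden(name) ? "flag" : "ok"));
        }
    }
}
```

The lookahead requires {{.core}} to be followed by a dot or the end of the 
name, so a sibling package such as {{org.apache.htrace.corefoo}} would still be 
flagged.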

> shaded mapreduce module shouldn't include hadoop
> ------------------------------------------------
>                 Key: HBASE-20332
>                 URL: https://issues.apache.org/jira/browse/HBASE-20332
>             Project: HBase
>          Issue Type: Sub-task
>          Components: mapreduce, shading
>    Affects Versions: 2.0.0
>            Reporter: Sean Busbey
>            Assignee: Sean Busbey
>            Priority: Critical
>             Fix For: 3.0.0, 2.1.0
>         Attachments: HBASE-20332.0.patch, HBASE-20332.1.WIP.patch, 
> HBASE-20332.2.WIP.patch, HBASE-20332.3.patch, HBASE-20332.4.patch
>
> AFAICT, we should just entirely skip including hadoop in our shaded mapreduce 
> module.
> 1) Folks expect to run yarn / mr apps via {{hadoop jar}} / {{yarn jar}}.
> 2) Those commands include all the needed Hadoop jars in your classpath by 
> default (both client side and in the containers).
> 3) If you try to use "user classpath first" for your job as a workaround 
> (e.g. for some library your application needs that hadoop provides), then our 
> inclusion of *some but not all* hadoop classes causes everything to fall 
> over because of mixing rewritten and non-rewritten hadoop classes.
> 4) If you don't use "user classpath first" then all of our 
> non-relocated-but-still-shaded hadoop classes are ignored anyways, so we're 
> just wasting space.

This message was sent by Atlassian JIRA
