FWIW, the existing shaded stuff in both HBase and Hadoop should already take this behavior into account.
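For anyone wanting to guard a build against MNG-5899 explicitly, here is a minimal sketch of pinning the Maven version with the maven-enforcer-plugin's requireMavenVersion rule; the version range is illustrative, not any project's actual policy:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-enforcer-plugin</artifactId>
  <executions>
    <execution>
      <id>enforce-maven-version</id>
      <goals>
        <goal>enforce</goal>
      </goals>
      <configuration>
        <rules>
          <!-- Keep builds off the Maven 3.3.x line affected by MNG-5899.
               The range here is illustrative only. -->
          <requireMavenVersion>
            <version>[3.0.4,3.3.0)</version>
          </requireMavenVersion>
        </rules>
      </configuration>
    </execution>
  </executions>
</plugin>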
On Mon, Apr 24, 2017 at 11:40 AM Nick Dimiduk <[email protected]> wrote:

> FYI, MNG-5899 makes shaded builds fragile, effectively limiting
> multi-module shaded projects to Maven 3.2.x. Apparently the Apache Storm
> folks tripped over this earlier, and as I recall, Apache Flink used to
> require building with 3.2.x for the same reason.
>
> https://issues.apache.org/jira/browse/MNG-5899
>
> On Tue, Apr 18, 2017 at 9:20 PM, Nick Dimiduk <[email protected]> wrote:
>
>> On Wed, Apr 12, 2017 at 2:30 PM, Stack <[email protected]> wrote:
>>
>>>> If the above quote is true, then I think what we want is a set of
>>>> shaded Hadoop client libs that we can depend on so as to not get all
>>>> the transitive deps. Hadoop doesn't provide it, but we could do so
>>>> ourselves with (yet another) module in our project. Assuming, that
>>>> is, the upstream client interfaces are well defined and don't leak
>>>> stuff we care about.
>>>
>>> We should do this too (I think you've identified the big 'if' with the
>>> assumption above). As you say later, "... it's time we firm up the
>>> boundaries between us and Hadoop." There is some precedent with the
>>> hadoop-compat-* modules. Hadoop would be relocated?
>>
>> Ideally we'd relocate any parts of Hadoop that are not part of our
>> public contract. Not sure if there's an intersection between "ideal"
>> and "practical", though.
>>
>>> Spitballing, IIUC, I think this would be a big job (once per version,
>>> plus the vagaries of hadoop/spark) with no guarantee of success on the
>>> other end because of the assumption you call out. Do I have this
>>> right?
>>
>> Yeah, you have my meaning. My argument is not about whether we should
>> shade, but rather about how we make it a maintainable deployment tool
>> for our team of volunteers. Hence my interest in compatibility
>> verification tooling, like we have for API compatibility.
>>
>>>> Isolating our clients from our deps is best served by our shaded
>>>> modules. What do you think about turning things on their head: for
>>>> 2.0 the hbase-client jar is the shaded artifact by default, not the
>>>> other way around? We have cleanup to do to get our deps out of our
>>>> public interfaces in order to make this work.
>>>
>>> We should do this at least going forward. hbase2 is the opportunity.
>>> Testing and doc is all that is needed? I added it to our hbase2
>>> description doc as a deliverable (though not a blocker).
>>
>> I've not tried to consume these efforts. A reasonable test case to see
>> whether they are ready for prime time would be to rebuild one of the
>> more complex downstream projects (e.g., Phoenix, Trafodion, Splice)
>> against the shaded jars and see how bad the diff is.
>>
>>>> This proposal of an external shaded-dependencies module sounds like
>>>> an attempt to solve both concerns at once. It would isolate us from
>>>> Hadoop's deps, and it would isolate our clients from our deps.
>>>> However, it doesn't isolate our clients from Hadoop's deps, so our
>>>> users don't really gain anything from it. I also argue that it
>>>> creates an unreasonable release engineering burden on our project.
>>>> I'm also not clear on the implications for downstreamers who extend
>>>> us with coprocessors.
>>>
>>> Other than a missing 'quick-fix' descriptor, you call what is proposed
>>> well ... except where you think the prebuild will be burdensome. Here
>>> I think otherwise: releases will be rare, there is nought 'new' in a
>>> release but packaged 3rd-party libs, and verification/vote by PMCers
>>> should be a simple affair.
>>
>> Maybe it's not such a burden? If the 2.0 and 3.0 RMs are brave and
>> true, it's worth a go.
>>
>>> Do you agree that the fixing-what-we-leak-of-hadoop-to-downstreamers
>>> work is distinct from the narrower task proposed here, where we are
>>> trying to unhitch ourselves from the netty/guava versions Hadoop uses?
>>> (Currently we break against hadoop3 because of a netty
>>> incompatibility, HADOOP-13866, which we might be able to solve w/
>>> exclusions ... but ....)
>>>
>>> The two tasks can be run in parallel?
>>
>> Indeed, they seem distinct but quite related.
>>
>>> For CPs, they should bring their own bedding and towels and not be
>>> trying to use ours. On the plus side, we could upgrade core 3rd-party
>>> libs and the CP would keep working.
>>
>> All of this sounds like an ideal state.
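To make the relocation idea discussed above concrete, here is a rough sketch of how a dedicated shaded-Hadoop module might configure the maven-shade-plugin. The pattern, shadedPattern, and exclude are hypothetical; deciding which Hadoop classes count as 'public contract' and must stay at their original coordinates is exactly the audit the thread describes:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Hide Hadoop internals behind a project-private package.
                 Pattern and shadedPattern are illustrative only. -->
            <pattern>org.apache.hadoop</pattern>
            <shadedPattern>org.apache.hbase.shaded.org.apache.hadoop</shadedPattern>
            <excludes>
              <!-- Classes that are part of our public contract, e.g.
                   Configuration, would have to stay un-relocated. -->
              <exclude>org.apache.hadoop.conf.*</exclude>
            </excludes>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>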

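As for the narrower netty incompatibility against hadoop3 (HADOOP-13866), the exclusions approach mentioned above would look roughly like this; the choice of the hadoop-hdfs artifact and the netty-all coordinates is illustrative, and Hadoop modules pull netty in through more than one path, which is presumably the '... but ....' in Stack's aside:

<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <!-- Keep Hadoop's netty off our classpath so our own version wins.
         Every hadoop-* dependency that drags netty in would need the
         same treatment. -->
    <exclusion>
      <groupId>io.netty</groupId>
      <artifactId>netty-all</artifactId>
    </exclusion>
  </exclusions>
</dependency>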