Bryan Beaudreault created HBASE-26909:
-----------------------------------------
Summary: hbase-shaded-mapreduce and hbase-shaded-client expose
some of the same classes
Key: HBASE-26909
URL: https://issues.apache.org/jira/browse/HBASE-26909
Project: HBase
Issue Type: Improvement
Reporter: Bryan Beaudreault
We supply 2 primary artifacts for end-users to consume:
* hbase-shaded-client, which is for general use
* hbase-shaded-mapreduce, which is for use when you need to connect to hbase
via mapreduce, for example through TableInputFormat
The problem is that these artifacts expose tons of duplicate classes. One
example (among many) is org.apache.hadoop.hbase.Cell, which appears in both
jars.
This may not be a problem if your projects are always very isolated – either
doing mapreduce, or not. In that case you just depend on the one you need. But
many users work in much more complicated environments where dependencies tend
to bleed between projects. Here's an illustration:
Imagine a project FooService, which includes two modules FooServiceRestWeb (for
the rest http resources) and FooServiceData (which includes DAOs for accessing
data). FooServiceRestWeb depends on FooServiceData to access hbase. In this
case, FooServiceData should depend on hbase-shaded-client.
Now imagine another project FooPipeline, which has modules FooPipelineHadoop
(with M/R jobs for processing data) and FooPipelineData (which has some DAOs
for accessing data). In this case, FooPipelineData might depend on
hbase-shaded-mapreduce since the context is intended for M/R.
The problem arises when suddenly we want to include some data from FooService
into our pipeline. The most straightforward way to achieve this is by depending
on FooServiceData, which has all of the DAOs for that data but also depends on
hbase-shaded-client. At this point you have a problem, because
FooPipelineHadoop now depends on both hbase-shaded-mapreduce and
hbase-shaded-client.
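To make the collision concrete, the dependencies of FooPipelineHadoop might end
up looking roughly like the sketch below. The com.example coordinates and the
versions are hypothetical placeholders for illustration; only the
org.apache.hbase artifact names mentioned in the comments are real.

```xml
<!-- FooPipelineHadoop/pom.xml (hypothetical coordinates and versions) -->
<dependencies>
  <!-- transitively pulls in org.apache.hbase:hbase-shaded-mapreduce -->
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>foo-pipeline-data</artifactId>
    <version>1.0.0</version>
  </dependency>
  <!-- transitively pulls in org.apache.hbase:hbase-shaded-client -->
  <dependency>
    <groupId>com.example</groupId>
    <artifactId>foo-service-data</artifactId>
    <version>1.0.0</version>
  </dependency>
</dependencies>
```

Both shaded jars then land on the same classpath, each carrying its own copy of
classes such as org.apache.hadoop.hbase.Cell.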
(Note, this obviously skirts around potential microservice solutions like only
accessing FooService's data through the API... it's just for illustration, and
it does come up.)
From a plain Java perspective, having these 2 jars on the classpath is
somewhat wasteful but not a huge issue since the implementations are all the
same.
From a Maven perspective, it's problematic because the maven dependency plugin
will complain about the conflicting classes.
One potential fix is to add exclusions to the FooServiceData dependency, to
avoid pulling in hbase-shaded-client. This works on a one-off basis but is much
more painful in a large and complicated environment where this may come up
hundreds of times.
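As a rough sketch of what that exclusion looks like in the consuming module's
pom.xml (the com.example coordinates and version are hypothetical placeholders;
the excluded artifact is the real hbase-shaded-client):

```xml
<dependency>
  <groupId>com.example</groupId>
  <artifactId>foo-service-data</artifactId>
  <version>1.0.0</version>
  <exclusions>
    <!-- rely on hbase-shaded-mapreduce for the HBase classes instead -->
    <exclusion>
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-shaded-client</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```

Every module that combines the two kinds of data modules needs its own copy of
this block, which is why the approach scales poorly.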
A better fix in my opinion is to make hbase-shaded-mapreduce depend on
hbase-shaded-client and then only expose the classes that aren't already
exposed by the shaded client.
[~busbey] also mentioned a BOM being a potential solution, but I don't have
experience with that.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)