mmiklavc edited a comment on issue #1358: METRON-2038 Enrichment Loader Fails 
When Run as MR Job
URL: https://github.com/apache/metron/pull/1358#issuecomment-474527445
 
 
   @nickwallen - Thanks for tackling this. I happened to also see the rogue 
HBaseConfiguration along with a number of other deps that come from 
hbase-common getting pulled into the metron-data-management uber jar. 
Bootstrapping off of your findings, I thought it worthwhile to dig into this a 
bit more. It seemed like there might be a bug in the Maven shade plugin which 
would potentially cause us more trouble down the line and I wanted to see what 
we were in for. For starters, here's the class I'm using as a tracer bullet.
   
   ```
   jar tvf /Users/mmiklavcic/.m2/repository/org/apache/metron/metron-data-management/0.7.1/metron-data-management-0.7.1.jar | grep HBaseConfiguration
     7195 Mon Mar 18 12:02:08 MDT 2019 org/apache/hadoop/hbase/HBaseConfiguration.class
   ```
   
   I went through a similar series of inclusions and exclusions in the mdm 
pom.xml, only to find no change in the rogue deps appearing. Regardless of 
their scope tag, I continued to see those classes in the uber jar. I got to the 
point where I had excluded all transitive dependencies for hbase-common and 
*still* was seeing the classes. Either there was a bug in the shade plugin or 
we were missing something.
   
   The next step I took was to look over the dependency:tree from the CLI - all 
deps checked out. I also checked which scopes the Shade plugin includes by 
default - scope "provided" is not one of them. Finally, I examined the build 
output from the Shade plugin - very clearly, there is no hbase-common 
dependency included in the shading. This is starting to feel like The 
Adventures of Buckaroo Banzai Across the 8th Dimension or The Adventures of 
Baron Munchausen. Either way, it's a trip.
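   
   For anyone repeating the dependency:tree check, a minimal sketch of 
narrowing the tree to hbase-common follows. The function name is hypothetical; 
`-Dincludes` is the dependency plugin's filter property. Note that the tree 
reports declared dependencies only, so a clean result here says nothing about 
classes already bundled inside another artifact's jar.

```shell
# Show only the dependency-tree paths that lead to hbase-common.
# Extra arguments (for example -pl <module>) are passed through to Maven.
# The function name is illustrative and not part of the Metron build.
hbase_common_paths() {
  mvn dependency:tree -Dincludes=org.apache.hbase:hbase-common "$@"
}
```

   Run from the project root, `hbase_common_paths` prints the matching paths 
for every module in the reactor.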
   
   Ok, let's go more brute force on this and explicitly copy dependencies. I 
looked at the compile scope dependencies first, but that didn't look quite 
right. I landed on runtime scope as most closely representing what the Shade 
plugin bundles. I then iterated over all the jars looking for the config class.
   
   ```
   mvn dependency:copy-dependencies -DoutputDirectory=/tmp/mikeout -DincludeScope=runtime
   for jar in /tmp/mikeout/*.jar; do echo $jar; jar tvf $jar|grep HBaseConfiguration; done
   ...
   /tmp/mikeout/metron-profiler-client-0.7.1.jar
     7161 Fri Mar 15 15:45:58 MDT 2019 org/apache/hadoop/hbase/HBaseConfiguration.class
   ...
   ...
   ```
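   
   The loop above prints every jar name, so the one hit is easy to miss in a 
long listing. A sketch of a variant that prints only the jars containing a 
match is below. The function name and the optional lister argument are 
assumptions made for illustration; the lister defaults to `jar tf`, and any 
command that prints one archive entry per line can be substituted.

```shell
# List the jars in a directory that contain an entry matching a pattern.
# Usage: jars_containing DIR PATTERN [LISTER...]
# LISTER defaults to `jar tf` (hypothetical helper, not part of Metron).
jars_containing() {
  dir=$1
  pattern=$2
  shift 2
  # Fall back to `jar tf` when no lister command is given.
  [ "$#" -gt 0 ] || set -- jar tf
  for jar in "$dir"/*.jar; do
    # Skip the literal glob when the directory holds no jars.
    [ -e "$jar" ] || continue
    # Print the jar path only if its entry listing matches the pattern.
    if "$@" "$jar" 2>/dev/null | grep -- "$pattern" >/dev/null; then
      printf '%s\n' "$jar"
    fi
  done
}
```

   With the directory used above, the call would be 
`jars_containing /tmp/mikeout HBaseConfiguration`.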
   
   Ok, so there it is. But why??? I looked over our dependency graph and see 
that metron-data-management depends on metron-enrichment, which depends on 
metron-profiler-client. Ok, there's the link, but what about hbase-common? 
metron-profiler-client has a dependency on hbase-common, scope=compile. Getting 
closer. The client's pom _also_ has a reference to the maven shade plugin and 
an assembly.xml. I explored dependencies on metron-profiler-client from a shell 
script and deployment perspective and did not find any circumstances where we 
would need to deploy this module stand-alone. I bumped the hbase-common 
dependency to provided (debatable) and removed all references to 
shading/relocating and the tarball assembly. I rebuilt the project and no 
longer see HBaseConfiguration, or any other hbase-common classes for that 
matter, in the metron-data-management jar.
   
   I'm currently undertaking testing from 
https://github.com/apache/metron/pull/432#issuecomment-276733075 as indicated 
above, but I think this should solve the root problem without requiring us to 
modify our license file or eliminate the core Hadoop libs from our shell script 
classpath. I've created a PR against your PR here - 
https://github.com/nickwallen/metron/pull/15

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
