mmiklavc commented on issue #1358: METRON-2038 Enrichment Loader Fails When Run as MR Job
URL: https://github.com/apache/metron/pull/1358#issuecomment-474527445

@nickwallen - Thanks for tackling this. I happened to also see the rogue HBaseConfiguration, along with a number of other deps that come from hbase-common, getting pulled into the metron-data-management uber jar. Bootstrapping off of your findings, I thought it worthwhile to dig into this a bit more. It seemed like there might be a bug in the Maven Shade plugin, which could cause us more trouble down the line, and I wanted to see what we were in for.

For starters, here's the class I'm using as a tracer bullet.

```
jar tvf /Users/mmiklavcic/.m2/repository/org/apache/metron/metron-data-management/0.7.1/metron-data-management-0.7.1.jar|grep HBaseConfiguration
  7195 Mon Mar 18 12:02:08 MDT 2019 org/apache/hadoop/hbase/HBaseConfiguration.class
```

I went through a similar series of inclusions and exclusions in the mdm pom.xml, only to find no change in the rogue deps appearing. Also, regardless of their scope tag, I continued to see those classes in the uber jar. I got to the point where I had excluded all transitive dependencies for hbase-common and *still* was seeing the classes. Either there was a bug in the Shade plugin or we were missing something.

The next step I took was to look over the dependency:tree from the CLI - all deps checked out. Which scopes does the Shade plugin include by default? - I checked, and scope "provided" is not included by default. I also examined the build output from the Shade plugin - very clearly, there is no hbase-common dependency included in the shading. This is starting to feel like The Adventures of Buckaroo Banzai Across the 8th Dimension or The Adventures of Baron Munchausen. Either way, it's a trip.

Ok, let's go more brute force on this and explicitly copy dependencies. I looked at the compile scope dependencies first, but that didn't look quite right.
I landed on runtime scope as most closely representing what the Shade plugin bundles. I then iterated over all the jars looking for the config class.

```
mvn dependency:copy-dependencies -DoutputDirectory=/tmp/mikeout -DincludeScope=runtime
for jar in /tmp/mikeout/*.jar; do echo "$jar"; jar tvf "$jar"|grep HBaseConfiguration; done
...
/tmp/mikeout/metron-profiler-client-0.7.1.jar
  7161 Fri Mar 15 15:45:58 MDT 2019 org/apache/hadoop/hbase/HBaseConfiguration.class
...
```

Ok, so there it is. But why??? I looked over our dependency graph and saw that metron-data-management depends on metron-enrichment, which depends on metron-profiler-client. Ok, there's the link, but what about hbase-common? metron-profiler-client has a dependency on hbase-common, scope=compile. Getting closer. The client's pom _also_ has a reference to the Maven Shade plugin and an assembly.xml, so the metron-profiler-client jar itself already has the hbase-common classes shaded into it. I explored dependencies on metron-profiler-client from a shell script and deployment perspective and did not find any circumstances where we would need to deploy this module stand-alone.

I changed the hbase-common dependency scope to provided (debatable) and removed all references to shading/relocating and the tarball assembly. I rebuilt the project and no longer see HBaseConfiguration, or any other hbase-common classes for that matter, in the metron-data-management jar.

I'm currently running the tests from https://github.com/apache/metron/pull/432#issuecomment-276733075 as indicated above, but I think this should solve the root problem without requiring us to modify our license file or eliminate the core Hadoop libs from our shell script classpath.
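
For anyone repeating the jar scan described above, here is a minimal sketch of the same check wrapped in a reusable function. The name `find_class_in_jars` is hypothetical and not part of Metron or Maven; it assumes the JDK `jar` tool is on the PATH and that the dependencies have already been copied into a directory (for example with the `mvn dependency:copy-dependencies` command from the comment). Unlike the one-liner, it prints only the jars that contain a match.

```shell
# Sketch: report which jars in a directory contain a given class.
# find_class_in_jars is a hypothetical helper name, used for illustration only.
find_class_in_jars() {
  local dir="$1"      # directory of jars, e.g. the copy-dependencies output
  local class="$2"    # class name or path fragment, e.g. HBaseConfiguration
  local jar_file

  for jar_file in "$dir"/*.jar; do
    # Skip the unexpanded glob pattern when the directory holds no jars.
    [ -e "$jar_file" ] || continue
    # 'jar tf' lists entry names only; grep -F matches the name literally.
    if jar tf "$jar_file" | grep -F "$class" > /dev/null; then
      echo "$jar_file"
    fi
  done
}
```

Calling `find_class_in_jars /tmp/mikeout HBaseConfiguration` after the copy step should print the path of each jar that bundles the class, which in the scan above was the metron-profiler-client jar.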
