merrimanr commented on issue #1436: METRON-2149: Shaded jar classifier is not 
consistent
URL: https://github.com/apache/metron/pull/1436#issuecomment-499591873
 
 
   As I worked through resolving the final failing test I realized there is a 
use case that is not properly handled by the original changes in this PR.  
There are 2 versions of guava required by different classes (Stellar and HBase 
testing utility) so we need a way to relocate one of them.  It's not possible 
to do this without depending on a shaded module because transitive dependencies 
have already been resolved (meaning only 1 version remains) by the time the 
final shaded jar is built.  Relocating at this point just relocates the single 
remaining version.
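   
   As a rough illustration of what relocation looks like, here is a minimal 
maven-shade-plugin sketch.  The shaded package prefix is a hypothetical name 
and not necessarily what this PR uses.  The relocation only rewrites the one 
guava version that the module declaring it resolved, which is why it has to 
live in a lower-level shaded module rather than in the final uber jar build.

   ```xml
   <!-- Sketch only: the shadedPattern value is a hypothetical package name. -->
   <plugin>
     <groupId>org.apache.maven.plugins</groupId>
     <artifactId>maven-shade-plugin</artifactId>
     <executions>
       <execution>
         <phase>package</phase>
         <goals>
           <goal>shade</goal>
         </goals>
         <configuration>
           <relocations>
             <relocation>
               <!-- Move guava classes, and references to them, into a private package -->
               <pattern>com.google.common</pattern>
               <shadedPattern>org.apache.metron.shaded.guava</shadedPattern>
             </relocation>
           </relocations>
         </configuration>
       </execution>
     </executions>
   </plugin>
   ```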
   
   At this point I want to summarize my findings and present some options.  
Here are the requirements I see:
   
   1. Transitive dependency resolution should be predictable and easy to 
troubleshoot.  Maven configuration settings (excludes, etc) should work as 
expected.
   2. Versions reported by mvn dependency:tree should match what's included in 
the uber jar.
   3. There should be a well understood and robust strategy for relocating 
classes that conflict.
   
   Using classifiers on all modules solves 1 and 2 but does not support 3 as 
described above.  We currently support 3 but not 1 and 2 because lower level 
modules (like metron-common) do not use a classifier.  This means other modules 
that depend on it inherit relocated classes.  The problem is that transitive 
dependencies from these modules overwrite other dependencies, making it harder 
to determine which versions end up in the final uber jar.
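   
   For reference, "using a classifier" here means attaching the shaded jar as 
a secondary artifact instead of replacing the module's main jar.  A minimal 
sketch of that configuration, assuming a classifier named "uber" (the name is 
an assumption for illustration):

   ```xml
   <!-- Sketch only: goes inside the maven-shade-plugin declaration. -->
   <configuration>
     <!-- Keep the plain jar as the main artifact and attach the shaded jar next to it -->
     <shadedArtifactAttached>true</shadedArtifactAttached>
     <!-- Hypothetical classifier name for the attached shaded jar -->
     <shadedClassifierName>uber</shadedClassifierName>
   </configuration>
   ```

   With this in place, a module that depends on the artifact without naming 
the classifier gets the plain jar and resolves its transitive dependencies 
normally, so it does not inherit relocated classes.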
   
   To get to a point where we can satisfy all 3 requirements above, I can think 
of a couple of options.  Both of these options are based on the assumption that 
most class version conflicts involve Stellar classes.  Stellar contains most of 
our business logic and contains a long list of dependencies, including several 
that commonly conflict with other projects (guava, log4j, jackson, etc).  The 
idea behind both of these options involves isolating Stellar from the rest of 
the project.  Here they are:
   
   1. Make Stellar the exception and remove the classifier on stellar-common.  
This module would be the only one that does this.  This satisfies 3 as long as 
code requiring different class versions is located in this module.  This means 
we may need to move classes into this module (or do this with other modules 
too).  To satisfy 1 and 2 we would need to ensure we are rewriting ALL 
transitive dependencies or tolerate relocating classes as we run into issues.  
The advantage of this approach is there would still be a single uber jar so 
scripts and classpath setup would not need to change.  The disadvantage is 
there is still the risk of transitive dependencies leaking into the main uber 
jar.
   
   2. Deploy Stellar code in a separate jar and add it to the classpath after the 
main uber jar, whatever that is (metron-data-management, 
metron-enrichment-storm, etc).  This satisfies 3 because the separate Stellar 
jar can contain the relocated classes but its other dependencies will not overwrite 
dependencies of the main uber jar (because it's listed after the main uber 
jar).  1 and 2 are not a concern when classifiers are used which is the case 
here.  The main disadvantage I see is that there will be work adding this extra 
jar in the various scripts or startup options and we may have to reorganize 
some classes.
   
   I tested both options and was able to get both working for these use 
cases:
   
   - Generate a bloom filter and read it back 
(https://metron.apache.org/current-book/use-cases/typosquat_detection/index.html)
   - Enrichment and parser topology regression test
     
    This is all fairly complex so if anything is not clear I can elaborate.  
Are there other options I'm not thinking of?  Thoughts?

