[
https://issues.apache.org/jira/browse/MRESOLVER-228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17463042#comment-17463042
]
wei cai commented on MRESOLVER-228:
-----------------------------------
[~michael-o]
Details of the updated algorithm
https://github.com/apache/maven-resolver/pull/136#issuecomment-998529747
> Improve the maven dependency resolution speed by a skip & reconcile approach
> ----------------------------------------------------------------------------
>
> Key: MRESOLVER-228
> URL: https://issues.apache.org/jira/browse/MRESOLVER-228
> Project: Maven Resolver
> Issue Type: Improvement
> Components: Resolver
> Affects Versions: 1.7.2
> Reporter: wei cai
> Priority: Major
> Fix For: 1.8.0
>
> Attachments: Screen Shot 2021-11-27 at 12.58.26 PM.png, Screen Shot
> 2021-11-27 at 12.58.59 PM.png, Screen Shot 2021-11-27 at 12.59.32 PM.png
>
>
> When comes to resolve the huge amount of dependencies of an enterprise level
> project, the maven resolver is very slow to resolve the dependency
> graph/tree. Take one of our app as example, it could take *10minutes+ and 16G
> memory* to print out the result of {*}mvn dependency:tree{*}.
> This is because there are many dependencies declared in the project, and some
> of the dependencies would introduce *600+* transitive dependencies, and
> exclusions are widely used to solve dependency conflicts.
> By checking the
> [code|https://github.com/apache/maven-resolver/blob/master/maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DefaultDependencyCollector.java#L500],
> we know the exclusion is also part of the cache key. This means when the
> exclusions up the tree differs, the cached resolution result for the same GAV
> won't be picked up and need s to be recalculated.
> !Screen Shot 2021-11-27 at 12.58.26 PM.png!
> From above figure, we know:
> * In 1st case, D will be resolved only once as there are no exclusions/same
> exclusions up the tree.
> * In 2nd case, the B and C have different exclusions and D needs to be
> recalculated, if D is a heavy dependency which introduce many transitive
> dependencies, all D and its children needs to be recalculated. Recalculating
> all of these nodes introduces 2 issues:
> ** Slow in resolving dependencies.
> ** Lots of DependencyNodes cached (all calculated/recalculated nodes would
> be cached) and will consume huge memory.
> To improve the speed of maven resolver's dependency resolution, I
> implemented a skip & reconcile approach. Here is the *skip* part.
> !Screen Shot 2021-11-27 at 12.58.59 PM.png!
> From above figure, the 1st R is resolved at depth 3, and the 2nd R is
> resolved again because the depth is at 2 which is lower, the 3rd R at depth 3
> and the 4th R at depth 4 are simply skipped as R is already resolved at depth
> 2. This is because the same node with deeper depth is most likely won't be
> picked up by maven as maven employs a "{*}nearest{*} transitive dependency in
> the tree depth and the *first* in resolution" strategy.
> The 3rd R and 4th R will have children set as zero and marked as skipped by
> the R at depth 2 in 2nd tree path.
>
> Here is the *reconcile* part:
> !Screen Shot 2021-11-27 at 12.59.32 PM.png!
> When there are dependency conflicts, some of the skipped nodes need to be
> reconciled.
> In above figure, there are 4 tree paths.
> * The D1 (D with version 1) in the 1st tree path is get resolved, children
> of E and R at depth 3 are resolved and cached.
> * In the 2nd tree path, when resolving E & R of H, we simply skip these 2
> nodes as they are in deeper depth (depth: 4) than the E & R in 1st tree path.
> * In the 3rd tree path, a R node with lower path is resolved, and a E node
> at depth 5 is skipped.
> * In the 4th path, a D2 (D with version 2) node is resolved, as the depth is
> lower than D1, so maven will pick D2, this means the E & R's children cached
> in tree depth 1 should be {*}discarded{*}.
> Thus we might need to reconcile the E & R nodes in 2nd, 3rd and 4th tree
> paths. Here only E in 2nd tree path needs to be reconciled. This is because:
> * R in 3rd tree path won't be picked up as there is already a R in 2nd tree
> path with a lower depth.
> * E in 3rd tree path won't be picked up as it is enough to reconcile the E
> in 2nd tree path as the E in 2nd tree path is deeper than E in 3rd tree path.
> Here is what we've updated in the maven-resolver logic:
> * Resolve dependencies by leveraging a skip approach. The node in deeper
> depth will be skipped if a node with same GAV has been resolved with a lower
> depth.
> * Use maven's ConflictResolver (Transformer) to find out the conflict
> winners. Figure out the node that conflict with the winner. Ex, g:a:D1
> conflicts with g:a:D2 in above case.
> * Find out all skipped nodes that is getting affected with D1 as D1 is the
> loser and D2 is the winner.
> * Reconcile all skipped nodes in above step, for nodes with same GAVs, only
> the node with the lowest path will be reconciled.
>
> After we enabled the resolver patch in maven, we are seeing 10% ~70% build
> time reduced for different projects depend on how complex the dependencies
> are, and the result of *mvn dependency:tree* and *mvn dependency:list* remain
> the same.
> We've verified the resolver performance patch leveraging an automation
> solution to certify 2000+ apps of our company by comparing the *mvn
> dependency:tree* and *mvn dependency:list* result with/without the
> performance patch.
> Please help review the PR.
> [https://github.com/apache/maven-resolver/pull/136]
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)