[ 
https://issues.apache.org/jira/browse/MRESOLVER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tibor Digana updated MRESOLVER-93:
----------------------------------
    Summary: PathRecordingDependencyVisitor to handle 3 cycles  (was: 
StackOverflowError  by PathRecordingDependencyVisitor)

> PathRecordingDependencyVisitor to handle 3 cycles
> -------------------------------------------------
>
>                 Key: MRESOLVER-93
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-93
>             Project: Maven Resolver
>          Issue Type: Improvement
>          Components: resolver
>    Affects Versions: 1.3.3, 1.4.0, 1.4.1
>            Reporter: Tomo Suzuki
>            Priority: Major
>             Fix For: 1.4.2
>
>         Attachments: IMG_0234.jpg, IMG_0235.jpg, IMG_0236.jpg, IMG_0237.jpg, 
> IMG_0238.jpg, IMG_0240.jpg, IMG_0241.jpg, IMG_0242.jpg, IMG_0243.jpg, 
> IMG_0244.jpg, IMG_0245.jpg, IMG_0255.jpg, IMG_0256.jpg, IMG_0257.jpg, 
> IMG_0258.jpg, IMG_0259.jpg, IMG_0260.jpg, IMG_0261.jpg, IMG_0262.jpg, 
> IMG_0263.jpg, IMG_0264.jpg, IMG_0265.jpg, IMG_0266.jpg
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> PathRecordingDependencyVisitor cannot handle dependency graphs that have 3 or 
> more cycles such as below:
>   
> {code:java}
> gid:a:1 (1)
> +- gid:b:0
> |  \- ^1
> +- gid:b:1
> |  \- ^1
> \- gid:b:2
>    \- ^1
> {code}
> It fails with StackOverflowError or OutOfMemoryError. [Test 
> case|https://github.com/suztomo/maven-resolver/commit/31b24dfe240997861e27661a7540546fbe6e0dab].
>  
> h1. Solutions
> I came up with three solutions. I pick solution #1 for simplicity. 
> h2. 1. Use "parents" to check the cycle, rather than visited set
> This is the simplest. Checking array element member is usually discouraged 
> especially for large data set. The implementation should confirm the overhead 
> of this solution.
> h2. 2. Use AbstractMapBag/Multiset for visited set
> Creating a new class that extends AbstractMapBag and leverages 
> IdentityHashMap. Although this solution would be theoretically more efficient 
> than solution #1, I felt it's overkill to create a class just for this 
> solution.
> {code:java}
> AbstractMapBag(new IdentityHashMap<DependencyNode, 
> AbstractMapBag.MutableInteger>()){code}
>  
> [https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bag/AbstractMapBag.html]
>  
> IdentityHashMap<DependencyNode, Integer>() would work as a multiset.
> h2. 3. Call visitLeave only when visitEnter is true
> The cause of this bug is 
> [DefaultDependencyNode|https://github.com/apache/maven-resolver/blob/47edcfe69c4e52ced4cb93d65b7348b5645cdd68/maven-resolver-api/src/main/java/org/eclipse/aether/graph/DefaultDependencyNode.java#L354]
>  calling visitLeave regardless of visitEnter result.
> I'm not sure how many other visitors rely on visitLeave being called 
> regardless of visitEnter result.
> h1. Illustration on why existing algorithm does not catch cycle 
> The following illustration is the node traversal for the test case above by 
> current algorithm. This illustration tracks the dependency node graph and the 
> "visited" set maintained by the visitor.
>  * visited set. An internal data structure in PathRecordingDependencyVisitor 
> to avoid cycle 
> ([link|https://github.com/apache/maven-resolver/blob/0c2373f6c66f20953b1a7e443ea1de8672d1b072/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/visitor/PathRecordingDependencyVisitor.java#L45]).
>  * visitEnter(node): PathRecordingDependencyVisitor's function 
> ([link|https://github.com/apache/maven-resolver/blob/0c2373f6c66f20953b1a7e443ea1de8672d1b072/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/visitor/PathRecordingDependencyVisitor.java#L100]).
>  When returning true, the node's children is traversed by the algorithm. This 
> function adds the node to visited set.
>  * visitLeave(node): PathRecordingDependencyVisitor's function 
> ([link|https://github.com/apache/maven-resolver/blob/0c2373f6c66f20953b1a7e443ea1de8672d1b072/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/visitor/PathRecordingDependencyVisitor.java#L129]).
>  This function removes the node from visited set.
>   
> The initial state starts with node "a" and visited set \{a}.
> !IMG_0234.jpg|width=334,height=252!
> First child of a is b0. Because visited does not contain, visitEnter(b0) 
> returns true, meaning that the algorithm traverses this b0's children next. 
> B0 is added to visited.
> !IMG_0235.jpg|width=359,height=191!
> B0's children is "a". Because visited set contains "a", visitEnter(a) returns 
> false. This means that the algorithm does not traverse this "a"'s children. A 
> is added to visited set (already it has).
>   !IMG_0236.jpg|width=438,height=197!
> Now not traversing this "a"'s children, the algorithm calls visitLeave(a). 
> This removes "a" from visited set.
> !IMG_0237.jpg|width=434,height=165!
> B0's children are all traversed. the algorithm calls visitLeave(b0). This 
> removes "b0" from visited set.
> !IMG_0238.jpg|width=459,height=197!
> Now visited set is empty.
> Next child of the root "a" is b1. B1 is not in visited set, thus 
> visitEnter(b1) returns true. This means the algorithm traverses the children 
> of this b1.
> !IMG_0240.jpg|width=445,height=270!
> B1's only child is a. "a" is not in visited set. visitEnter(a) returns true. 
> This means to traverse "a"'s children.
> !IMG_0241.jpg|width=418,height=262!
> A's first children is b0. b0 is not in visited set. visitEnter(b0) returns 
> true, meaning to traverse children of this b0.
> !IMG_0242.jpg|width=422,height=208!  
>  (img 0242)
> The only child of b0 is "a". Visited set contains "a", and thus not 
> traversing its children.
> !IMG_0243.jpg|width=491,height=191!
> visitLeave(a) removes "a" from visited set.
> !IMG_0244.jpg|width=481,height=189!
> b0's children is all traversed. VisitLeave(b0) removes b0 from visited set.
> !IMG_0245.jpg|width=498,height=182!
> Next child of this "a" is b1. B1 is in visited set, and thus visitEnter(b1) 
> returns false. This node's children is not to be traversed.
> !IMG_0255.jpg|width=545,height=245!
> (img 0255)
> visitLeave(b1) removes b1 from visited set. Now visited is emtpy.
> !IMG_0256.jpg|width=528,height=294!
> The last child of "a" is b2. VisitEnter(b2) returns true. It's children is to 
> be traversed. B2 is in visited set.
> !IMG_0257.jpg|width=502,height=309!
>  B2's only child is "a". "a" is not in visited set, thus visitEnter(a) 
> returns true. The algorithm traverses this "a"'s children.
> !IMG_0258.jpg|width=485,height=299!
> (img 0258)
>  
> (...omit...)
>  
> IMG_0266 shows the step where I decided to give up. The algorithm does not 
> seem to stop. Indeed the test shows that. The path from the root to the 
> furthest a includes 5 "a" nodes. I concluded the visited set is not working 
> as expected to avoid cycle.
> !IMG_0266.jpg|width=656,height=252!
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to