[ 
https://issues.apache.org/jira/browse/MRESOLVER-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated MRESOLVER-93:
---------------------------------
    Description: 
PathRecordingDependencyVisitor cannot handle dependency graphs that have 3 or 
more cycles such as below:
  
{code:java}
gid:a:1 (1)
+- gid:b:0
|  \- ^1
+- gid:b:1
|  \- ^1
\- gid:b:2
   \- ^1
{code}
It fails with StackOverflowError or OutOfMemoryError. [Test 
case|https://github.com/suztomo/maven-resolver/commit/31b24dfe240997861e27661a7540546fbe6e0dab].

 
h1. Solutions

I came up with three solutions. I pick solution #1 for simplicity. 
h2. 1. Use "parents" to check the cycle, rather than visited set

This is the simplest. Checking array element member is usually discouraged 
especially for large data set. The implementation should confirm the overhead 
of this solution.
h2. 2. Use AbstractMapBag for visited set

Creating a new class that extends AbstractMapBag and leverages IdentityHashMap. 
Although this solution would be theoretically more efficient than solution #1, 
I felt it's overkill to create a class just for this solution.

 
{code:java}
AbstractMapBag(new IdentityHashMap<DependencyNode, 
AbstractMapBag.MutableInteger>()){code}
 

[https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bag/AbstractMapBag.html]
h2. 3. Call visitLeave only when visitEnter is true

The cause of this bug is 
[DefaultDependencyNode|https://github.com/apache/maven-resolver/blob/47edcfe69c4e52ced4cb93d65b7348b5645cdd68/maven-resolver-api/src/main/java/org/eclipse/aether/graph/DefaultDependencyNode.java#L354]
 calling visitLeave regardless of visitEnter result.

I'm not sure how many other visitors rely on visitLeave being called regardless 
of visitEnter result.
h1. Illustration on why existing algorithm does not catch cycle 

The following illustration is the node traversal for the test case above by 
current algorithm. This illustration tracks the dependency node graph and the 
"visited" set maintained by the visitor.
 * visited set. An internal data structure in PathRecordingDependencyVisitor to 
avoid cycle 
([link|https://github.com/apache/maven-resolver/blob/0c2373f6c66f20953b1a7e443ea1de8672d1b072/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/visitor/PathRecordingDependencyVisitor.java#L45]).
 * visitEnter(node): PathRecordingDependencyVisitor's function 
([link|https://github.com/apache/maven-resolver/blob/0c2373f6c66f20953b1a7e443ea1de8672d1b072/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/visitor/PathRecordingDependencyVisitor.java#L100]).
 When returning true, the node's children is traversed by the algorithm. This 
function adds the node to visited set.
 * visitLeave(node): PathRecordingDependencyVisitor's function 
([link|https://github.com/apache/maven-resolver/blob/0c2373f6c66f20953b1a7e443ea1de8672d1b072/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/visitor/PathRecordingDependencyVisitor.java#L129]).
 This function removes the node from visited set.

  

The initial state starts with node "a" and visited set \{a}.

!IMG_0234.jpg|width=334,height=252!

First child of a is b0. Because visited does not contain, visitEnter(b0) 
returns true, meaning that the algorithm traverses this b0's children next. B0 
is added to visited.

!IMG_0235.jpg|width=359,height=191!

B0's children is "a". Because visited set contains "a", visitEnter(a) returns 
false. This means that the algorithm does not traverse this "a"'s children. A 
is added to visited set (already it has).

  !IMG_0236.jpg|width=438,height=197!

Now not traversing this "a"'s children, the algorithm calls visitLeave(a). This 
removes "a" from visited set.

!IMG_0237.jpg|width=434,height=165!

B0's children are all traversed. the algorithm calls visitLeave(b0). This 
removes "b0" from visited set.

!IMG_0238.jpg|width=459,height=197!

Now visited set is empty.

Next child of the root "a" is b1. B1 is not in visited set, thus visitEnter(b1) 
returns true. This means the algorithm traverses the children of this b1.

!IMG_0240.jpg|width=445,height=270!

B1's only child is a. "a" is not in visited set. visitEnter(a) returns true. 
This means to traverse "a"'s children.

!IMG_0241.jpg|width=418,height=262!

A's first children is b0. b0 is not in visited set. visitEnter(b0) returns 
true, meaning to traverse children of this b0.

!IMG_0242.jpg|width=422,height=208!  
 (img 0242)

The only child of b0 is "a". Visited set contains "a", and thus not traversing 
its children.

!IMG_0243.jpg|width=491,height=191!

visitLeave(a) removes "a" from visited set.

!IMG_0244.jpg|width=481,height=189!

b0's children is all traversed. VisitLeave(b0) removes b0 from visited set.

!IMG_0245.jpg|width=498,height=182!

Next child of this "a" is b1. B1 is in visited set, and thus visitEnter(b1) 
returns false. This node's children is not to be traversed.

!IMG_0255.jpg|width=545,height=245!

(img 0255)

visitLeave(b1) removes b1 from visited set. Now visited is emtpy.

!IMG_0256.jpg|width=528,height=294!

The last child of "a" is b2. VisitEnter(b2) returns true. It's children is to 
be traversed. B2 is in visited set.

!IMG_0257.jpg|width=502,height=309!

 B2's only child is "a". "a" is not in visited set, thus visitEnter(a) returns 
true. The algorithm traverses this "a"'s children.

!IMG_0258.jpg|width=485,height=299!

(img 0258)

 

(...omit...)

 

IMG_0266 shows the step where I decided to give up. The algorithm does not seem 
to stop. Indeed the test shows that. The path from the root to the furthest a 
includes 5 "a" nodes. I concluded the visited set is not working as expected to 
avoid cycle.

!IMG_0266.jpg|width=656,height=252!

 

 

  was:
PathRecordingDependencyVisitor cannot handle dependency graphs that have 3 or 
more cycles such as below:
  
{code:java}
gid:a:1 (1)
+- gid:b:0
|  \- ^1
+- gid:b:1
|  \- ^1
\- gid:b:2
   \- ^1
{code}
It fails with StackOverflowError or OutOfMemoryError. [Test 
case|https://github.com/suztomo/maven-resolver/commit/31b24dfe240997861e27661a7540546fbe6e0dab].

 
h1. Illustration on why existing algorithm does not catch cycle 

The following illustration is the node traversal for the test case above by 
current algorithm. This illustration tracks the dependency node graph and the 
"visited" set maintained by the visitor.
 * visited set. An internal data structure in PathRecordingDependencyVisitor to 
avoid cycle 
([link|https://github.com/apache/maven-resolver/blob/0c2373f6c66f20953b1a7e443ea1de8672d1b072/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/visitor/PathRecordingDependencyVisitor.java#L45]).
 * visitEnter(node): PathRecordingDependencyVisitor's function 
([link|https://github.com/apache/maven-resolver/blob/0c2373f6c66f20953b1a7e443ea1de8672d1b072/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/visitor/PathRecordingDependencyVisitor.java#L100]).
 When returning true, the node's children is traversed by the algorithm. This 
function adds the node to visited set.
 * visitLeave(node): PathRecordingDependencyVisitor's function 
([link|https://github.com/apache/maven-resolver/blob/0c2373f6c66f20953b1a7e443ea1de8672d1b072/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/visitor/PathRecordingDependencyVisitor.java#L129]).
 This function removes the node from visited set.

  

The initial state starts with node "a" and visited set \{a}.

!IMG_0234.jpg|width=334,height=252!

First child of a is b0. Because visited does not contain, visitEnter(b0) 
returns true, meaning that the algorithm traverses this b0's children next. B0 
is added to visited.

!IMG_0235.jpg|width=359,height=191!

B0's children is "a". Because visited set contains "a", visitEnter(a) returns 
false. This means that the algorithm does not traverse this "a"'s children. A 
is added to visited set (already it has).

  !IMG_0236.jpg|width=438,height=197!

Now not traversing this "a"'s children, the algorithm calls visitLeave(a). This 
removes "a" from visited set.

!IMG_0237.jpg|width=434,height=165!

B0's children are all traversed. the algorithm calls visitLeave(b0). This 
removes "b0" from visited set.

!IMG_0238.jpg|width=459,height=197!

Now visited set is empty.

Next child of the root "a" is b1. B1 is not in visited set, thus visitEnter(b1) 
returns true. This means the algorithm traverses the children of this b1.

!IMG_0240.jpg|width=445,height=270!

B1's only child is a. "a" is not in visited set. visitEnter(a) returns true. 
This means to traverse "a"'s children.

!IMG_0241.jpg|width=418,height=262!

A's first children is b0. b0 is not in visited set. visitEnter(b0) returns 
true, meaning to traverse children of this b0.

!IMG_0242.jpg|width=422,height=208!  
 (img 0242)

The only child of b0 is "a". Visited set contains "a", and thus not traversing 
its children.

!IMG_0243.jpg|width=491,height=191!

visitLeave(a) removes "a" from visited set.

!IMG_0244.jpg|width=481,height=189!

b0's children is all traversed. VisitLeave(b0) removes b0 from visited set.

!IMG_0245.jpg|width=498,height=182!

Next child of this "a" is b1. B1 is in visited set, and thus visitEnter(b1) 
returns false. This node's children is not to be traversed.

!IMG_0255.jpg|width=545,height=245!

(img 0255)

visitLeave(b1) removes b1 from visited set. Now visited is emtpy.

!IMG_0256.jpg|width=528,height=294!

The last child of "a" is b2. VisitEnter(b2) returns true. It's children is to 
be traversed. B2 is in visited set.

!IMG_0257.jpg|width=502,height=309!

 B2's only child is "a". "a" is not in visited set, thus visitEnter(a) returns 
true. The algorithm traverses this "a"'s children.

!IMG_0258.jpg|width=485,height=299!

(img 0258)

 

(...omit...)

 

IMG_0266 shows the step where I decided to give up. The algorithm does not seem 
to stop. Indeed the test shows that. The path from the root to the furthest a 
includes 5 "a" nodes. I concluded the visited set is not working as expected to 
avoid cycle.

!IMG_0266.jpg|width=656,height=252!

 

 


> PathRecordingDependencyVisitor to handle 3 cycles
> -------------------------------------------------
>
>                 Key: MRESOLVER-93
>                 URL: https://issues.apache.org/jira/browse/MRESOLVER-93
>             Project: Maven Resolver
>          Issue Type: Improvement
>          Components: resolver
>    Affects Versions: 1.4.1
>            Reporter: Tomo Suzuki
>            Priority: Major
>         Attachments: IMG_0234.jpg, IMG_0235.jpg, IMG_0236.jpg, IMG_0237.jpg, 
> IMG_0238.jpg, IMG_0240.jpg, IMG_0241.jpg, IMG_0242.jpg, IMG_0243.jpg, 
> IMG_0244.jpg, IMG_0245.jpg, IMG_0255.jpg, IMG_0256.jpg, IMG_0257.jpg, 
> IMG_0258.jpg, IMG_0259.jpg, IMG_0260.jpg, IMG_0261.jpg, IMG_0262.jpg, 
> IMG_0263.jpg, IMG_0264.jpg, IMG_0265.jpg, IMG_0266.jpg
>
>
> PathRecordingDependencyVisitor cannot handle dependency graphs that have 3 or 
> more cycles such as below:
>   
> {code:java}
> gid:a:1 (1)
> +- gid:b:0
> |  \- ^1
> +- gid:b:1
> |  \- ^1
> \- gid:b:2
>    \- ^1
> {code}
> It fails with StackOverflowError or OutOfMemoryError. [Test 
> case|https://github.com/suztomo/maven-resolver/commit/31b24dfe240997861e27661a7540546fbe6e0dab].
>  
> h1. Solutions
> I came up with three solutions. I pick solution #1 for simplicity. 
> h2. 1. Use "parents" to check the cycle, rather than visited set
> This is the simplest. Checking array element member is usually discouraged 
> especially for large data set. The implementation should confirm the overhead 
> of this solution.
> h2. 2. Use AbstractMapBag for visited set
> Creating a new class that extends AbstractMapBag and leverages 
> IdentityHashMap. Although this solution would be theoretically more efficient 
> than solution #1, I felt it's overkill to create a class just for this 
> solution.
>  
> {code:java}
> AbstractMapBag(new IdentityHashMap<DependencyNode, 
> AbstractMapBag.MutableInteger>()){code}
>  
> [https://commons.apache.org/proper/commons-collections/apidocs/org/apache/commons/collections4/bag/AbstractMapBag.html]
> h2. 3. Call visitLeave only when visitEnter is true
> The cause of this bug is 
> [DefaultDependencyNode|https://github.com/apache/maven-resolver/blob/47edcfe69c4e52ced4cb93d65b7348b5645cdd68/maven-resolver-api/src/main/java/org/eclipse/aether/graph/DefaultDependencyNode.java#L354]
>  calling visitLeave regardless of visitEnter result.
> I'm not sure how many other visitors rely on visitLeave being called 
> regardless of visitEnter result.
> h1. Illustration on why existing algorithm does not catch cycle 
> The following illustration is the node traversal for the test case above by 
> current algorithm. This illustration tracks the dependency node graph and the 
> "visited" set maintained by the visitor.
>  * visited set. An internal data structure in PathRecordingDependencyVisitor 
> to avoid cycle 
> ([link|https://github.com/apache/maven-resolver/blob/0c2373f6c66f20953b1a7e443ea1de8672d1b072/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/visitor/PathRecordingDependencyVisitor.java#L45]).
>  * visitEnter(node): PathRecordingDependencyVisitor's function 
> ([link|https://github.com/apache/maven-resolver/blob/0c2373f6c66f20953b1a7e443ea1de8672d1b072/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/visitor/PathRecordingDependencyVisitor.java#L100]).
>  When returning true, the node's children is traversed by the algorithm. This 
> function adds the node to visited set.
>  * visitLeave(node): PathRecordingDependencyVisitor's function 
> ([link|https://github.com/apache/maven-resolver/blob/0c2373f6c66f20953b1a7e443ea1de8672d1b072/maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/visitor/PathRecordingDependencyVisitor.java#L129]).
>  This function removes the node from visited set.
>   
> The initial state starts with node "a" and visited set \{a}.
> !IMG_0234.jpg|width=334,height=252!
> First child of a is b0. Because visited does not contain, visitEnter(b0) 
> returns true, meaning that the algorithm traverses this b0's children next. 
> B0 is added to visited.
> !IMG_0235.jpg|width=359,height=191!
> B0's children is "a". Because visited set contains "a", visitEnter(a) returns 
> false. This means that the algorithm does not traverse this "a"'s children. A 
> is added to visited set (already it has).
>   !IMG_0236.jpg|width=438,height=197!
> Now not traversing this "a"'s children, the algorithm calls visitLeave(a). 
> This removes "a" from visited set.
> !IMG_0237.jpg|width=434,height=165!
> B0's children are all traversed. the algorithm calls visitLeave(b0). This 
> removes "b0" from visited set.
> !IMG_0238.jpg|width=459,height=197!
> Now visited set is empty.
> Next child of the root "a" is b1. B1 is not in visited set, thus 
> visitEnter(b1) returns true. This means the algorithm traverses the children 
> of this b1.
> !IMG_0240.jpg|width=445,height=270!
> B1's only child is a. "a" is not in visited set. visitEnter(a) returns true. 
> This means to traverse "a"'s children.
> !IMG_0241.jpg|width=418,height=262!
> A's first children is b0. b0 is not in visited set. visitEnter(b0) returns 
> true, meaning to traverse children of this b0.
> !IMG_0242.jpg|width=422,height=208!  
>  (img 0242)
> The only child of b0 is "a". Visited set contains "a", and thus not 
> traversing its children.
> !IMG_0243.jpg|width=491,height=191!
> visitLeave(a) removes "a" from visited set.
> !IMG_0244.jpg|width=481,height=189!
> b0's children is all traversed. VisitLeave(b0) removes b0 from visited set.
> !IMG_0245.jpg|width=498,height=182!
> Next child of this "a" is b1. B1 is in visited set, and thus visitEnter(b1) 
> returns false. This node's children is not to be traversed.
> !IMG_0255.jpg|width=545,height=245!
> (img 0255)
> visitLeave(b1) removes b1 from visited set. Now visited is emtpy.
> !IMG_0256.jpg|width=528,height=294!
> The last child of "a" is b2. VisitEnter(b2) returns true. It's children is to 
> be traversed. B2 is in visited set.
> !IMG_0257.jpg|width=502,height=309!
>  B2's only child is "a". "a" is not in visited set, thus visitEnter(a) 
> returns true. The algorithm traverses this "a"'s children.
> !IMG_0258.jpg|width=485,height=299!
> (img 0258)
>  
> (...omit...)
>  
> IMG_0266 shows the step where I decided to give up. The algorithm does not 
> seem to stop. Indeed the test shows that. The path from the root to the 
> furthest a includes 5 "a" nodes. I concluded the visited set is not working 
> as expected to avoid cycle.
> !IMG_0266.jpg|width=656,height=252!
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

Reply via email to