Peter Fontana created SPARK-3206:
------------------------------------
Summary: Error in PageRank values
Key: SPARK-3206
URL: https://issues.apache.org/jira/browse/SPARK-3206
Project: Spark
Issue Type: Bug
Components: GraphX
Affects Versions: 1.0.2
Environment: UNIX with Hadoop
Reporter: Peter Fontana
I have found a small example where the PageRank values computed by run and
runUntilConvergence differ substantially.
I am running the PageRank module on the following graph:
Edge Table:
| Node1 | Node2 |
| ------------- | ------------- |
| 1 | 2 |
| 1 | 3 |
| 3 | 2 |
| 3 | 4 |
| 5 | 3 |
| 6 | 7 |
| 7 | 8 |
| 8 | 9 |
| 9 | 7 |
Node Table (note the extra node, 10, which has no edges):
| NodeName | NodeID |
| ------------- | ------------- |
| a | 1 |
| b | 2 |
| c | 3 |
| d | 4 |
| e | 5 |
| f | 6 |
| g | 7 |
| h | 8 |
| i | 9 |
| j.longaddress.com | 10 |
Both runs use the default resetProb of 0.15.
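A minimal sketch of how the graph described by the two tables might be constructed in GraphX, assuming an existing SparkContext is passed in. The object name ReproGraph, the helper build, and the Int edge attribute are illustrative assumptions, as is the reading of each edge row as a directed edge from Node1 to Node2.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.graphx.{Edge, Graph}

// Hypothetical helper: builds the 10-vertex, 9-edge graph from the tables above.
object ReproGraph {
  def build(sc: SparkContext): Graph[String, Int] = {
    // (vertex ID, node name); vertex 10 has no edges.
    val vertices = sc.parallelize(Seq(
      (1L, "a"), (2L, "b"), (3L, "c"), (4L, "d"), (5L, "e"),
      (6L, "f"), (7L, "g"), (8L, "h"), (9L, "i"),
      (10L, "j.longaddress.com")))

    // Directed edges Node1 -> Node2 (assumed); the attribute 1 is an unused placeholder.
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, 1), Edge(1L, 3L, 1), Edge(3L, 2L, 1),
      Edge(3L, 4L, 1), Edge(5L, 3L, 1), Edge(6L, 7L, 1),
      Edge(7L, 8L, 1), Edge(8L, 9L, 1), Edge(9L, 7L, 1)))

    Graph(vertices, edges)
  }
}
```

With a local context, `val graph = ReproGraph.build(sc)` would yield the `graph` value used in the calls that follow.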
When I compute PageRank with runUntilConvergence, running
    val ranks = PageRank.runUntilConvergence(graph, 0.0001).vertices
I get the ranks
(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,1.3299054047985106)
(9,1.2381240056453071)
(8,1.2803346052504254)
(10,0.15)
(5,0.15)
(2,0.35878124999999994)
However, when I run PageRank with the run() method, running
    val ranksI = PageRank.run(graph, 100).vertices
I get the ranks
(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,0.9999999387662847)
(9,0.9999999256447741)
(8,0.9999999256447741)
(10,0.15)
(5,0.15)
(2,0.29503124999999997)
The two results differ for vertices 2, 7, 8, and 9, leading me to suspect that
one of the PageRank methods is incorrect. I have examined the source, but I do
not know what the correct fix is, or which set of values is correct.
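One way to check the two listings independently of GraphX is to iterate the PageRank update by hand on this small graph. The sketch below is plain Scala with no Spark dependency. It assumes the unnormalized update described in the GraphX PageRank documentation, PR[i] = resetProb + (1 - resetProb) * sum over in-neighbors j of PR[j] / outDeg[j], an initial rank of 1.0 for every vertex, and directed edges from Node1 to Node2. The object name ReferencePageRank and its members are hypothetical names, not part of any library.

```scala
// Hypothetical stand-alone reference: synchronous PageRank iteration in plain Scala.
object ReferencePageRank {
  // Edge list from the edge table: (source, destination).
  val edges: Seq[(Int, Int)] = Seq(
    (1, 2), (1, 3), (3, 2), (3, 4), (5, 3),
    (6, 7), (7, 8), (8, 9), (9, 7))

  // Vertex IDs from the node table, including the isolated vertex 10.
  val nodes: Seq[Int] = 1 to 10

  def ranks(numIter: Int, resetProb: Double = 0.15): Map[Int, Double] = {
    // Out-degree of every vertex that has at least one outgoing edge.
    val outDeg: Map[Int, Int] =
      edges.groupBy(_._1).map { case (src, es) => src -> es.size }

    // Assumed starting value: every vertex begins with rank 1.0.
    var pr: Map[Int, Double] = nodes.map(n => n -> 1.0).toMap

    for (_ <- 1 to numIter) {
      // Sum of PR[src] / outDeg[src] over the incoming edges of each destination.
      val incoming: Map[Int, Double] = edges.groupBy(_._2).map {
        case (dst, es) => dst -> es.map { case (src, _) => pr(src) / outDeg(src) }.sum
      }
      // Every vertex is updated from the previous iteration's ranks.
      pr = nodes.map { n =>
        n -> (resetProb + (1 - resetProb) * incoming.getOrElse(n, 0.0))
      }.toMap
    }
    pr
  }

  def main(args: Array[String]): Unit =
    ranks(100).toSeq.sortBy(_._1).foreach(println)
}
```

Comparing the printed values against both listings above, vertex by vertex, should indicate which method deviates from this update rule, provided the assumed formula matches what GraphX intends to compute.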