[ https://issues.apache.org/jira/browse/SPARK-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Peter Fontana updated SPARK-3206:
---------------------------------
Description:
I have found a small example where the PageRank values computed by run and
runUntilConvergence differ substantially.
I am running the PageRank module on the following graph:
Edge Table (each row is a directed edge Node1 -> Node2):
|| Node1 || Node2 ||
| 1 | 2 |
| 1 | 3 |
| 3 | 2 |
| 3 | 4 |
| 5 | 3 |
| 6 | 7 |
| 7 | 8 |
| 8 | 9 |
| 9 | 7 |
Node Table (note the extra node 10, which has no edges):
|| NodeID || NodeName ||
| 1 | a |
| 2 | b |
| 3 | c |
| 4 | d |
| 5 | e |
| 6 | f |
| 7 | g |
| 8 | h |
| 9 | i |
| 10 | j.longaddress.com |
Both methods are run with the default resetProb of 0.15.
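To get reference values that do not depend on GraphX, the iteration can be run by hand on this small graph. The following is a minimal sketch in plain Scala (no Spark). It assumes that both GraphX methods target the un-normalized update rank(v) = resetProb + (1 - resetProb) * sum over in-edges (u -> v) of rank(u) / outDegree(u). That is my reading of the GraphX 1.x source, not something I have confirmed. The object name ReferencePageRank and its parameters (tol, maxIter) are hypothetical.

```scala
// Reference PageRank for the small graph in this issue, computed without Spark.
// ASSUMPTION: un-normalized update (my reading of GraphX 1.x, not confirmed):
//   rank(v) = resetProb + (1 - resetProb) * sum_{u -> v} rank(u) / outDegree(u)
object ReferencePageRank {
  // Directed edges (Node1 -> Node2) from the edge table above.
  val edges: Seq[(Long, Long)] = Seq(
    (1L, 2L), (1L, 3L), (3L, 2L), (3L, 4L), (5L, 3L),
    (6L, 7L), (7L, 8L), (8L, 9L), (9L, 7L))

  // Vertex ids 1..10; vertex 10 has no edges at all.
  val vertices: Seq[Long] = (1L to 10L).toList

  // Power iteration from rank 1.0 until the largest per-vertex change is <= tol.
  def ranks(resetProb: Double = 0.15, tol: Double = 1e-12, maxIter: Int = 10000): Map[Long, Double] = {
    val outDegree: Map[Long, Int] =
      edges.groupBy(_._1).map { case (src, es) => src -> es.size }
    var rank: Map[Long, Double] = vertices.map(v => v -> 1.0).toMap
    var delta = Double.MaxValue
    var iter = 0
    while (delta > tol && iter < maxIter) {
      // Sum the contributions rank(u) / outDegree(u) arriving at each destination vertex.
      val incoming: Map[Long, Double] = edges.groupBy(_._2).map { case (dst, es) =>
        dst -> es.map { case (src, _) => rank(src) / outDegree(src) }.sum
      }
      // Vertices with no in-edges (1, 5, 6, 10) receive only resetProb.
      val next: Map[Long, Double] = vertices
        .map(v => v -> (resetProb + (1 - resetProb) * incoming.getOrElse(v, 0.0)))
        .toMap
      delta = vertices.map(v => math.abs(next(v) - rank(v))).max
      rank = next
      iter += 1
    }
    rank
  }
}
```

For example, `ReferencePageRank.ranks().toSeq.sortBy(_._1).foreach(println)` prints one (vertexId, rank) pair per vertex, in the same form as the outputs below.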
When I compute PageRank with runUntilConvergence (tolerance 0.0001) by running
{{val ranks = PageRank.runUntilConvergence(graph, 0.0001).vertices}}
I get the ranks
(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,1.3299054047985106)
(9,1.2381240056453071)
(8,1.2803346052504254)
(10,0.15)
(5,0.15)
(2,0.35878124999999994)
However, when I run PageRank with the run() method for 100 iterations by running
{{val ranksI = PageRank.run(graph, 100).vertices}}
I get the ranks
(4,0.29503124999999997)
(1,0.15)
(6,0.15)
(3,0.34124999999999994)
(7,0.9999999387662847)
(9,0.9999999256447741)
(8,0.9999999256447741)
(10,0.15)
(5,0.15)
(2,0.29503124999999997)
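The two results differ substantially for vertices 2, 7, 8 and 9. For example,
vertex 7 gets 1.3299 from runUntilConvergence but about 1.0 from run(). This leads me to suspect
that one of the two PageRank methods is incorrect. I have examined the source, but I cannot
tell which set of values is correct or what the right fix is.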
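For comparison, the fixed point can be worked out by hand. This derivation assumes that both methods target the un-normalized update below, where deg+(u) is the out-degree of u and resetProb = 0.15. That update is my reading of the GraphX 1.x source, not something I have confirmed.

```latex
\begin{aligned}
\mathrm{PR}(v) &= 0.15 + 0.85 \sum_{u \to v} \frac{\mathrm{PR}(u)}{\deg^{+}(u)} \\[4pt]
\mathrm{PR}(1) &= \mathrm{PR}(5) = \mathrm{PR}(6) = \mathrm{PR}(10) = 0.15 \quad \text{(no in-edges)} \\
\mathrm{PR}(3) &= 0.15 + 0.85\left(\tfrac{0.15}{2} + 0.15\right) = 0.34125 \\
\mathrm{PR}(4) &= 0.15 + 0.85 \cdot \tfrac{0.34125}{2} = 0.29503125 \\
\mathrm{PR}(2) &= 0.15 + 0.85\left(\tfrac{0.15}{2} + \tfrac{0.34125}{2}\right) = 0.35878125 \\[4pt]
\mathrm{PR}(7) &= 0.15 + 0.85\,\bigl(0.15 + \mathrm{PR}(9)\bigr), \quad
\mathrm{PR}(8) = 0.15 + 0.85\,\mathrm{PR}(7), \quad
\mathrm{PR}(9) = 0.15 + 0.85\,\mathrm{PR}(8) \\
\Rightarrow \mathrm{PR}(7) &= \tfrac{0.513375}{0.385875} = \tfrac{1369}{1029} \approx 1.3304, \quad
\mathrm{PR}(8) \approx 1.2809, \quad \mathrm{PR}(9) \approx 1.2387
\end{aligned}
```

If that assumption holds, the runUntilConvergence values agree with this fixed point to within about 0.001, and they match exactly for vertex 2. The run() values do not. Its value for vertex 2 (0.29503125) and its values near 1.0 for the 7/8/9 cycle are exactly what this update gives if the 1 -> 2 and 6 -> 7 contributions are dropped.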
> Error in PageRank values
> ------------------------
>
> Key: SPARK-3206
> URL: https://issues.apache.org/jira/browse/SPARK-3206
> Project: Spark
> Issue Type: Bug
> Components: GraphX
> Affects Versions: 1.0.2
> Environment: UNIX with Hadoop
> Reporter: Peter Fontana
>
--
This message was sent by Atlassian JIRA
(v6.2#6252)