[
https://issues.apache.org/jira/browse/SPARK-21861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16150389#comment-16150389
]
Nikhil Bhide edited comment on SPARK-21861 at 9/1/17 11:43 AM:
---------------------------------------------------------------
Hi Sean,
Please find the additional content below. I have added a few comments to the
description section (highlighted) and slightly modified the example
(changes marked).
To summarize:
1. Added details about the damping factor and reset probability
2. Added details of the Personalized PageRank algorithm supported in GraphX
3. Modified the example
- Sorted results in descending order by weight (rank)
- Added an example of Personalized PageRank (PPR)
PageRank measures the importance of each vertex in a graph, assuming an edge
from u to v represents an endorsement of v’s importance by u. For example, if a
Twitter user is followed by many others, the user will be ranked
highly. {color:red}*PageRank estimates the importance of a node from the number
and quality of the links pointing to it.*{color}
GraphX comes with static and dynamic implementations of PageRank as methods on
the PageRank object. Static PageRank runs for a fixed number of iterations,
while dynamic PageRank runs until the ranks converge (i.e., stop changing by
more than a specified tolerance). {color:red}The dynamic version, pageRank,
takes two parameters, the tolerance and the reset probability, whereas the
static version, staticPageRank, takes the number of iterations and the reset
probability. The reset probability is the complement of the damping factor,
which is the click-through probability. PageRank is based on the random surfer
model: the damping factor is the probability that the surfer continues by
following a link from the current page, and the reset probability is the
probability that the surfer jumps to a random page instead. The damping factor
ranges between 0 and 1. By default, the reset probability is 0.15, which
corresponds to a damping factor of 0.85 (reset probability = 1 – damping
factor).{color}
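To make the role of the reset probability concrete, here is a minimal sketch of the random-surfer update in plain Scala, without Spark. The toy edge list and the names PageRankSketch and staticRanks are made up for illustration, and the update is the unnormalized textbook form; the actual GraphX implementation may differ in details such as normalization.

```scala
object PageRankSketch {
  // Hypothetical toy graph as (source, destination) pairs; not from the Spark data files.
  val edges: Seq[(Int, Int)] = Seq((1, 2), (1, 3), (2, 3), (3, 1), (4, 3))

  /** Runs a fixed number of PageRank iterations with the given reset probability. */
  def staticRanks(numIter: Int, resetProb: Double = 0.15): Map[Int, Double] = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (v, es) => v -> es.size }
    // Every vertex starts with rank 1.0.
    var ranks = vertices.map(v => v -> 1.0).toMap
    for (_ <- 1 to numIter) {
      // Each vertex splits its current rank evenly among its out-neighbours.
      val contribs = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (v, cs) => v -> cs.map(_._2).sum }
      // The damping factor (1 - resetProb) scales the rank received over links;
      // resetProb is the share every vertex receives from random jumps.
      ranks = vertices.map { v =>
        v -> (resetProb + (1.0 - resetProb) * contribs.getOrElse(v, 0.0))
      }.toMap
    }
    ranks
  }
}
```

Calling PageRankSketch.staticRanks(100) on this toy graph ranks vertex 3 highest, since three other vertices link to it, while vertex 4, which has no incoming edges, keeps a rank equal to the reset probability.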
{color:red}GraphX also supports Personalized PageRank (PPR), which is a more
general version of PageRank. PPR is widely used in recommendation systems. For
example, Twitter uses PPR to present users with other accounts that they may
wish to follow. GraphX provides static and dynamic implementations of
Personalized PageRank as methods on the PageRank object.
GraphOps allows calling these algorithms directly as methods on Graph.{color}
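As a rough illustration of how the personalized variant differs, the sketch below (plain Scala, no Spark) sends all reset mass back to a single source vertex instead of spreading it over every vertex. The toy edge list and the names PersonalizedPageRankSketch and ranksFrom are hypothetical, and this is only one common formulation; it is not taken from the GraphX source.

```scala
object PersonalizedPageRankSketch {
  // Hypothetical toy graph as (source, destination) pairs; not from the Spark data files.
  val edges: Seq[(Int, Int)] = Seq((1, 2), (1, 3), (2, 3), (3, 1), (4, 3))

  /** Runs a fixed number of personalized PageRank iterations relative to `source`. */
  def ranksFrom(source: Int, numIter: Int, resetProb: Double = 0.15): Map[Int, Double] = {
    val vertices = edges.flatMap { case (src, dst) => Seq(src, dst) }.distinct
    val outDegree = edges.groupBy(_._1).map { case (v, es) => v -> es.size }
    // All rank starts at the source vertex.
    var ranks = vertices.map(v => v -> (if (v == source) 1.0 else 0.0)).toMap
    for (_ <- 1 to numIter) {
      // Each vertex splits its current rank evenly among its out-neighbours.
      val contribs = edges
        .map { case (src, dst) => dst -> ranks(src) / outDegree(src) }
        .groupBy(_._1)
        .map { case (v, cs) => v -> cs.map(_._2).sum }
      // Only the source vertex receives the reset mass.
      ranks = vertices.map { v =>
        val reset = if (v == source) resetProb else 0.0
        v -> (reset + (1.0 - resetProb) * contribs.getOrElse(v, 0.0))
      }.toMap
    }
    ranks
  }
}
```

With PersonalizedPageRankSketch.ranksFrom(1, 100), vertex 4 ends up with rank 0.0 because it cannot be reached from vertex 1, which is the kind of source-relative ranking that makes PPR useful for recommendations.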
import org.apache.spark.graphx.GraphLoader

// Assumes sc is an existing SparkContext (e.g., in spark-shell)
// Load the edges as a graph
val graph = GraphLoader.edgeListFile(sc, "data/graphx/followers.txt")
// Run PageRank
val ranks = graph.pageRank(0.0001).vertices
// Join the ranks with the usernames
val users = sc.textFile("data/graphx/users.txt").map { line =>
  val fields = line.split(",")
  (fields(0).toLong, fields(1))
}
val ranksByUsername = users.join(ranks).map {
  case (id, (username, rank)) => (username, rank)
}
// Print the result, sorted by rank in descending order (modified)
println(ranksByUsername.sortBy({ case (username, rank) => rank }, false)
  .collect().mkString("\n"))

// Run Personalized PageRank relative to the first vertex (added)
val ranksPRR = graph.personalizedPageRank(graph.vertices.first()._1, 0.0001).vertices
val ranksPRRByUsername = users.join(ranksPRR).map {
  case (id, (username, rank)) => (username, rank)
}
// Print the result, sorted by rank in descending order (added)
println(ranksPRRByUsername.sortBy({ case (username, rank) => rank }, false)
  .collect().mkString("\n"))
> Add more details to PageRank illustration
> -----------------------------------------
>
> Key: SPARK-21861
> URL: https://issues.apache.org/jira/browse/SPARK-21861
> Project: Spark
> Issue Type: Documentation
> Components: Documentation
> Affects Versions: 2.2.0
> Reporter: Nikhil Bhide
> Priority: Trivial
> Labels: documentation
>
> Add more details to PageRank illustration on
> [https://spark.apache.org/docs/latest/graphx-programming-guide.html#pagerank]
> Adding details of PageRank algorithm parameters, such as the damping factor,
> would be helpful. Also, adding more actions on the result, such as sorting
> by weight, would be useful.