GitHub user okram opened a pull request: https://github.com/apache/tinkerpop/pull/717
TINKERPOP-1783: PageRank gives incorrect results for graphs with sinks https://issues.apache.org/jira/browse/TINKERPOP-1783 There were a couple of problems with `PageRankVertexProgram`. In order to describe the problems, its necessary to explain PageRank. *** Initially, every vertex gets `1.0/vertexCount` amount of energy. Thus, the total energy in the system is 1.0. At every iteration, 0.85 of that energy is distributed amongst adjacent neighbors and 0.15 is distributed amongst all vertices (this is the `alpha` parameter w/ the default being 0.85). If a vertex does not have outgoing edges and it can not distribute its 0.85 energy amongst its neighbors, then all that energy is teleported amongst all vertices. The algorithm above ensures strong connectivity because the 0.15 energy distribution simulates a strongly connected graph via the concept of "teleportation." According to a Markov chain proof on ergodicity in strongly connected graphs, PageRank yields a positive eigenvector. That eigenvector is your rank vector and that rank vector's sum total is always the initial energy the system at iteration 0. *** I fixed three things in this PR: 1. Vertex count is now determined in iteration 0 and no longer needs users to provide a vertex count. 2. Teleportation energy (global energy) is defined via a `Memory` variable. 3. Isolated vertices send their energy to the teleportation variable. Here are the results compared with iGraph's implementation (note that iGraph does a convergence check while TinkerPop does not. It would be extremely expensive to test convergence in a distributed system -- thus, slightly different results). ``` VERTEX iGRAPH TINKERPOP marko 0.1119788 0.11375485828040575 vadas 0.1370267 0.14598540145985406 lop 0.2665600 0.30472082661863686 josh 0.1620746 0.14598540145985406 ripple 0.2103812 0.1757986539008437 peter 0.1119788 0.11375485828040575 ``` Finally, a proof of energy conservation: ``` gremlin> 0.11375485828040575 + 0.14598540145985406 + 0.30472082661863686 + 0.14598540145985406 + 0.1757986539008437 + 0.11375485828040575 ==>1.00000000000000018 ``` VOTE +1 You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/tinkerpop TINKERPOP-1783 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/tinkerpop/pull/717.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #717 ---- commit f7e717de2d4b98ca0a49662f545b1b61ed01becd Author: Marko A. Rodriguez <okramma...@gmail.com> Date: 2017-09-15T20:23:10Z added a way to distributed 'terminal energy' throughout the graph via the implicit teleportation matrix in PageRankVertexProgram. Everthing looks good except that I'm not get the appropriate normalization to 1.0 of the resultant vector and I can't seem to figure out why. Committing what I have so far. commit db7d996bd46978b54573d956935e7eea97ef6338 Author: Marko A. Rodriguez <okramma...@gmail.com> Date: 2017-09-18T16:22:06Z finalized worked on PageRankVertexProgram fix. Really cool use of Memory to solve the 'teleporatation problem.' Proud of myself. Updated CHANGELOG and made a note to users about changing PageRank values in UPGRADE docs. ---- ---