[ 
https://issues.apache.org/jira/browse/TINKERPOP-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16170521#comment-16170521
 ] 

ASF GitHub Bot commented on TINKERPOP-1783:
-------------------------------------------

GitHub user okram opened a pull request:

    https://github.com/apache/tinkerpop/pull/717

    TINKERPOP-1783: PageRank gives incorrect results for graphs with sinks

    https://issues.apache.org/jira/browse/TINKERPOP-1783
    
    There were a couple of problems with `PageRankVertexProgram`. In order to 
describe the problems, its necessary to explain PageRank.
    
    ***
    
    Initially, every vertex gets `1.0/vertexCount` amount of energy. Thus, the 
total energy in the system is 1.0.
    
    At every iteration, 0.85 of that energy is distributed amongst adjacent 
neighbors and 0.15 is distributed amongst all vertices (this is the `alpha` 
parameter w/ the default being 0.85). If a vertex does not have outgoing edges 
and it can not distribute its 0.85 energy amongst its neighbors, then all that 
energy is teleported amongst all vertices.
    
    The algorithm above ensures strong connectivity because the 0.15 energy 
distribution simulates a strongly connected graph via the concept of 
"teleportation." According to a Markov chain proof on ergodicity in strongly 
connected graphs, PageRank yields a positive eigenvector. That eigenvector is 
your rank vector and that rank vector's sum total is always the initial energy 
the system at iteration 0.
    
    ***
    
    I fixed three things in this PR:
    
    1. Vertex count is now determined in iteration 0 and no longer needs users 
to provide a vertex count.
    2. Teleportation energy (global energy) is defined via a `Memory` variable.
    3. Isolated vertices send their energy to the teleportation variable.
    
    Here are the results compared with iGraph's implementation (note that 
iGraph does a convergence check while TinkerPop does not. It would be extremely 
expensive to test convergence in a distributed system -- thus, slightly 
different results).
    
    ```
    VERTEX  iGRAPH    TINKERPOP
    marko   0.1119788 0.11375485828040575
    vadas   0.1370267 0.14598540145985406
    lop     0.2665600 0.30472082661863686
    josh    0.1620746 0.14598540145985406
    ripple  0.2103812 0.1757986539008437
    peter   0.1119788 0.11375485828040575
    ```
    
    Finally, a proof of energy conservation:
    
    ```
    gremlin> 0.11375485828040575 + 0.14598540145985406 + 0.30472082661863686 + 
0.14598540145985406 + 0.1757986539008437 + 0.11375485828040575
    ==>1.00000000000000018
    ```
    
    VOTE +1


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/tinkerpop TINKERPOP-1783

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/717.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #717
    
----
commit f7e717de2d4b98ca0a49662f545b1b61ed01becd
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2017-09-15T20:23:10Z

    added a way to distributed 'terminal energy' throughout the graph via the 
implicit teleportation matrix in PageRankVertexProgram. Everthing looks good 
except that I'm not get the appropriate normalization to 1.0 of the resultant 
vector and I can't seem to figure out why. Committing what I have so far.

commit db7d996bd46978b54573d956935e7eea97ef6338
Author: Marko A. Rodriguez <okramma...@gmail.com>
Date:   2017-09-18T16:22:06Z

    finalized worked on PageRankVertexProgram fix. Really cool use of Memory to 
solve the 'teleporatation problem.' Proud of myself. Updated CHANGELOG and made 
a note to users about changing PageRank values in UPGRADE docs.

----


> PageRank gives incorrect results for graphs with sinks
> ------------------------------------------------------
>
>                 Key: TINKERPOP-1783
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1783
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: process
>    Affects Versions: 3.3.0, 3.1.8, 3.2.6
>            Reporter: Artem Aliev
>            Assignee: Marko A. Rodriguez
>
> {quote} Sink vertices (those with no outgoing edges) should evenly distribute 
> their rank to the entire graph but in the current implementation it is just 
> lost.
> {quote} 
> Wiki: https://en.wikipedia.org/wiki/PageRank#Simplified_algorithm
> {quote}  In the original form of PageRank, the sum of PageRank over all pages 
> was the total number of pages on the web at that time
> {quote} 
> I found the issue, while comparing results with the spark graphX.
> So this is a copy of  https://issues.apache.org/jira/browse/SPARK-18847
> How to reproduce:
> {code}
> gremlin> graph = TinkerFactory.createModern()
> gremlin> g = graph.traversal().withComputer()
> gremlin> 
> g.V().pageRank(0.85).times(40).by('pageRank').values('pageRank').sum()
> ==>1.318625
> gremlin> g.V().pageRank(0.85).times(1).by('pageRank').values('pageRank').sum()
> ==>3.4499999999999997
> #inital values:
> gremlin> g.V().pageRank(0.85).times(0).by('pageRank').values('pageRank').sum()
> ==>6.0
> {code}
> They fixed the issue by normalising values after each step.
> The other way to fix is to send the message to it self (stay on the same 
> page).
> To workaround the problem just add self pointing edges:
> {code}
> gremlin>g.V().as('B').addE('knows').from('B')
> {code}
> Then you'll get always correct sum. But I'm not sure it is a proper 
> assumption. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to