[ 
https://issues.apache.org/jira/browse/FLINK-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634839#comment-14634839
 ] 

ASF GitHub Bot commented on FLINK-2310:
---------------------------------------

Github user vasia commented on the pull request:

    https://github.com/apache/flink/pull/892#issuecomment-123236061
  
    Hi @shghatge,
    
    let me try to explain the implementation in detail here :)
    
    First of all, can you change this to a library method instead of an 
example? The modification should be very easy. You only need to move the class 
in the `org.apache.flink.graph.library` package and implement the 
`GraphAlgorithm` interface. This way, users will be able to use this method by 
simply calling `graph.run(new AdamicAdarSimilarity())`.
    
    Making this a library method also means that we don't have to use only 
Gelly methods, i.e. we can do a few things more efficiently.
    
    For example, in the beginning of the algorithm, you need to compute (1) the 
vertex "weights" and (2) the neighbor IDs for each vertex. Both these 
computations can be done with a single GroupReduce on the edges dataset, i.e. 
`edges.flatMap(...).groupBy(0).reduceGroup(...)`, where in the flatMap you 
simply create the opposite direction edges and in the reduceGroup you compute 
the neighborhood sets and degrees (size of the set) - weights.
    The result will be a dataset where each vertex has a `Tuple2` value with 
its "weight" as the first field and its neighbors as the second.
    
    Similarly, instead of using `getTriplets()` (which is convenient but quite 
expensive), you can compute the partial edge values with a single 
`groupReduceOnNeighbors`. 
    Say you have vertices `(1, {d1, (2, 3, 4)})`, `(2, {d2, (1, 3)})`, `(3, 
{d3, (2, 4)})` and `(4, {d4, (1, 3)})`. 
    In `groupReduceOnNeighbors`, vertex 1 will compute the following:
    - neighbor 2: common neighbor 3 -> emit `(2, 3, d1)`
    - neighbor 3: common neighbor 4 -> emit `(3, 4, d1)`
    
    Finally, you can `groupBy(0, 1)` this result dataset and compute the sums 
to get the final adamic-adar similarities.
    
    Let me know if this makes sense and whether you need more clarifications!


> Add an Adamic-Adar Similarity example
> -------------------------------------
>
>                 Key: FLINK-2310
>                 URL: https://issues.apache.org/jira/browse/FLINK-2310
>             Project: Flink
>          Issue Type: Task
>          Components: Gelly
>            Reporter: Andra Lungu
>            Assignee: Shivani Ghatge
>            Priority: Minor
>
> Just as Jaccard, the Adamic-Adar algorithm measures the similarity between a 
> set of nodes. However, instead of counting the common neighbors and dividing 
> them by the total number of neighbors, the similarity is weighted according 
> to the vertex degrees. In particular, it's equal to log(1/numberOfEdges).
> The Adamic-Adar algorithm can be broken into three steps: 
> 1). For each vertex, compute the log of its inverse degrees (with the formula 
> above) and set it as the vertex value. 
> 2). Each vertex will then send this new computed value along with a list of 
> neighbors to the targets of its out-edges
> 3). Weigh the edges with the Adamic-Adar index: Sum over n from CN of 
> log(1/k_n)(CN is the set of all common neighbors of two vertices x, y. k_n is 
> the degree of node n). See [2]
> Prerequisites: 
> - Full understanding of the Jaccard Similarity Measure algorithm
> - Reading the associated literature: 
> [1] http://social.cs.uiuc.edu/class/cs591kgk/friendsadamic.pdf
> [2] 
> http://stackoverflow.com/questions/22565620/fast-algorithm-to-compute-adamic-adar



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to