Github user vasia commented on the pull request:
https://github.com/apache/flink/pull/892#issuecomment-123236061
Hi @shghatge,
let me try to explain the implementation in detail here :)
First of all, can you change this to a library method instead of an
example? The modification should be very easy. You only need to move the class
in the `org.apache.flink.graph.library` package and implement the
`GraphAlgorithm` interface. This way, users will be able to use this method by
simply calling `graph.run(new AdamicAdarSimilarity())`.
Making this a library method also means that we don't have to use only
Gelly methods, i.e. we can do a few things more efficiently.
For example, in the beginning of the algorithm, you need to compute (1) the
vertex "weights" and (2) the neighbor IDs for each vertex. Both these
computations can be done with a single GroupReduce on the edges dataset, i.e.
`edges.flatMap(...).groupBy(0).reduceGroup(...)`, where in the flatMap you
simply create the opposite direction edges and in the reduceGroup you compute
the neighborhood sets and degrees (size of the set) - weights.
The result will be a dataset where each vertex has a `Tuple2` value with
its "weight" as the first field and its neighbors as the second.
Similarly, instead of using `getTriplets()` (which is convenient but quite
expensive), you can compute the partial edge values with a single
`groupReduceOnNeighbors`.
Say you have vertices `(1, {d1, (2, 3, 4)})`, `(2, {d2, (1, 3)})`, `(3,
{d3, (2, 4)})` and `(4, {d4, (1, 3)})`.
In `groupReduceOnNeighbors`, vertex 1 will compute the following:
- neighbor 2: common neighbor 3 -> emit `(2, 3, d1)`
- neighbor 3: common neighbor 4 -> emit `(3, 4, d1)`
Finally, you can `groupBy(0, 1)` this result dataset and compute the sums
to get the final adamic-adar similarities.
Let me know if this makes sense and whether you need more clarifications!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---