Github user shghatge commented on the pull request:
https://github.com/apache/flink/pull/892#issuecomment-120036933
Hello @vasia
I would like to work on both versions of Adamic Adar. Since the JIRA did not
ask for an approximate version, it was suggested that I create a separate JIRA
issue for a library method that computes an approximate Adamic Adar solution
using Bloom filters.
I have a question about the Bloom filters. A Bloom filter only tells us
whether an element belongs to the set or not, so if both vertices have Bloom
filters as their values, how will we know what to emit? For example, say bits
'1,4,13' are set for Vertex 3 and bits '2,4,13' are set for Vertex 5. When we
use the method you suggested, a logical AND gives us the intersection of the
two Bloom filters. After this, do you suggest that we keep another hashtable
that tracks the value->vertex relation? Or do we just emit 5,4,1/log(d3) and
use the identity map as the hash function? That would mean each vertex has n
bits as its value, where n is the number of vertices in the graph. I hope my
question is clear.
TL;DR: We would have to use an identity hash function, which implies that each
vertex needs n bits of memory as its value. Is it okay to use this much
memory? If there is some other approach, please let me know. Bloom filters
seem more useful for finding the size of an intersection or union, but here
we need to know which vertices are common. The only other way I can roughly
imagine is that we get the hashed edges in a dataset, just like
5,4,1/log(d3)..., use the same hash function on all the graph edges, and then
join the datasets obtained on fields 1 and 2.
Please tell me if there is a more efficient way, or which of these two
approaches you would prefer.
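To make the memory question concrete, here is a minimal Python sketch of the identity-hash case. It assumes the bit positions in the example above are the neighbour IDs themselves, and it scores a vertex pair with the standard Adamic Adar sum of 1/log(degree) over the common neighbours. The helper names, the degrees, and the `v % m` stand-in hash are all hypothetical and only for illustration. This is not the Gelly implementation.

```python
import math


def to_bitset(neighbors, m=None):
    """Encode neighbour IDs as an integer bitset.

    With m=None the identity hash is used (bit i <=> vertex i), which
    needs up to n bits per vertex. With m set, v % m is a hypothetical
    stand-in for a real hash into an m-bit filter.
    """
    bits = 0
    for v in neighbors:
        pos = v if m is None else v % m
        bits |= 1 << pos
    return bits


def set_bits(bits):
    """Return the positions of the set bits in ascending order."""
    positions = []
    i = 0
    while bits:
        if bits & 1:
            positions.append(i)
        bits >>= 1
        i += 1
    return positions


def adamic_adar(bits_u, bits_v, degree):
    """Sum 1/log(deg(z)) over the common neighbours z of u and v."""
    common = set_bits(bits_u & bits_v)
    return sum(1.0 / math.log(degree[z]) for z in common)


# Bit positions taken from the example in the comment above.
filter_3 = to_bitset([1, 4, 13])
filter_5 = to_bitset([2, 4, 13])

# With the identity hash, the AND directly yields the common vertex IDs.
common = set_bits(filter_3 & filter_5)

# Hypothetical degrees for the common neighbours (made up for this sketch).
degree = {4: 3, 13: 2}
score = adamic_adar(filter_3, filter_5, degree)

# With a smaller filter (m = 8 bits), distinct vertices can share a bit,
# so a set bit can no longer be mapped back to one vertex ID.
collision = to_bitset([4], m=8) == to_bitset([12], m=8)
```

With the identity hash, `common` contains the vertex IDs 4 and 13, at the cost of up to n bits per vertex. With the 8-bit stand-in hash, vertices 4 and 12 set the same bit, which is why a compact filter alone cannot say which vertices are common.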