GitHub user andralungu commented on the pull request:
https://github.com/apache/flink/pull/1054#issuecomment-134702560
Hi @vasia,
I clarified the type of input expected: the graph should be undirected.
Without the distinct, you get duplicate edges there (and an erroneous number of
triangles). The second bullet point is again not an issue because the graph is
undirected.
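To illustrate why dropping the distinct inflates the count, here is a minimal plain-Python sketch. It is not the Flink/Gelly code from this PR. It assumes the undirected graph is stored with each edge in both directions, normalizes every edge to (min, max), and counts each triangle u < v < w exactly once. The `dedupe` flag is a hypothetical stand-in for the distinct step.

```python
def count_triangles(edges, dedupe=True):
    """Count triangles in an undirected graph given as an edge list.

    Hypothetical sketch: every edge is normalized to (min, max). With
    dedupe=True the normalized edges pass through a set (the analogue of
    distinct()); with dedupe=False duplicates are kept, mimicking a
    missing distinct.
    """
    # Normalize direction and drop self-loops.
    normalized = [(min(u, v), max(u, v)) for u, v in edges if u != v]
    if dedupe:
        normalized = sorted(set(normalized))

    # For each vertex, list its neighbours with a higher id.
    higher = {}
    for u, v in normalized:
        higher.setdefault(u, []).append(v)

    count = 0
    for u, v in normalized:
        # A triangle u < v < w closes when w is a higher neighbour of both u and v.
        for w in higher.get(u, []):
            if w > v and w in higher.get(v, []):
                count += 1
    return count


# A single triangle 1-2-3, stored in both directions as in an undirected graph.
triangle = [(1, 2), (2, 1), (2, 3), (3, 2), (1, 3), (3, 1)]
print(count_triangles(triangle))                # 1
print(count_triangles(triangle, dedupe=False))  # 4, inflated by duplicate edges
```

Without deduplication, each duplicated edge is enumerated twice and each duplicated neighbour is matched twice, so the single triangle is reported four times.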
The result should be correct: for the SNAP data sets, the number of triangles I
got on a cluster matched theirs.
Concerning the runtime, you are right: it's only true for some cases (where it
is generally faster by a factor of two), and it highly depends on the data set.
So, once this gets merged, I'll go ahead and propose the vertex-centric version
as well. That way, the user can choose.
Hope I clarified everything!
Let me know if you still have questions :)