Yang Yang created SPARK-10758: --------------------------------- Summary: approximation algorithms to speedup triangle count and clustering coefficient computation in GraphX Key: SPARK-10758 URL: https://issues.apache.org/jira/browse/SPARK-10758 Project: Spark Issue Type: New Feature Components: GraphX Reporter: Yang Yang
We propose to implement an algorithm to exactly count global clustering coefficient of a given graph, and two approximation algorithms to speedup the computation of triangle counting and clustering coefficient in GraphX. The basic idea of the approximation algorithm is: given a large-scale graph, we sample a subgraph, which is representative in the sense that graph properties of interest (e.g., #triangles) can be estimated with a known degree of accuracy. Our algorithm has been well tested on 16 network data comparing with several baseline methods (please see details in https://aminer.org/billboard/network-sampling). For example, when counting triangles on a Twitter network data (with 2,420,766 edges), the approximation algorithm can speedup 10x with only 0.36% error rate comparing with exact algorithm. -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org