Yang Yang created SPARK-10758:
---------------------------------

             Summary: approximation algorithms to speedup triangle count and 
clustering coefficient computation in GraphX
                 Key: SPARK-10758
                 URL: https://issues.apache.org/jira/browse/SPARK-10758
             Project: Spark
          Issue Type: New Feature
          Components: GraphX
            Reporter: Yang Yang


We propose to implement an algorithm to exactly count global clustering 
coefficient of a given graph, and two approximation algorithms to speedup the 
computation of triangle counting and clustering coefficient in GraphX. 
The basic idea of the approximation algorithm is: given a large-scale graph, we 
sample a subgraph, which is representative in the sense that graph properties 
of interest (e.g., #triangles) can be estimated with a known degree of 
accuracy. 
Our algorithm has been well tested on 16 network data comparing with several 
baseline methods (please see details in 
https://aminer.org/billboard/network-sampling). For example, when counting 
triangles on a Twitter network data (with 2,420,766 edges), the approximation 
algorithm can speedup 10x with only 0.36% error rate comparing with exact 
algorithm. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to