[
https://issues.apache.org/jira/browse/MAHOUT-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13051731#comment-13051731
]
Hector Yee commented on MAHOUT-710:
-----------------------------------
I'd love to beta test it soon. How will Enumerate Triangle perform on a graph
of tens of millions of nodes?
> Implementing K-Trusses
> ----------------------
>
> Key: MAHOUT-710
> URL: https://issues.apache.org/jira/browse/MAHOUT-710
> Project: Mahout
> Issue Type: New Feature
> Affects Versions: 0.6
> Reporter: Tillmann Fiehn
> Assignee: Sebastian Schelter
> Attachments:
> 0001-SimplifyGraph_AugmentEdgesWithDegrees_EnumerateTriangles_workfine.patch
>
> Original Estimate: 672h
> Remaining Estimate: 672h
>
> We are Tillmann Fiehn and Sebastian Arnold, IT students from TU Berlin. As
> Sebastian Schelter already announced, we are atteding Isabel's and
> Sebastian's class "Large scale data analysis and data mining" and picked an
> interesting project that we want to implement in Mahout. We are open for any
> hints and suggestions and would appreciate if you could share your thoughts
> on our proposal.
> Our goal is to implement a map/reduce algorithm for finding k-trusses in a
> given graph. A k-truss is a nontrivial, single-component maximal subgraph,
> such that every edge is contained in at least k-2 triangles in the subgraph.
> The algorithm was proposed in the IEEE paper J. Cohen 2009: "Graph Twiddling
> in a MapReduce World"
> (http://www.csee.usf.edu/~anda/CIS6930-S11/papers/graph-processing-w-mapreduce.pdf)
> and involves a number of graph algorithms that are to our knowledge
> currently not present in Mahout:
> *Goal: finding K-Trusses*
> * relaxation of k-member clique
> * non-trivial, single-component maximal subgraph, s.t.
> * every edge is contained in at least k-2 triangles in the subgraph
> *Algorithms to be implemented on top of Mahout / Hadoop:*
> *simplifyGraph*: Edges -> RepresentativeEdges
> * removes Loops (not cycles)
> * aggregate duplicate edges
> *augmentGraphWithDegrees*: RepresentativeEdges -> AugmentedEdges = (Edge (v,
> u) , d(v), d(u))
> * augements the edges with degree information for both nodes d(v) = |{E | E =
> (x,y) a. (x = v o. y = v) }|
> *enumerateTriangles*: AugmentedEdges = (Edge, d(v), d(u)) -> Triangles (v,
> u, s)
> * finds all triangles in a Graph
> *findComponents*: RepresentativeEdges -> ZoneAssignments (v, z)
> * finds all components of a graph, each identified as the order number of the
> lowest-order vertex contained
> * consists of:
> ** step 1: find adjacent zones: Edges x Zones -> InterzoneEdges (z, z)
> ** step 2: merge adjacent zones into one (the lowest-order neighbouring
> zone): InterzoneEdges, ZoneAssignments (v, z) -> Pairs (v, z)
> {noformat}
> while true do:
> step 1
> if empty set interzone edges break;
> step 2
> done
> {noformat}
> *findKTrusses*: Edges, k -> ZoneAssignments (v, z)
> * finds all k-trusses of the graph
> * each returned vertex v is part of a truss z
> {noformat}
> simplifyGraph
> while true do:
> augmentGraphWithDegrees
> enumerateTriangles
> keep only edges contained in k-2 triangles
> if all edges kept break;
> done
> findComponents
> {noformat}
> We suppose to create the package {{org.apache.mahout.graph.trusses}} and
> {{org.apache.mahout.graph.components}} in the {{core}} module.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira