[ https://issues.apache.org/jira/browse/MAHOUT-710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128243#comment-13128243 ]
Sebastian Schelter commented on MAHOUT-710: ------------------------------------------- Unfortunately it is already that commited code that (to my impression) does not fit the MapReduce paradigm and should be removed... > Implementing K-Trusses > ---------------------- > > Key: MAHOUT-710 > URL: https://issues.apache.org/jira/browse/MAHOUT-710 > Project: Mahout > Issue Type: New Feature > Components: Graph > Affects Versions: 0.5 > Reporter: Tillmann Fiehn > Assignee: Sebastian Schelter > Fix For: 0.6 > > Attachments: > 0001-SimplifyGraph_AugmentEdgesWithDegrees_EnumerateTriangles_workfine.patch, > 0002-FindComponents_FindKTrusses_workfine.patch > > Original Estimate: 672h > Remaining Estimate: 672h > > We are Tillmann Fiehn and Sebastian Arnold, IT students from TU Berlin. As > Sebastian Schelter already announced, we are atteding Isabel's and > Sebastian's class "Large scale data analysis and data mining" and picked an > interesting project that we want to implement in Mahout. We are open for any > hints and suggestions and would appreciate if you could share your thoughts > on our proposal. > Our goal is to implement a map/reduce algorithm for finding k-trusses in a > given graph. A k-truss is a nontrivial, single-component maximal subgraph, > such that every edge is contained in at least k-2 triangles in the subgraph. > The algorithm was proposed in the IEEE paper J. Cohen 2009: "Graph Twiddling > in a MapReduce World" > (http://www.csee.usf.edu/~anda/CIS6930-S11/papers/graph-processing-w-mapreduce.pdf) > and involves a number of graph algorithms that are to our knowledge > currently not present in Mahout: > *Goal: finding K-Trusses* > * relaxation of k-member clique > * non-trivial, single-component maximal subgraph, s.t. > * every edge is contained in at least k-2 triangles in the subgraph > *Algorithms to be implemented on top of Mahout / Hadoop:* > *simplifyGraph*: Edges -> RepresentativeEdges > * removes Loops (not cycles) > * aggregate duplicate edges > *augmentGraphWithDegrees*: RepresentativeEdges -> AugmentedEdges = (Edge (v, > u) , d(v), d(u)) > * augements the edges with degree information for both nodes d(v) = |{E | E = > (x,y) a. (x = v o. y = v) }| > *enumerateTriangles*: AugmentedEdges = (Edge, d(v), d(u)) -> Triangles (v, > u, s) > * finds all triangles in a Graph > *findComponents*: RepresentativeEdges -> ZoneAssignments (v, z) > * finds all components of a graph, each identified as the order number of the > lowest-order vertex contained > * consists of: > ** step 1: find adjacent zones: Edges x Zones -> InterzoneEdges (z, z) > ** step 2: merge adjacent zones into one (the lowest-order neighbouring > zone): InterzoneEdges, ZoneAssignments (v, z) -> Pairs (v, z) > {noformat} > while true do: > step 1 > if empty set interzone edges break; > step 2 > done > {noformat} > *findKTrusses*: Edges, k -> ZoneAssignments (v, z) > * finds all k-trusses of the graph > * each returned vertex v is part of a truss z > {noformat} > simplifyGraph > while true do: > augmentGraphWithDegrees > enumerateTriangles > keep only edges contained in k-2 triangles > if all edges kept break; > done > findComponents > {noformat} > We suppose to create the package {{org.apache.mahout.graph.trusses}} and > {{org.apache.mahout.graph.components}} in the {{core}} module. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira