Hello everybody, we are Tillmann Fiehn and Sebastian Arnold, IT students from TU Berlin. As Sebastian Schelter already announced, we are atteding Isabel's and Sebastian's class "Large scale data analysis and data mining" and picked an interesting project that we want to implement in Mahout. We are open for any hints and suggestions and would appreciate if you could share your thoughts on our proposal.
Our goal is to implement a map/reduce algorithm for finding k-trusses in a given graph. A k-truss is a nontrivial, single-component maximal subgraph, such that every edge is contained in at least k-2 triangles in the subgraph. The algorithm was proposed in the IEEE paper J. Cohen 2009: "Graph Twiddling in a MapReduce World" and involves a number of graph algorithms that are to our knowledge currently not present in Mahout: - simplifyGraph: Edges->RepresentativeEdges remove Loops (not cycles) eliminate duplicate edges - augmentGraphWithDegrees: RepresentativeEdges->AugmentedEdges augment the edges with degree count for both nodes - enumerateTriangles: AugmentedEdges->Triangles find all triangles in a graph - findComponents: RepresentativeEdges->ZoneAssignments find all components of a graph, each identified as the order number of the lowest-order vertex contained includes: findAdjacentZones, mergeAdjacentZones - findKTrusses: RepresentativeEdges->ZoneAssignments(v,z) find all k-trusses of the graph each returned vertex v is part of a truss z Can you help us to identify sub-problems or data structures that may be already implemented in Mahout? Furthermore, we will try to implement all the graph algorithms as general as possible so they can be re-used for future development. We will create a JIRA issue with further details. We want to finish the implementation in 4-6 weeks. We are happy to recieve your feedback! Best regards, Tillmann & Sebastian -- *FG DIMA* Einsteinufer 17 / 705 10587 Berlin phone: +49 / 30 / 31425552
