Hi, Yes a Hadoop Map/Reduce job could be very well suited to this kind of problem, Depending on the size of your database, you could load a zipped db directory from S3 to 20 ec2 launched instances with a bootstrap action. You could split the traversals into 5,000,000 Mappers using each node as a start node for the shortest path algo, and output the path lengths to 1 Reducer to compute the separation degree. I have some test code which uses the Elastic MapReduce SDK on github: https://github.com/paddydub/NeoHadoopTester
Depending on how your graph is structured, you could also try running a traversal from each node which updates an ArrayList with the path lengths to all other nodes in the graph? Cheers Paddy On Wed, May 11, 2011 at 12:12 PM, Peter Neubauer < peter.neuba...@neotechnology.com> wrote: > Hi JueiTing, > I think this is a typical case for a massive Map/Reduce job. I am thinking > of combining Hadoop works with replicas of the graph and then do the > computation. I believe Paddy Fitzgerald has been working with these > approaches and can give some feedback. > > Of course, given the size of the graph, that might prove a problem. OTOH, > if > there are no modifications during the computation, you could run the > calculations on read-only databases from the same store. Would that work? > > Cheers, > > /peter neubauer > > GTalk: neubauer.peter > Skype peter.neubauer > Phone +46 704 106975 > LinkedIn http://www.linkedin.com/in/neubauer > Twitter http://twitter.com/peterneubauer > > http://www.neo4j.org - Your high performance graph database. > http://startupbootcamp.org/ - Öresund - Innovation happens HERE. > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. > > > On Wed, May 11, 2011 at 12:47 AM, 翁瑞廷 <wshir...@gmail.com> wrote: > > > Hi, > > > > I'm trying to use Neo4j graph database to store a > > large social network(more than 5,000,000 nodes) for academic research. > > > > I need to compute the separation degree(path length) between any two > nodes > > in the graph then get the average degree of whole database. > > The solution I'm using use now is archieved by executing API > > "GraphAlgoFactory.shortestPath", > > but it means I need to execute (n*(n-1))/2 times to get all path length. > I > > don't think it's a very good idea :( > > > > So, I'm wondering that is there any function which can assign one node > and > > whole DB as Input, and return the > > paths or only path lengths between the node and all other nodes in the > > graph > > as Output? Just like Dijkstra algorithm did. > > I know there is a Dijkstra function in the API, but it only return the > path > > between two nodes. > > It'll be really helpful for me if there is any function can return all > the > > result in one computation(even the whole result array computing by > > Dijkstra) > > > > Or does anyone has better idea to deal with this issue? > > I tried to implement Dijkstra by myself, but it was impossible to declare > > such big 2D array in the program...(OutOfMemoryException) > > > > If there is any advice, I'll really appreciate it. > > > > Thanks. > > _______________________________________________ > > Neo4j mailing list > > User@lists.neo4j.org > > https://lists.neo4j.org/mailman/listinfo/user > > > _______________________________________________ > Neo4j mailing list > User@lists.neo4j.org > https://lists.neo4j.org/mailman/listinfo/user > _______________________________________________ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user