Hi,
Yes a Hadoop Map/Reduce job could be very well suited to this kind of
problem,
Depending on the size of your database, you could load a zipped db directory
from S3 to 20 ec2 launched instances with a bootstrap action.
You could split the traversals into 5,000,000 Mappers using each node as a
start node for the shortest path algo,
and output the path lengths to 1 Reducer to compute the separation degree.
I have some test code which uses the Elastic MapReduce SDK on github:
https://github.com/paddydub/NeoHadoopTester


Depending on how your graph is structured, you could also try running a
traversal from each node which
updates an ArrayList with the path lengths to all other nodes in the graph?

Cheers
Paddy

On Wed, May 11, 2011 at 12:12 PM, Peter Neubauer <
peter.neuba...@neotechnology.com> wrote:

> Hi JueiTing,
> I think this is a typical case for a massive Map/Reduce job. I am thinking
> of combining Hadoop works with replicas of the graph and then do the
> computation. I believe Paddy Fitzgerald has been working with these
> approaches and can give some feedback.
>
> Of course, given the size of the graph, that might prove a problem. OTOH,
> if
> there are no modifications during the computation, you could run the
> calculations on read-only databases from the same store. Would that work?
>
> Cheers,
>
> /peter neubauer
>
> GTalk:      neubauer.peter
> Skype       peter.neubauer
> Phone       +46 704 106975
> LinkedIn   http://www.linkedin.com/in/neubauer
> Twitter      http://twitter.com/peterneubauer
>
> http://www.neo4j.org               - Your high performance graph database.
> http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>
>
> On Wed, May 11, 2011 at 12:47 AM, 翁瑞廷 <wshir...@gmail.com> wrote:
>
> >  Hi,
> >
> > I'm trying to use Neo4j graph database to store a
> > large social network(more than 5,000,000 nodes) for academic research.
> >
> > I need to compute the separation degree(path length) between any two
> nodes
> > in the graph then get the average degree of whole database.
> > The solution I'm using use now is archieved by executing API
> > "GraphAlgoFactory.shortestPath",
> > but it means I need to execute (n*(n-1))/2 times to get all path length.
> I
> > don't think it's a very good idea :(
> >
> > So, I'm wondering that is there any function which can assign one node
> and
> > whole DB as Input, and return the
> > paths or only path lengths between the node and all other nodes in the
> > graph
> > as Output? Just like Dijkstra algorithm did.
> > I know there is a Dijkstra function in the API, but it only return the
> path
> > between two nodes.
> > It'll be really helpful for me if there is any function can return all
> the
> > result in one computation(even the whole result array computing by
> > Dijkstra)
> >
> > Or does anyone has better idea to deal with this issue?
> > I tried to implement Dijkstra by myself, but it was impossible to declare
> > such big 2D array in the program...(OutOfMemoryException)
> >
> > If there is any advice, I'll really appreciate it.
> >
> > Thanks.
> > _______________________________________________
> > Neo4j mailing list
> > User@lists.neo4j.org
> > https://lists.neo4j.org/mailman/listinfo/user
> >
> _______________________________________________
> Neo4j mailing list
> User@lists.neo4j.org
> https://lists.neo4j.org/mailman/listinfo/user
>
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user

Reply via email to