Thanks for all your responses,

Here is the size of the graph DB:

Nodes        Size
-----------------------------
100,000      97MB
200,000      182MB
300,000      267MB
...
5,000,000    ~5GB (expected)

I've tried using 5 virtual machines, each with 2 cores and 1 GB of memory,
running 2 threads on each VM.

Nodes        Time
------------------------------
50,000       >20 mins
150,000      >40 hrs

Obviously, as the number of nodes increases, the time spent executing
GraphAlgoFactory.shortestPath grows steeply and non-linearly.
It's hard to estimate how many machines I would need to handle
5,000,000 nodes, or even 10,000,000.
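For what it's worth, the pairwise averaging can at least be parallelized in
blocks on a single machine, along the lines of Michael's ThreadPoolExecutor
suggestion below. This is only a sketch: pathLength() here is a placeholder
for a real GraphAlgoFactory.shortestPath lookup, and the block size is an
arbitrary assumption.

```java
import java.util.*;
import java.util.concurrent.*;

// Sketch of the blocked averaging idea: split node pairs into blocks,
// hand each block to a pool with cores*2 threads, let each task return
// the sum of path lengths for its block, then divide the grand total by
// the number of pairs at the end.
class AvgPathSketch {

    // Placeholder metric for the sketch; a real version would call
    // GraphAlgoFactory.shortestPath against the Neo4j store here.
    static int pathLength(int a, int b) {
        return Math.abs(a - b);
    }

    static double averagePathLength(int n, int blockSize) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors() * 2;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Long>> partials = new ArrayList<>();
        List<int[]> block = new ArrayList<>();

        // Enumerate the (n*(n-1))/2 unordered pairs in blocks.
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                block.add(new int[]{i, j});
                if (block.size() == blockSize) {
                    partials.add(pool.submit(sumTask(block)));
                    block = new ArrayList<>();
                }
            }
        }
        if (!block.isEmpty()) partials.add(pool.submit(sumTask(block)));

        long total = 0;
        for (Future<Long> f : partials) total += f.get();
        pool.shutdown();
        return (double) total / (n * (n - 1L) / 2);
    }

    // Each task only returns the sum of path lengths for its block.
    private static Callable<Long> sumTask(List<int[]> pairs) {
        return () -> {
            long sum = 0;
            for (int[] p : pairs) sum += pathLength(p[0], p[1]);
            return sum;
        };
    }
}
```

The per-block partial sums keep the accumulated total in a single long, so
no large intermediate structures are needed regardless of graph size.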
I think Map/Reduce could be one solution for me, and I'll try a BFS
traversal, which may eliminate some duplicated work.
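To illustrate why BFS should help: one breadth-first pass from a source node
yields the hop counts to every reachable node at once, so n passes replace
(n*(n-1))/2 pairwise shortestPath calls. A minimal sketch on a plain
adjacency list (standing in for the Neo4j node/relationship store):

```java
import java.util.*;

// One BFS from `source` computes unweighted shortest-path lengths to all
// reachable nodes. Nodes are plain ints 0..n-1; adj is an adjacency list.
class BfsSketch {
    static int[] distancesFrom(int source, List<List<Integer>> adj) {
        int[] dist = new int[adj.size()];
        Arrays.fill(dist, -1);                 // -1 = not yet reached
        Deque<Integer> queue = new ArrayDeque<>();
        dist[source] = 0;
        queue.add(source);
        while (!queue.isEmpty()) {
            int node = queue.poll();
            for (int next : adj.get(node)) {
                if (dist[next] == -1) {        // first visit = shortest hop count
                    dist[next] = dist[node] + 1;
                    queue.add(next);
                }
            }
        }
        return dist;
    }
}
```

Summing these arrays over all source nodes (and halving, since each pair is
seen twice) gives the same total path length without duplicate pairwise calls.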


It's been nice discussing this with you,

Thanks.

2011/5/12 Michael Hunger <michael.hun...@neotechnology.com>

> Hey JueiTing,
>
> I'm not sure if Hadoop is needed here.
> What are the current performance characteristics of the shortest path you
> are using?
>
> You could take a decent machine and just fire off, e.g., blocks of 10k node
> pairs to a ThreadPoolExecutor with cores*2 threads.
> Each of those tasks only has to return the sum of the path lengths (and you
> know the block size), so you can accumulate the whole thing into a long and
> divide it by the number of pairs processed at the end.
>
> Perhaps instead of brute-force looping one and a half times over all nodes,
> it would be better to do a breadth-first traversal over all nodes
> (regardless of relationship type, just outgoing rels), returning unique
> paths from the traverser and calculating the path lengths from the start
> node to each _connected_ node on the paths (start node ->
> path[0..n].nodes[1..n]).
>
> Cheers
>
> Michael
>
> On 11.05.2011 at 21:12, Peter Neubauer wrote:
>
> > Hi JueiTing,
> > I think this is a typical case for a massive Map/Reduce job. I am
> > thinking of combining Hadoop workers with replicas of the graph and then
> > doing the computation. I believe Paddy Fitzgerald has been working with
> > these approaches and can give some feedback.
> >
> > Of course, given the size of the graph, that might prove a problem. OTOH,
> > if there are no modifications during the computation, you could run the
> > calculations on read-only databases from the same store. Would that work?
> >
> > Cheers,
> >
> > /peter neubauer
> >
> > GTalk:      neubauer.peter
> > Skype       peter.neubauer
> > Phone       +46 704 106975
> > LinkedIn   http://www.linkedin.com/in/neubauer
> > Twitter      http://twitter.com/peterneubauer
> >
> > http://www.neo4j.org               - Your high performance graph database.
> > http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
> > http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
> >
> >
> > On Wed, May 11, 2011 at 12:47 AM, 翁瑞廷 <wshir...@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> I'm trying to use the Neo4j graph database to store a
> >> large social network (more than 5,000,000 nodes) for academic research.
> >>
> >> I need to compute the degree of separation (path length) between any
> >> two nodes in the graph and then get the average over the whole database.
> >> The solution I'm using now is to execute the API
> >> "GraphAlgoFactory.shortestPath",
> >> but that means executing it (n*(n-1))/2 times to get all the path
> >> lengths. I don't think that's a very good idea :(
> >>
> >> So, I'm wondering: is there any function which can take one node and
> >> the whole DB as input, and return the paths (or just the path lengths)
> >> between that node and all other nodes in the graph as output, just like
> >> the Dijkstra algorithm does?
> >> I know there is a Dijkstra function in the API, but it only returns the
> >> path between two nodes.
> >> It would be really helpful if there were a function that could return
> >> all the results in one computation (even the whole result array computed
> >> by Dijkstra).
> >>
> >> Or does anyone have a better idea for dealing with this issue?
> >> I tried to implement Dijkstra myself, but it was impossible to declare
> >> such a big 2D array in the program... (OutOfMemoryException)
> >>
> >> Any advice would be really appreciated.
> >>
> >> Thanks.
> >> _______________________________________________
> >> Neo4j mailing list
> >> User@lists.neo4j.org
> >> https://lists.neo4j.org/mailman/listinfo/user
> >>
>
