I don't think it builds a cartesian product of all nodes; more likely it
just does two complete scans of all the nodes, O(N/2) each. What is
actually needed is just 2 quick index lookups to locate *node1* and *node2*,
after which the search for the shortest path between them can start.
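
To illustrate the index-lookup idea, here is a minimal sketch of how the query could be built so that both endpoints can be resolved through a schema index instead of a scan. It rests on assumptions not confirmed in this thread: that every page node carries a `Page` label, and that a schema index has been created on its `node` property (for example with `CREATE INDEX ON :Page(node)`). The helper name `build_shortest_path_query` is made up for this example. As far as I know, the legacy auto-index is not consulted by a plain `MATCH` on a property, which is why the label is added here.

```python
def build_shortest_path_query(node1, node2, label="Page", max_hops=20):
    """Return (cypher, params) for a shortest-path lookup between two pages.

    Assumptions (not confirmed in the thread): every page node carries the
    label given by `label`, and a schema index exists on that label's
    `node` property, e.g. created once with: CREATE INDEX ON :Page(node)
    """
    # Labels and hop limits cannot be passed as Cypher parameters, so they
    # are formatted into the text; only the two id values are parameters.
    cypher = (
        "MATCH (m:%s {node: {node1}}), (n:%s {node: {node2}}), "
        "p = shortestPath((m)-[*..%d]->(n)) RETURN p"
    ) % (label, label, int(max_hops))
    params = {"node1": node1, "node2": node2}
    return cypher, params


cypher, params = build_shortest_path_query("12", "345")
print(cypher)
print(params)
```

With py2neo 1.6 this would presumably be run as `neo4j.CypherQuery(graph_db, cypher).execute_one(**params)`. Passing the ids as parameters rather than formatting them into the string also lets the server reuse the query plan between calls.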
How fast Neo4j then discovers the path depends mostly on the RAM setup. I
am afraid that 8 GB of RAM (even with configuration fine-tuning) might
not be enough.
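
On the memory side, Chris's rule of thumb quoted below (keep the heap under physical RAM, with a GB or two to spare for the operating system) can be turned into a quick sanity check. This is only a sketch: the function names and the 2048 MB default reserve are assumptions made for illustration, and the numbers in the example are the ones from the original post (a 4 GB laptop with the heap set to 8192 MB).

```python
def max_safe_heap_mb(ram_mb, os_reserve_mb=2048):
    """Largest heap (in MB) that still leaves os_reserve_mb of RAM free."""
    return max(ram_mb - os_reserve_mb, 0)


def heap_fits(heap_mb, ram_mb, os_reserve_mb=2048):
    """True if the configured heap stays within the rule of thumb."""
    return heap_mb <= max_safe_heap_mb(ram_mb, os_reserve_mb)


# Numbers from the original post: 4 GB of RAM, heap set to 8192 MB.
print(max_safe_heap_mb(4096))   # 2048
print(heap_fits(8192, 4096))    # False
```

By this check, a heap of 8192 MB on a 4 GB machine is well past the point where the OS has to swap heap pages out.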
WBR,
Andrii
On Friday, November 21, 2014 3:07:50 PM UTC+2, Chris Vest wrote:
>
> I think this part of your query: (m {node:'%s'}), (n {node:'%s'}) might
> be spending a lot of time building up a cartesian product of all your
> nodes. Try inlining them into the path expression.
>
> Don’t give the JVM more heap memory than you have RAM, with a GB or two to
> spare for the operating system. If heap memory gets swapped out, then the
> GC pauses can get very, very long.
>
> --
> Chris Vest
> System Engineer, Neo Technology
> [ skype: mr.chrisvest, twitter: chvest ]
>
>
> On 20 Nov 2014, at 19:09, Erika Arnold <[email protected]> wrote:
>
> *tldr*
>
> My project involves Wikipedia's pagelinks dataset. When imported into
> Neo4j, this results in a large directed graph with ~11m nodes and ~172m
> relationships. I want to efficiently find the shortest path between any two
> nodes in the graph. With my current query, and after tweaking Java's
> memory settings, the query takes ~60 seconds to return a path. I would like
> feedback on how to decrease this response time.
>
> *details*
>
> My *setup* is a MacBook Air (1.3 GHz Intel Core i5, 4G 1600 MHz DDR3, OS
> X 10.9.5) with Neo4j (v. 2.1.5) and Java (v. 1.7.0_71) installed.
>
> Here's my github repo <https://github.com/erabug/wikigraph> (the readme
> contains more details for the following methods).
>
> I successfully batch imported my nodes.csv and rels.csv files into Neo4j.
> As I mentioned above, this produces a graph with ~11m nodes and ~172m
> relationships.
>
> The *data model* is simple: All Wikipedia pages are nodes with an id and
> title ('node', 'name') as well as a label for its category (all nodes are
> 'Pages', some have specific categories also, e.g. 'OfficeHolder'). There is
> only one relationship type, 'LINKS_TO', that describes which pages the node
> links to.
>
> graph structure: (Page) -[:LINKS_TO]-> (Page)
>
> Here is the *query* I use, via py2neo (v. 1.6.4) CypherQuery object:
>
> query = neo4j.CypherQuery(
>     graph_db,
>     """MATCH (m {node:'%s'}), (n {node:'%s'}),
>     p = shortestPath((m)-[*..20]->(n)) RETURN p""" % (node1, node2)
> )
> path = query.execute_one()
>
> Auto-indexing (on 'node', i.e. the id number) is turned on. Increases in
> *java.initmemory* and *java.maxmemory* had a dramatic effect on response
> time. At default settings for both (512MB), the shortest path was returned
> in ~27 minutes. At any setting higher than 4G (currently using 8192MB), the
> path is returned in ~60 seconds. I also tweaked settings in
> neo4j.properties, but saw no noticeable decreases.
>
> *logs*
>
> messages.log <https://gist.github.com/erabug/e2e683fbeae124804370>
>
> *what I've found from googling*
>
> Neo4j Cypher path finding slow in undirected graph
> <http://stackoverflow.com/questions/15456345/neo4j-cypher-path-finding-slow-in-undirected-graph>,
>
> Tuning neo4j for performance
> <http://stackoverflow.com/questions/17661902/tuning-neo4j-for-performance>,
> and Neo4j's Performance Guide
> <http://neo4j.com/docs/stable/performance-guide.html>. However, I'm not
> sure I know enough Java to try some of the suggestions on my own. If that's
> what is required to increase response time, I'm happy to learn, but I
> wanted to make sure it was the right approach first.
>
> *server experiment*
>
> I also deployed to an Amazon EC2 instance (t2.micro, 1 GB memory, 1 vCPU,
> ubuntu), just to experiment. I tried to change the same neo4j java settings
> there as I had on my local machine, but I could not run the neo4j server
> with anything other than the defaults. As a result, the query there takes
> ~22 mins.
>
> *feedback*
>
> I would love advice about the query, my settings, and things to try on the
> server (where I will ultimately want to house my project). Please let me
> know if I can provide any further information or clarification.
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.