Hi,
we are testing Neo4j as our storage backend, but so far we have been unable
to achieve the desired performance. In fact, Neo4j's performance seems to be
exceptionally poor.
Is the problem the configuration, or a suboptimal Cypher query? This is what
I've done so far to try to get the Cypher query to execute faster.
= Environment =
Linode host, 32 GB RAM, 12 CPU cores
Ubuntu 12.04
java version "1.7.0_55" OpenJDK Runtime Environment (IcedTea 2.4.7)
(7u55-2.4.7-1ubuntu1~0.12.04.2) OpenJDK 64-Bit Server VM (build 24.51-b03,
mixed mode)
Neo4j 2.1.2
= Neo4j configuration =
= Graph database =
(:User)-[:knows]->(:Word)-[:isin]->(:Text)
Nodes only have an 'id' property, which is an integer.
Object counts:
1 :User
12k :knows
13mil :Word
183mil :isin
3mil :Text
= Cypher query =
In general, we want to calculate various statistical distributions over
known and unknown words, texts, later on languages, etc. This is the
simplest ratio calculation we've started with, and it turned out to be a
wall of poor performance we can't climb over (time-wise).
MATCH (n:User {name: 'Foo'})-[r:knows]->(w:Word)-[i:isin]->(t:Text)
WITH t, count(i) as known_words_in_text
MATCH (t)<-[all_rels]-(v:Word)
WITH t, count(all_rels) as all_words_in_text, known_words_in_text
RETURN t, known_words_in_text, all_words_in_text,
known_words_in_text*100/all_words_in_text as RATIO
ORDER BY RATIO DESC LIMIT 10
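
To make the intended result concrete, here is a minimal Python sketch of what
the query above is meant to compute, run on a tiny made-up data set. The
function name, the in-memory data layout and the sample ids are assumptions
for illustration only; this is not how Neo4j evaluates the query. Integer
division is used to mirror known_words_in_text*100/all_words_in_text on
integer counts.

```python
from collections import defaultdict


def top_texts_by_known_ratio(knows, isin, user, limit=10):
    """Rank texts by the percentage of their words that `user` knows.

    knows: dict mapping a user name to a set of known word ids (hypothetical layout)
    isin:  iterable of (word_id, text_id) pairs, one per :isin relationship
    """
    known_words = knows.get(user, set())
    known_in_text = defaultdict(int)  # text id -> count of known words in it
    all_in_text = defaultdict(int)    # text id -> count of all words in it

    for word, text in isin:
        all_in_text[text] += 1
        if word in known_words:
            known_in_text[text] += 1

    # Only texts containing at least one known word are returned,
    # as in the first MATCH of the query.
    rows = [
        (text, known, all_in_text[text], known * 100 // all_in_text[text])
        for text, known in known_in_text.items()
    ]
    rows.sort(key=lambda row: row[3], reverse=True)  # ORDER BY RATIO DESC
    return rows[:limit]                              # LIMIT 10


# Made-up sample data: user 'Foo' knows words 1 and 2.
sample_knows = {"Foo": {1, 2}}
sample_isin = [(1, 10), (2, 10), (3, 10), (1, 20), (4, 30)]

for row in top_texts_by_known_ratio(sample_knows, sample_isin, "Foo"):
    print(row)  # (text id, known words, all words, ratio in percent)
```

Text 30 contains no known words, so it does not appear in the result at all,
which is also how the query behaves.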
= Results =
We've tried different cache, GC and heap size combinations (we actually went
through all of them); the best result so far for this query is:
Returned 10 rows in 241087 ms
(roughly four minutes), which left us wondering what we have done wrong, or
whether this is the limit of the database's performance.