I am testing a Java query on datasets of different sizes, from 100 million to 1 
billion edges.
The query does not return much data (10 to 20 vertices with their corresponding 
edges), but it needs to scan the whole dataset.
I can see a big performance degradation once the database size grows beyond 
32 GB.
I am running the test on a 32-core virtual server with 244 GB of RAM, and the 
query is threaded to use all the CPUs.
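
For context, the scan is structured roughly like the sketch below. This is a
minimal illustration rather than the actual code: it assumes the embedded
Neo4j 2.x Java API, and the matches() predicate and the node-ID range
partitioning are stand-ins for the real query logic.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.NotFoundException;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class ParallelScanSketch {

    // Stand-in for the real match logic; property name is hypothetical.
    static boolean matches(Node n) {
        return n.hasProperty("wanted");
    }

    public static void main(String[] args) throws Exception {
        GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase(args[0]);
        long maxNodeId = Long.parseLong(args[1]); // highest node id in the store
        int threads = Runtime.getRuntime().availableProcessors(); // 32 here

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<List<Long>>> results = new ArrayList<>();
        long chunk = maxNodeId / threads + 1;

        for (int t = 0; t < threads; t++) {
            final long lo = t * chunk;
            final long hi = Math.min(maxNodeId + 1, lo + chunk);
            results.add(pool.submit(() -> {
                List<Long> hits = new ArrayList<>();
                // One read transaction per worker thread.
                try (Transaction tx = db.beginTx()) {
                    for (long id = lo; id < hi; id++) {
                        try {
                            Node n = db.getNodeById(id);
                            if (matches(n)) {
                                hits.add(id);
                            }
                        } catch (NotFoundException e) {
                            // id gaps are normal after deletes; skip them
                        }
                    }
                    tx.success();
                }
                return hits;
            }));
        }

        for (Future<List<Long>> f : results) {
            System.out.println("matched: " + f.get());
        }
        pool.shutdown();
        db.shutdown();
    }
}

Each worker opens its own read transaction, since a transaction is bound to
the thread that created it.
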
I changed the Java heap size to 96 GB and played with the garbage collector 
options (settling on -XX:+UseG1GC as the option that helped most) 
to get a better outcome, but I still see a big dip in performance; the 
threshold seems to be around 32 GB:

100M edges, database is 7.5 GB: 12 min
250M edges, database is 19 GB: 35 min
500M edges, database is 38 GB: 12 hours (with -XX:+UseG1GC)
1B edges, database is 76 GB: 51 hours (without -XX:+UseG1GC)
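
To confirm which collector and heap ceiling each run actually ended up with,
I dump the GC beans at startup. This is plain java.lang.management, nothing
Neo4j-specific:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCheck {
    public static void main(String[] args) {
        // Heap ceiling as seen by the JVM (-Xmx); should report ~96 GB here.
        System.out.println("max heap: "
                + Runtime.getRuntime().maxMemory() / (1024L * 1024 * 1024) + " GB");
        // With -XX:+UseG1GC the beans report as "G1 Young Generation"
        // and "G1 Old Generation".
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time=" + gc.getCollectionTime() + " ms");
        }
    }
}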

Furthermore, for the 0.5 billion and 1 billion edge tests, I can see that the 
bulk of the operations are system operations: 60% system versus 
40% user (from the Linux top command). When I run the smaller tests, 
100% of the operations are user operations.
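
To get the same split from inside the JVM rather than from top, each worker
thread can report its own user vs. kernel CPU time through the standard
ThreadMXBean; a minimal sketch (the busy loop just stands in for real work):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class CpuSplit {
    public static void main(String[] args) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isThreadCpuTimeSupported()) {
            System.out.println("thread CPU timing not supported on this JVM");
            return;
        }
        threads.setThreadCpuTimeEnabled(true);

        // Burn some CPU so the counters have something to show.
        long sink = 0;
        for (int i = 0; i < 50_000_000; i++) sink += i;

        long cpuNs = threads.getCurrentThreadCpuTime();   // user + kernel
        long userNs = threads.getCurrentThreadUserTime(); // user only
        System.out.println("sink=" + sink);
        System.out.println("user ms:   " + userNs / 1_000_000);
        System.out.println("kernel ms: " + (cpuNs - userNs) / 1_000_000);
    }
}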

Are the Java GC improvements in the Enterprise edition of Neo4j significant 
enough to bring the performance of the large-scale dataset queries into the 
same range as the smaller ones?
Is there anything else I can do to improve the performance of queries on the 
larger datasets?

Thanks,
Patrice
