I am testing a Java query on datasets of different sizes, from 100 million to 1 billion edges. The query does not return much data (10 to 20 vertices with their corresponding edges), but it needs to scan the whole dataset. I see a big performance degradation once the database size exceeds 32 GB. I am running the tests on a 32-core virtual server with 244 GB of RAM, and the query is threaded to use all CPUs. I increased the Java heap size to 96 GB and played with the garbage collector options (keeping -XX:+UseG1GC as the one that helped most) to get a better outcome, but I still see a big dip in performance; the threshold seems to be around 32 GB:
- 100M edges, database is 7.5 GB: 12 min
- 250M edges, database is 19 GB: 35 min
- 500M edges, database is 38 GB: 12 hours (with -XX:+UseG1GC)
- 1B edges, database is 76 GB: 51 hours (without -XX:+UseG1GC)

Furthermore, for the 0.5 billion and 1 billion edge tests I can see that the bulk of the operations are system operations, 60% versus 40% user operations (from the Linux top command). When I run the smaller tests, 100% of the operations are user operations.

Are the Java GC improvements in the Enterprise edition of Neo4j significant enough to bring the performance of the large-scale queries into the same range as the smaller ones? Is there anything else I can do to improve the performance of queries on the larger datasets?

Thanks,
Patrice

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
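P.S. In case it helps to see how the work is spread across cores: the threaded scan has roughly the shape below. This is a minimal self-contained sketch, not my actual code; the ID-range partitioning and the predicate inside scanPartition are placeholders standing in for the real Neo4j traversal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelScan {

    // Placeholder for the real per-partition work: scans an ID range and
    // counts the IDs that match a stand-in predicate (here: multiples of 10M,
    // which yields a handful of hits out of a full scan, like the real query).
    static long scanPartition(long start, long end) {
        long matches = 0;
        for (long id = start; id < end; id++) {
            if (id % 10_000_000L == 0) matches++;
        }
        return matches;
    }

    public static void main(String[] args) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors();
        long totalIds = 100_000_000L;                      // e.g. the 100M-edge test
        long chunk = (totalIds + threads - 1) / threads;   // ceiling division

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        List<Future<Long>> futures = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final long start = t * chunk;
            final long end = Math.min(start + chunk, totalIds);
            futures.add(pool.submit(() -> scanPartition(start, end)));
        }

        long total = 0;
        for (Future<Long> f : futures) total += f.get();   // collect per-partition counts
        pool.shutdown();
        System.out.println("matches=" + total);            // prints matches=10
    }
}
```

Each thread gets a disjoint ID range, so there is no shared mutable state during the scan and the results are just summed at the end.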
