Patrice, I started off with a VM configuration similar to yours. I found a considerable speed-up by ditching the VM and running natively on Linux.
Wayne

On Friday, 30 June 2017 21:43:12 UTC+1, Patrice Loos wrote:
> I am testing a Java query on datasets of different sizes, 100 million to 1 billion edges.
> The query does not return much data (10 to 20 vertices with their corresponding edges), but it needs to scan the whole dataset.
> I can see a big performance degradation when the database size is larger than 32 GB.
> I am running the test on a 32-core, 244 GB RAM virtual server, and the query is threaded to use all CPUs.
> I changed the Java heap size to 96 GB and played with the garbage collector options (retaining -XX:+UseG1GC as the most effective one)
> to get a better outcome, but I still get a big dip in performance. I assume the threshold is around 32 GB:
>
> 100M edges, database is 7.5 GB: 12 min
> 250M edges, database is 19 GB: 35 min
> 500M edges, database is 38 GB: 12 hours with -XX:+UseG1GC
> 1B edges, database is 76 GB: 51 hours without -XX:+UseG1GC
>
> Furthermore, for the 0.5 billion and 1 billion tests I can see that the bulk of the operations are system
> operations (60%) versus user operations (40%) (from the Linux top command). When I run the smaller tests,
> 100% of the operations are user operations.
>
> Are the Java GC improvements in the Enterprise edition of Neo4j significant enough to bring the performance
> of the large-scale dataset queries into the same range as the smaller ones?
> Is there something else I can do to improve the performance of larger dataset queries?
>
> tks
> Patrice
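For what it's worth, the ~32 GB threshold Patrice describes coincides with two known effects: the JVM disables compressed object pointers on heaps larger than about 32 GB, and a very large heap starves Neo4j's off-heap page cache, pushing store access onto disk (which would also explain the high system-time share in top). A common tuning approach — this is only a sketch, assuming Neo4j 3.x configuration keys and a 244 GB machine; the exact values are illustrative — is to keep the heap just under 32 GB and give most of the remaining RAM to the page cache:

```properties
# neo4j.conf — memory-layout sketch for a 244 GB host (illustrative values)

# Keep the heap below ~32 GB so the JVM retains compressed object pointers;
# oversized heaps also lengthen GC pauses without speeding up store access.
dbms.memory.heap.initial_size=31g
dbms.memory.heap.max_size=31g

# Give the bulk of the remaining RAM to the off-heap page cache, so even the
# 76 GB store (1B edges) can be served from memory instead of disk.
dbms.memory.pagecache.size=150g

# G1 can be kept as the collector via an explicit JVM flag.
dbms.jvm.additional=-XX:+UseG1GC
```

If the store files fit entirely in the page cache, a whole-dataset scan should stay memory-bound rather than I/O-bound, which typically brings the larger runs back toward the scaling of the smaller ones.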
