*What does your query look like?*

*How exactly do you do this: "the query is threaded to use all cpu."?*

If it has to scan the whole dataset then, depending on your memory config,
it first has to load the data into memory, so what you are really measuring
is the performance of your IO.
If the database is larger than the available memory, it has to discard data
and reload it again, which slows things down massively.

Did you configure the page-cache in your neo4j.conf according to the
database size? And did you set the heap to e.g. 16 or 32G? Larger heaps
shouldn't make a difference.
*Page-Cache is what counts most*.
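
As a rough sketch, the memory section of neo4j.conf could look something
like this. The numbers are assumptions for illustration only (based on the
244G machine and the 76G store from your largest test), not measured
recommendations, and the setting names are the ones used by Neo4j 3.x. The
idea is just that the page-cache is larger than the store while the heap
stays moderate:

```properties
# neo4j.conf (illustrative values, adjust to your own store size and RAM)

# Fixed, moderate heap: initial and max size set to the same value
dbms.memory.heap.initial_size=16g
dbms.memory.heap.max_size=16g

# Page-cache sized above the 76G store so the whole database can stay in memory
dbms.memory.pagecache.size=90g
```

Whatever values you pick, leave enough RAM free for the operating system on
top of heap plus page-cache.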

Which Neo4j version are you using? I recommend 3.2.1 Enterprise, which comes
with, for instance, the compiled Cypher runtime.

Michael



On Fri, Jun 30, 2017 at 10:10 PM, Patrice Loos <pool...@gmail.com> wrote:

> I am testing a Java query on datasets of different sizes, from 100 million
> to 1 billion edges.
> The query does not return much data (10 to 20 vertices with their
> corresponding edges), but it needs to scan the whole dataset.
> I can see a big performance degradation when the database size is bigger
> than 32G.
> I am running the test on a 32 core 244G RAM virtual server, the query is
> threaded to use all cpu.
> I changed the Java heap size to 96G and played with the garbage collector
> options (keeping -XX:+UseG1GC as the option that helped the most)
> to get a better outcome, but I still get a big dip in performance. I assume
> the threshold is around 32G:
>
> 100M edges, database is 7.5G : 12 min
> 250M edges, database is 19G : 35 min
> 500M edges, database is 38G : 12 hours with -XX:+UseG1GC
> 1B edges, database is 76G : 51 hours without -XX:+UseG1GC
>
> Furthermore, for the 0.5 billion and 1 billion tests I can see that the
> bulk of the operations are system operations (60%) versus user operations
> (40%), according to the Linux top command. When I run the smaller tests,
> 100% of the operations are user operations.
>
> Are the Java GC improvements in the Enterprise edition of Neo4j significant
> enough to bring the performance of the large-scale dataset queries into the
> same range as the smaller ones?
> Is there something else I can do to improve the performance of queries on
> larger datasets?
>
> Thanks,
> Patrice
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to neo4j+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
