Hello all,

I hope this question is relevant to this community; please let me know if 
it is not.

The question is, broadly: how do you avoid unexpected heap exhaustion or 
garbage-collection thrashing that causes timeouts when handling large 
graphs?
The application I am writing works with a graph of roughly 11M nodes and 
100M relationships, each carrying 2 or 3 properties.

In the application I select a handful of nodes, find the connection paths 
between them, and then expand subgraphs from the results. Conceivably the 
expansion of the subgraphs could return hundreds of thousands or even 
millions of nodes, and even more relationships and associated properties. 
I then run some analytics on the subgraphs and update their properties. I 
assume this is a fairly standard usage model. (The machine running the 
database has 16 GB of RAM.)
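
To make that concrete, here is roughly the kind of expansion query I have 
in mind, sketched against a recent Neo4j Java driver. The bolt URI, 
credentials, the Entity label, the CONNECTED relationship type, and the id 
values are placeholders for illustration only, not my real schema:

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class ExpandSubgraph {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {

            // Find paths between two seed nodes and expand the surrounding
            // subgraph up to 3 hops. Without the LIMIT this can easily
            // return hundreds of thousands of rows, which is where the
            // heap pressure starts.
            Result result = session.run(
                    "MATCH (a:Entity {id: $idA}), (b:Entity {id: $idB}) "
                  + "MATCH p = (a)-[:CONNECTED*..3]-(b) "
                  + "UNWIND nodes(p) AS n "
                  + "RETURN DISTINCT n "
                  + "LIMIT $limit",
                    Values.parameters("idA", 1L, "idB", 2L, "limit", 100_000));

            while (result.hasNext()) {
                result.next();  // analytics over the streamed rows would go here
            }
        }
    }
}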

As I run trials of the queries I see almost random heap usage that every 
now and again causes out-of-heap errors. I understand the use of LIMIT 
clauses and batching to a reasonable degree, but I feel there should be a 
solid and consistent programmatic way to protect against heap problems. As 
heap usage grows, I also see increasingly wide variations in performance 
for repeated queries.
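
For the property updates I currently batch roughly like this, committing a 
bounded number of changes per transaction so that the transaction state 
held in the heap stays small (again, the label, property names, and batch 
size are illustrative only):

import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Session;
import org.neo4j.driver.Values;

public class BatchedPropertyUpdate {
    public static void main(String[] args) {
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {

            final int batchSize = 10_000;
            int updated;
            do {
                // Each transaction touches at most batchSize nodes, so the
                // per-commit transaction state kept on the heap is bounded.
                updated = session.executeWrite(tx -> tx.run(
                        "MATCH (n:Entity) WHERE n.score IS NULL "
                      + "WITH n LIMIT $batch "
                      + "SET n.score = 0.0 "
                      + "RETURN count(n) AS updated",
                        Values.parameters("batch", batchSize))
                        .single().get("updated").asInt());
            } while (updated > 0);
        }
    }
}

Even with batching along these lines, the heap behaviour is not 
predictable, which is what prompts the question below.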

Ideally I would not have to artificially limit the size of the subgraphs I 
work on, as that erodes the performance of my analytic algorithms. In my 
application, reliability is my #1 goal, consistent performance my #2, and 
absolute performance my #3.

So the question is: is there a solid and consistent programmatic way to 
protect against heap problems across all classes of queries and large 
volumes of property updates?

Best regards, John.
