Yeah, it's a known bug: https://github.com/neo4j/neo4j/pull/2690
--
Chris Vest
System Engineer, Neo Technology
[ skype: mr.chrisvest, twitter: chvest ]

On 16 Jul 2014, at 16:51, Matt Aldridge <[email protected]> wrote:

> Following up on this, my initial hypothesis of there being a problem with
> MERGE queries appears incorrect, or at least overly specific. In another
> application, I needed to add relationships among millions of nodes in a
> pre-existing graph, essentially using Cypher queries following the pattern
> "MATCH (a), (b) CREATE (a)-[r]-(b)". Running with a 10GB heap (unnecessarily
> large, I thought, but just in case) I can make it through about 16M queries
> before GC churn takes over and eventually Jetty times out.
>
> To eliminate the possibility of any bugs in py2neo's transaction/HTTP
> connection handling, I altered my code to remove the py2neo dependency and
> communicate directly with Neo4j's begin/commit endpoint
> (http://localhost:7474/db/data/transaction/commit). ... Same result.
>
> I updated my test code gist
> (https://gist.github.com/mlaldrid/85a03fc022170561b807) similarly to
> eliminate py2neo and use the begin/commit endpoint directly. Running on the
> Neo4j community edition with a 512MB heap, I get GC churn at about 350K
> queries. I've downloaded a trial copy of the enterprise edition and run the
> same test code with the same heap size (512MB) for an hour/3.5M queries with
> no signs of memory leak or GC churn.
>
> Why is this memory-leak behavior different between the community and
> enterprise editions? Is it something that the enterprise edition's
> "advanced caching" feature solves? Is it a known but opaque limitation of
> the community edition?
>
> Thanks,
> -Matt
>
> On Monday, July 7, 2014 8:47:43 AM UTC-4, Matt Aldridge wrote:
>> FWIW, I have replicated this issue on 2.0.3 as well. While GC churn does
>> kick in at approximately the same point as with 2.1.2, it is interesting
>> to note how much faster the test-case Cypher queries perform in
>> 2.1.2--something like 50% faster!
>> :)
>>
>> Nonetheless, the memory leak does continue to be an issue for me. AFAICT,
>> the py2neo API is properly opening, submitting, and closing the Cypher
>> transactions according to spec. I'd greatly appreciate any assistance in
>> determining whether this is indeed a bug in Neo4j.
>>
>> Thanks,
>> -Matt
>>
>> On Wednesday, July 2, 2014 2:20:10 PM UTC-4, Matt Aldridge wrote:
>>> Hi everyone,
>>>
>>> I have a use case that appears to expose a memory leak in Neo4j. I've
>>> been testing this with Neo4j 2.1.2 on OS X.
>>>
>>> I've created a test case that reproduces the issue consistently and
>>> mimics the behavior of my real-world application:
>>> https://gist.github.com/mlaldrid/85a03fc022170561b807
>>> It uses py2neo to interface with Neo4j's Cypher transactional HTTP
>>> endpoint. To force the suspected memory-leak behavior to surface more
>>> quickly, I limit Neo4j's max heap to 1GB. In practice I tend to use an
>>> 8GB heap, but the misbehavior still occurs (albeit delayed).
>>>
>>> In my real-world application, we need to CREATE millions of primary
>>> nodes of interest and MERGE ancillary nodes into the graph, as they can
>>> be shared by any number of other primary nodes. In the test case here we
>>> give the primary nodes the Person label, and the ancillary nodes are
>>> labeled Address and Phone. A fixed set of Address and Phone nodes is
>>> generated and randomly attached to Person nodes.
>>>
>>> Each Cypher transaction CREATEs 1000 Person nodes and MERGEs in 2
>>> Address nodes and 1 Phone node for each Person. The transactions are
>>> created and then committed without any intermediate executions of the
>>> open transaction.
>>>
>>> This log demonstrates increasingly poor load performance until finally
>>> Neo4j runs out of heap space and fails a transaction:
>>>
>>> % time python neo4j_heap_stress.py
>>> 2014-07-01 17:30:07,596 :: __main__ :: Generating fake data ...
>>> 2014-07-01 17:30:31,430 :: __main__ :: Creating label indices ...
>>> 2014-07-01 17:30:31,992 :: __main__ :: Beginning load ...
>>> 2014-07-01 17:33:33,949 :: __main__ :: Finished batch 100
>>> 2014-07-01 17:35:49,346 :: __main__ :: Finished batch 200
>>> 2014-07-01 17:37:56,856 :: __main__ :: Finished batch 300
>>> 2014-07-01 17:40:01,333 :: __main__ :: Finished batch 400
>>> 2014-07-01 17:42:04,855 :: __main__ :: Finished batch 500
>>> 2014-07-01 17:44:11,104 :: __main__ :: Finished batch 600
>>> 2014-07-01 17:46:17,261 :: __main__ :: Finished batch 700
>>> 2014-07-01 17:48:21,778 :: __main__ :: Finished batch 800
>>> 2014-07-01 17:50:28,206 :: __main__ :: Finished batch 900
>>> 2014-07-01 17:52:39,424 :: __main__ :: Finished batch 1000
>>> 2014-07-01 17:54:56,618 :: __main__ :: Finished batch 1100
>>> 2014-07-01 17:57:22,797 :: __main__ :: Finished batch 1200
>>> 2014-07-01 18:02:27,327 :: __main__ :: Finished batch 1300
>>> 2014-07-01 18:12:35,143 :: __main__ :: Finished batch 1400
>>> 2014-07-01 18:24:16,126 :: __main__ :: Finished batch 1500
>>> 2014-07-01 18:38:25,835 :: __main__ :: Finished batch 1600
>>> 2014-07-01 18:56:18,826 :: __main__ :: Finished batch 1700
>>> 2014-07-01 19:22:00,779 :: __main__ :: Finished batch 1800
>>> 2014-07-01 20:17:18,317 :: __main__ :: Finished batch 1900
>>> Traceback (most recent call last):
>>>   File "neo4j_heap_stress.py", line 112, in <module>
>>>     main()
>>>   File "neo4j_heap_stress.py", line 34, in main
>>>     load_batch(fake, addresses, phones)
>>>   File "neo4j_heap_stress.py", line 68, in load_batch
>>>     tx.commit()
>>>   File "/Users/matt/.virtualenvs/dataenv/lib/python2.7/site-packages/py2neo/cypher.py", line 242, in commit
>>>     return self._post(self._commit or self._begin_commit)
>>>   File "/Users/matt/.virtualenvs/dataenv/lib/python2.7/site-packages/py2neo/cypher.py", line 217, in _post
>>>     raise TransactionError.new(error["code"], error["message"])
>>> py2neo.cypher.CouldNotCommit: commit threw exception
>>> python neo4j_heap_stress.py  1351.47s user 48.76s system 10% cpu 3:46:27.77 total
>>>
>>> Early in the load process, 100K Person nodes are inserted approximately
>>> every 2 minutes.
>>> However, after 1.2M Person nodes (batch 1200) have been inserted into
>>> the graph, the batch times slow dramatically. This corresponds with
>>> increased Java garbage-collection times--near the end, the GC suspends
>>> application threads for several seconds at a time. I have captured the
>>> GC logs for Neo4j if there is any interest in seeing them.
>>>
>>> Eventually a transaction fails, and this is in console.log:
>>>
>>> Exception in thread "GC-Monitor"
>>> Exception: java.lang.OutOfMemoryError thrown from the
>>> UncaughtExceptionHandler in thread "GC-Monitor"
>>> Exception in thread "DateCache"
>>> Exception: java.lang.OutOfMemoryError thrown from the
>>> UncaughtExceptionHandler in thread "DateCache"
>>> Exception in thread "RMI RenewClean-[192.168.1.214:60253]"
>>> java.lang.OutOfMemoryError: Java heap space
>>>
>>> Of all the variables here, the use of many MERGE queries appears to be
>>> the culprit behind the degradation in performance. Before changing my
>>> application's graph model to use ancillary nodes shared among the
>>> primary nodes, I was doing a simple CREATE for all nodes. That scaled
>>> nearly linearly to tens of millions of nodes in the graph, also using
>>> the same Cypher transaction method via py2neo. Only when I wanted to
>>> de-duplicate the shared ancillary nodes with MERGE did this behavior
>>> arise. Hopefully that's not a red herring, but something is almost
>>> certainly amiss here.
>>>
>>> Please let me know if I can provide any other relevant details to
>>> reproduce.
>>>
>>> Thanks,
>>> -Matt
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
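For reference, talking to the begin/commit endpoint without py2neo, as Matt describes above, can be sketched as follows. The endpoint URL and the idea of committing a batch of statements in one round trip come from the thread; the JSON envelope is the documented shape of Neo4j 2.x's transactional HTTP API, and the helper names are hypothetical. Python 3 is used here for illustration, although the thread's test script ran under Python 2.7.

```python
import json
import urllib.request

# Neo4j 2.x transactional endpoint, as used in the thread.
COMMIT_URL = "http://localhost:7474/db/data/transaction/commit"

def build_payload(statements):
    """Wrap (cypher, parameters) pairs in the transactional API's
    {"statements": [...]} envelope."""
    return {
        "statements": [
            {"statement": cypher, "parameters": params}
            for cypher, params in statements
        ]
    }

def commit(statements, url=COMMIT_URL):
    """POST a batch of statements in a single begin/commit round trip.
    Requires a running Neo4j server, so it is not exercised below."""
    body = json.dumps(build_payload(statements)).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Each call to commit() is its own transaction, which matches the "created and then committed without any intermediate executions" pattern the test case uses.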

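The batch structure described in the original post (per transaction: CREATE 1000 Person nodes, then MERGE 2 shared Address nodes and 1 shared Phone node for each) might be assembled along these lines. The labels and counts are from the thread; the property names, relationship types, and the {param} placeholder style (Neo4j 2.x Cypher) are illustrative assumptions, not the actual gist code.

```python
import random

def batch_statements(people, addresses, phones):
    """Generate (cypher, parameters) pairs for one transaction: one CREATE
    per person, plus MERGEs that de-duplicate the shared ancillary nodes.
    All property names and relationship types here are hypothetical."""
    stmts = []
    for person in people:
        # Primary node: always created fresh.
        stmts.append(("CREATE (p:Person {name: {name}})", {"name": person}))
        # Ancillary nodes: MERGE so that they are shared across people.
        for street in random.sample(addresses, 2):
            stmts.append((
                "MATCH (p:Person {name: {name}}) "
                "MERGE (a:Address {street: {street}}) "
                "CREATE (p)-[:LIVES_AT]->(a)",
                {"name": person, "street": street},
            ))
        stmts.append((
            "MATCH (p:Person {name: {name}}) "
            "MERGE (ph:Phone {number: {number}}) "
            "CREATE (p)-[:HAS_PHONE]->(ph)",
            {"name": person, "number": random.choice(phones)},
        ))
    return stmts
```

In the thread's test case, `people` would hold 1000 generated names per batch, and the resulting pairs would be submitted together as one transaction.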