Hi everyone,

I have a use case that appears to expose a memory leak in Neo4j. I've been testing with Neo4j 2.1.2 on OS X.
I've created a test case that reproduces the issue consistently and mimics the behavior of my real-world application: https://gist.github.com/mlaldrid/85a03fc022170561b807. It uses py2neo to talk to Neo4j's transactional Cypher HTTP endpoint. To force the suspected memory leak to surface more quickly, I limit Neo4j's max heap to 1GB. In practice I tend to use an 8GB heap, but the misbehavior still occurs (albeit delayed).

In my real-world application I need to CREATE millions of primary nodes of interest and MERGE ancillary nodes into the graph, since an ancillary node can be shared by any number of primary nodes. In the test case the primary nodes carry the Person label, and the ancillary nodes are labeled Address and Phone. A fixed set of Address and Phone nodes is generated and randomly attached to Person nodes. Each Cypher transaction CREATEs 1000 Person nodes and MERGEs in 2 Address nodes and 1 Phone node for each Person. Each transaction is built up and then committed, with no intermediate executions of the open transaction.

The log below shows increasingly poor load performance until Neo4j finally runs out of heap space and fails a transaction:

% time python neo4j_heap_stress.py
2014-07-01 17:30:07,596 :: __main__ :: Generating fake data ...
2014-07-01 17:30:31,430 :: __main__ :: Creating label indices ...
2014-07-01 17:30:31,992 :: __main__ :: Beginning load ...
2014-07-01 17:33:33,949 :: __main__ :: Finished batch 100
2014-07-01 17:35:49,346 :: __main__ :: Finished batch 200
2014-07-01 17:37:56,856 :: __main__ :: Finished batch 300
2014-07-01 17:40:01,333 :: __main__ :: Finished batch 400
2014-07-01 17:42:04,855 :: __main__ :: Finished batch 500
2014-07-01 17:44:11,104 :: __main__ :: Finished batch 600
2014-07-01 17:46:17,261 :: __main__ :: Finished batch 700
2014-07-01 17:48:21,778 :: __main__ :: Finished batch 800
2014-07-01 17:50:28,206 :: __main__ :: Finished batch 900
2014-07-01 17:52:39,424 :: __main__ :: Finished batch 1000
2014-07-01 17:54:56,618 :: __main__ :: Finished batch 1100
2014-07-01 17:57:22,797 :: __main__ :: Finished batch 1200
2014-07-01 18:02:27,327 :: __main__ :: Finished batch 1300
2014-07-01 18:12:35,143 :: __main__ :: Finished batch 1400
2014-07-01 18:24:16,126 :: __main__ :: Finished batch 1500
2014-07-01 18:38:25,835 :: __main__ :: Finished batch 1600
2014-07-01 18:56:18,826 :: __main__ :: Finished batch 1700
2014-07-01 19:22:00,779 :: __main__ :: Finished batch 1800
2014-07-01 20:17:18,317 :: __main__ :: Finished batch 1900
Traceback (most recent call last):
  File "neo4j_heap_stress.py", line 112, in <module>
    main()
  File "neo4j_heap_stress.py", line 34, in main
    load_batch(fake, addresses, phones)
  File "neo4j_heap_stress.py", line 68, in load_batch
    tx.commit()
  File "/Users/matt/.virtualenvs/dataenv/lib/python2.7/site-packages/py2neo/cypher.py", line 242, in commit
    return self._post(self._commit or self._begin_commit)
  File "/Users/matt/.virtualenvs/dataenv/lib/python2.7/site-packages/py2neo/cypher.py", line 217, in _post
    raise TransactionError.new(error["code"], error["message"])
py2neo.cypher.CouldNotCommit: commit threw exception
python neo4j_heap_stress.py  1351.47s user 48.76s system 10% cpu 3:46:27.77 total

Early in the load, roughly 100K Person nodes are inserted every 2 minutes. After about 1.2M Person nodes (batch 1200) have been inserted into the graph, however, the batch times slow dramatically.
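In case it's easier to see in-line than in the gist, each batch boils down to something like the following. This is a simplified sketch: the property names, relationship types, and helper signature are illustrative, not the exact ones in the test script.

import random
from py2neo import cypher  # py2neo 1.6.x transactional Cypher support

# One statement per Person: CREATE the Person, MERGE its shared
# Address/Phone nodes, then attach them. Property names and
# relationship types here are illustrative, not from the gist.
STATEMENT = """
CREATE (p:Person {name: {name}})
MERGE (a1:Address {address: {address1}})
MERGE (a2:Address {address: {address2}})
MERGE (ph:Phone {number: {phone}})
CREATE (p)-[:HAS_ADDRESS]->(a1),
       (p)-[:HAS_ADDRESS]->(a2),
       (p)-[:HAS_PHONE]->(ph)
"""

def load_batch(session, names, addresses, phones):
    # 1000 Person CREATEs per transaction; the transaction is built up
    # with append() and committed in one go, with no intermediate
    # execute() calls on the open transaction.
    tx = session.create_transaction()
    for name in names:
        tx.append(STATEMENT, {
            "name": name,
            "address1": random.choice(addresses),
            "address2": random.choice(addresses),
            "phone": random.choice(phones),
        })
    tx.commit()

session = cypher.Session("http://localhost:7474")

The real script just generates the fake Person/Address/Phone data up front and calls this in a loop, logging progress every 100 batches (which is what produces the "Finished batch N" lines above).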
This corresponds with increased Java garbage collection times; near the end, GC pauses suspend application threads for several seconds at a time. I have captured Neo4j's GC logs if there is any interest in seeing them. Eventually a transaction fails, and console.log shows:

Exception in thread "GC-Monitor" Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "GC-Monitor"
Exception in thread "DateCache" Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "DateCache"
Exception in thread "RMI RenewClean-[192.168.1.214:60253]" java.lang.OutOfMemoryError: Java heap space

Of all the variables here, the heavy use of MERGE looks to me like the culprit behind the performance degradation. Before I changed my application's graph model to share ancillary nodes among the primary nodes, I was doing a simple CREATE for all nodes, and that scaled nearly linearly to tens of millions of nodes in the graph, using the same transactional Cypher method via py2neo. Only when I started de-duplicating the shared ancillary nodes with MERGE did this behavior arise. Hopefully that's not a red herring, but something almost certainly seems amiss here.

Please let me know if I can provide any other details to help reproduce this.

Thanks,
-Matt
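P.S. For completeness, this is roughly how I capped the heap and enabled GC logging in conf/neo4j-wrapper.conf for the test run (the exact lines are from memory, so treat them as approximate):

# approximate; heap values are in MB, GC flags are standard HotSpot (Java 7) options
wrapper.java.initmemory=1024
wrapper.java.maxmemory=1024
wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
wrapper.java.additional=-XX:+PrintGCDetails
wrapper.java.additional=-XX:+PrintGCDateStamps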
