Following up on this: my initial hypothesis that the problem is specific to 
MERGE queries appears incorrect, or at least overly specific. In another 
application, I needed to add relationships among millions of nodes in a 
pre-existing graph, essentially using Cypher queries following the pattern 
of "MATCH (a), (b) CREATE (a)-[r]-(b)". Running with a 10GB heap 
(unnecessarily large, I thought, but just in case), I can make it through 
about 16M queries before GC churn takes over and Jetty eventually times out.
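For concreteness, each of those queries can be expressed as one parameterized 
statement entry for the transactional endpoint, roughly like this (a sketch: 
the "id" property and the RELATED_TO relationship type are placeholders, not 
my real schema, and I've given the CREATE a direction since Cypher requires 
one there):

```python
# Build one parameterized statement entry for Neo4j's transactional
# Cypher endpoint. The matching property ("id") and the relationship
# type (RELATED_TO) are placeholders for illustration only.
def relationship_statement(a_id, b_id):
    return {
        # Neo4j 2.x Cypher parameter placeholders use {curly} syntax.
        "statement": ("MATCH (a {id: {a_id}}), (b {id: {b_id}}) "
                      "CREATE (a)-[r:RELATED_TO]->(b)"),
        "parameters": {"a_id": a_id, "b_id": b_id},
    }
```

Parameterizing like this (rather than interpolating values into the query 
string) also keeps the query cache from seeing millions of distinct query 
strings, which seemed worth ruling out as a contributor here.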

To eliminate the possibility of any bugs in py2neo's transaction/HTTP 
connection handling, I altered my code to remove the py2neo dependency and 
communicate directly with Neo4j's begin/commit endpoint 
(http://localhost:7474/db/data/transaction/commit). ... Same result.
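For reference, the direct-HTTP version is essentially just the following (a 
sketch using the Python 3 stdlib rather than the 2.7 environment from my 
gist; batching, retries, and my real statements are omitted):

```python
import json
import urllib.request

# Auto-commit endpoint: each POST begins and commits one transaction.
COMMIT_URL = "http://localhost:7474/db/data/transaction/commit"

def build_payload(statements):
    """Wrap raw Cypher strings in the endpoint's expected JSON envelope."""
    return {"statements": [{"statement": s} for s in statements]}

def run_batch(statements):
    """POST one auto-committed transaction; raise if Neo4j reports errors."""
    req = urllib.request.Request(
        COMMIT_URL,
        data=json.dumps(build_payload(statements)).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["results"]
```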

I updated my test code gist (
https://gist.github.com/mlaldrid/85a03fc022170561b807) similarly, 
eliminating py2neo and using the begin/commit endpoint directly. Running 
against the Neo4j Community Edition with a 512MB heap, I hit GC churn at 
about 350K queries. I've also downloaded a trial copy of the Enterprise 
Edition and run the same test code with the same 512MB heap for an hour 
(about 3.5M queries) with no signs of a memory leak or GC churn.

Why does this memory leak behavior differ between the Community and 
Enterprise editions? Is it something that the Enterprise Edition's "advanced 
caching" feature solves? Is it a known but undocumented limitation of the 
Community Edition?

Thanks,
-Matt

On Monday, July 7, 2014 8:47:43 AM UTC-4, Matt Aldridge wrote:
>
> FWIW, I have replicated this issue on 2.0.3 as well. While GC churn kicks 
> in at approximately the same point as with 2.1.2, it is interesting to 
> note how much faster the test case's Cypher queries perform in 2.1.2: 
> something like 50% faster! :)
>
> Nonetheless, the memory leak continues to be an issue for me. AFAICT, 
> py2neo is properly opening, submitting, and closing the Cypher 
> transactions according to spec. I'd greatly appreciate any assistance in 
> determining whether this is indeed a bug in Neo4j.
>
> Thanks,
> -Matt
>
>
> On Wednesday, July 2, 2014 2:20:10 PM UTC-4, Matt Aldridge wrote:
>>
>> Hi everyone,
>>
>> I have a use case that appears to expose a memory leak in Neo4j. I've 
>> been testing this with Neo4j 2.1.2 on OSX.
>>
>> I've created a test case that reproduces the issue consistently and 
>> mimics the behavior of my real-world application: 
>> https://gist.github.com/mlaldrid/85a03fc022170561b807 It uses py2neo 
>> to interface with Neo4j's transactional Cypher HTTP endpoint. To make the 
>> suspected memory leak surface more quickly, I limit Neo4j's max heap to 
>> 1GB. In practice I tend to use an 8GB heap, but the misbehavior still 
>> occurs (albeit delayed).
>>
>> In my real-world application, we need to CREATE millions of primary nodes 
>> of interest and MERGE ancillary nodes into the graph, as they can be shared 
>> by any number of other primary nodes. In the test case here we give the 
>> primary nodes the Person label, and the ancillary nodes are labeled Address 
>> and Phone. A fixed set of Address and Phone nodes are generated and 
>> randomly attached to Person nodes.
>>
>> Each Cypher transaction CREATEs 1000 Person nodes and MERGEs in two 
>> Address nodes and one Phone node for each Person. Each transaction is 
>> created and then committed without any intermediate executions of the 
>> open transaction.
>>
>> This log demonstrates increasingly poor load performance until finally 
>> Neo4j runs out of heap space and fails a transaction:
>>
>> % time python neo4j_heap_stress.py
>> 2014-07-01 17:30:07,596 :: __main__ :: Generating fake data ...
>> 2014-07-01 17:30:31,430 :: __main__ :: Creating label indices ...
>> 2014-07-01 17:30:31,992 :: __main__ :: Beginning load ...
>> 2014-07-01 17:33:33,949 :: __main__ :: Finished batch 100
>> 2014-07-01 17:35:49,346 :: __main__ :: Finished batch 200
>> 2014-07-01 17:37:56,856 :: __main__ :: Finished batch 300
>> 2014-07-01 17:40:01,333 :: __main__ :: Finished batch 400
>> 2014-07-01 17:42:04,855 :: __main__ :: Finished batch 500
>> 2014-07-01 17:44:11,104 :: __main__ :: Finished batch 600
>> 2014-07-01 17:46:17,261 :: __main__ :: Finished batch 700
>> 2014-07-01 17:48:21,778 :: __main__ :: Finished batch 800
>> 2014-07-01 17:50:28,206 :: __main__ :: Finished batch 900
>> 2014-07-01 17:52:39,424 :: __main__ :: Finished batch 1000
>> 2014-07-01 17:54:56,618 :: __main__ :: Finished batch 1100
>> 2014-07-01 17:57:22,797 :: __main__ :: Finished batch 1200
>> 2014-07-01 18:02:27,327 :: __main__ :: Finished batch 1300
>> 2014-07-01 18:12:35,143 :: __main__ :: Finished batch 1400
>> 2014-07-01 18:24:16,126 :: __main__ :: Finished batch 1500
>> 2014-07-01 18:38:25,835 :: __main__ :: Finished batch 1600
>> 2014-07-01 18:56:18,826 :: __main__ :: Finished batch 1700
>> 2014-07-01 19:22:00,779 :: __main__ :: Finished batch 1800
>> 2014-07-01 20:17:18,317 :: __main__ :: Finished batch 1900
>> Traceback (most recent call last):
>>   File "neo4j_heap_stress.py", line 112, in <module>
>>     main()
>>   File "neo4j_heap_stress.py", line 34, in main
>>     load_batch(fake, addresses, phones)
>>   File "neo4j_heap_stress.py", line 68, in load_batch
>>     tx.commit()
>>   File "/Users/matt/.virtualenvs/dataenv/lib/python2.7/site-packages/py2neo/cypher.py", line 242, in commit
>>     return self._post(self._commit or self._begin_commit)
>>   File "/Users/matt/.virtualenvs/dataenv/lib/python2.7/site-packages/py2neo/cypher.py", line 217, in _post
>>     raise TransactionError.new(error["code"], error["message"])
>> py2neo.cypher.CouldNotCommit: commit threw exception
>> python neo4j_heap_stress.py  1351.47s user 48.76s system 10% cpu 3:46:27.77 total
>>
>> Early in the load process, 100K Person nodes are inserted approximately 
>> every 2 minutes. However, after about 1.2M Person nodes (batch 1200) have 
>> been inserted, the batch times slow dramatically. This corresponds with 
>> increased Java garbage collection times: near the end, the GC suspends 
>> application threads for several seconds at a time. I have captured 
>> Neo4j's GC logs if there is any interest in seeing them.
>>
>> Eventually a transaction fails, and this is in console.log:
>>
>> Exception in thread "GC-Monitor"
>> Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "GC-Monitor"
>> Exception in thread "DateCache"
>> Exception: java.lang.OutOfMemoryError thrown from the UncaughtExceptionHandler in thread "DateCache"
>> Exception in thread "RMI RenewClean-[192.168.1.214:60253]" java.lang.OutOfMemoryError: Java heap space
>>
>> Of all the variables here, the heavy use of MERGE queries appears to be 
>> the culprit behind the performance degradation. Before I changed my 
>> application's graph model to share ancillary nodes among the primary 
>> nodes, I was doing a simple CREATE for all nodes, and that scaled nearly 
>> linearly to tens of millions of nodes in the graph, using the same Cypher 
>> transaction method via py2neo. Only when I started de-duplicating the 
>> shared ancillary nodes with MERGE did this behavior arise. Hopefully 
>> that's not a red herring, but something is almost certainly amiss here.
>>
>> Please let me know if I can provide any other relevant details to 
>> reproduce.
>>
>> Thanks,
>> -Matt
>>
>>

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
