Ah nice, thank you for the link! -Matt
On Wednesday, July 16, 2014 12:06:54 PM UTC-4, Chris Vest wrote:
>
> Yeah, it’s a known bug: https://github.com/neo4j/neo4j/pull/2690
>
> --
> Chris Vest
> System Engineer, Neo Technology
> [ skype: mr.chrisvest, twitter: chvest ]
>
>
> On 16 Jul 2014, at 16:51, Matt Aldridge <[email protected]> wrote:
>
> Following up on this, my initial hypothesis that the problem was specific
> to MERGE queries appears to be incorrect, or at least overly narrow. In
> another application, I needed to add relationships among millions of nodes
> in a pre-existing graph, essentially using Cypher queries following the
> pattern "MATCH (a), (b) CREATE (a)-[r]-(b)". Running with a 10GB heap
> (unnecessarily large, I thought, but just in case), I can make it through
> about 16M queries before GC churn takes over and Jetty eventually times
> out.
>
> To eliminate the possibility of any bugs in py2neo's transaction/HTTP
> connection handling, I altered my code to remove the py2neo dependency and
> communicate directly with Neo4j's begin/commit endpoint
> (http://localhost:7474/db/data/transaction/commit). ... Same result.
>
> I updated my test code gist
> (https://gist.github.com/mlaldrid/85a03fc022170561b807) similarly to
> eliminate py2neo and use the begin/commit endpoint directly. Running on
> the Neo4j community edition with a 512MB heap, I hit GC churn at about
> 350K queries. I've also downloaded a trial copy of the enterprise edition
> and run the same test code with the same heap size (512MB) for an
> hour/3.5M queries with no signs of a memory leak or GC churn.
>
> Why does this memory-leak behavior differ between the community and
> enterprise editions? Is it something that the enterprise edition's
> "advanced caching" feature solves? Is it a known but opaque limitation of
> the community edition?
>
> Thanks,
> -Matt
>
> On Monday, July 7, 2014 8:47:43 AM UTC-4, Matt Aldridge wrote:
>>
>> FWIW, I have replicated this issue on 2.0.3 as well.
>> While GC churn does kick in at approximately the same point as with
>> 2.1.2, it is interesting to note how much faster the test-case Cypher
>> queries perform in 2.1.2: something like 50% faster! :)
>>
>> Nonetheless, the memory leak continues to be an issue for me. AFAICT,
>> the py2neo API is properly opening, submitting, and closing the Cypher
>> transactions according to spec. I'd greatly appreciate any assistance in
>> determining whether this is indeed a bug in Neo4j.
>>
>> Thanks,
>> -Matt
>>
>>
>> On Wednesday, July 2, 2014 2:20:10 PM UTC-4, Matt Aldridge wrote:
>>>
>>> Hi everyone,
>>>
>>> I have a use case that appears to expose a memory leak in Neo4j. I've
>>> been testing this with Neo4j 2.1.2 on OS X.
>>>
>>> I've created a test case that reproduces the issue consistently and
>>> mimics the behavior of my real-world application:
>>> https://gist.github.com/mlaldrid/85a03fc022170561b807
>>> It uses py2neo to interface with Neo4j's transactional Cypher HTTP
>>> endpoint. To force the suspected memory-leak behavior to surface more
>>> quickly, I limit Neo4j's max heap to 1GB. In practice I tend to use an
>>> 8GB heap, but the misbehavior still occurs (albeit delayed).
>>>
>>> In my real-world application, we need to CREATE millions of primary
>>> nodes of interest and MERGE ancillary nodes into the graph, since an
>>> ancillary node can be shared by any number of primary nodes. In the
>>> test case the primary nodes get the Person label, and the ancillary
>>> nodes are labeled Address and Phone. A fixed set of Address and Phone
>>> nodes is generated and randomly attached to Person nodes.
>>>
>>> Each Cypher transaction CREATEs 1000 Person nodes and MERGEs in 2
>>> Address nodes and 1 Phone node for each Person. The transactions are
>>> created and then committed without any intermediate executions of the
>>> open transaction.
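[The batch pattern described above can be sketched as follows. This is an illustrative reconstruction, not the gist's actual code: the function name, property keys (`pid`, `aid`, `num`), and relationship types (`LIVES_AT`, `HAS_PHONE`) are all hypothetical; only the labels, batch size, and per-Person MERGE counts come from the thread. Parameters use the `{param}` placeholder syntax of Cypher as of Neo4j 2.x.]

```python
import random

def build_batch_statements(n_people=1000, n_addresses=50, n_phones=50):
    """Build the statement list for one transaction: a CREATE for each
    Person, followed by 2 Address MERGEs and 1 Phone MERGE that attach
    shared ancillary nodes to that Person."""
    statements = []
    for i in range(n_people):
        statements.append({
            "statement": "CREATE (p:Person {pid: {pid}})",
            "parameters": {"pid": i},
        })
        # MERGE 2 distinct Address nodes drawn from a fixed pool, so the
        # same Address can end up shared by many Person nodes.
        for addr in random.sample(range(n_addresses), 2):
            statements.append({
                "statement": ("MATCH (p:Person {pid: {pid}}) "
                              "MERGE (a:Address {aid: {aid}}) "
                              "MERGE (p)-[:LIVES_AT]->(a)"),
                "parameters": {"pid": i, "aid": addr},
            })
        # MERGE 1 Phone node from a fixed pool, likewise shared.
        statements.append({
            "statement": ("MATCH (p:Person {pid: {pid}}) "
                          "MERGE (ph:Phone {num: {num}}) "
                          "MERGE (p)-[:HAS_PHONE]->(ph)"),
            "parameters": {"pid": i, "num": random.randrange(n_phones)},
        })
    return statements
```

With the defaults this yields 4000 statements per transaction, matching the thread's description of 1000 Persons with 2 Address MERGEs and 1 Phone MERGE each.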
>>>
>>> This log demonstrates increasingly poor load performance until finally
>>> Neo4j runs out of heap space and fails a transaction:
>>>
>>> % time python neo4j_heap_stress.py
>>> 2014-07-01 17:30:07,596 :: __main__ :: Generating fake data ...
>>> 2014-07-01 17:30:31,430 :: __main__ :: Creating label indices ...
>>> 2014-07-01 17:30:31,992 :: __main__ :: Beginning load ...
>>> 2014-07-01 17:33:33,949 :: __main__ :: Finished batch 100
>>> 2014-07-01 17:35:49,346 :: __main__ :: Finished batch 200
>>> 2014-07-01 17:37:56,856 :: __main__ :: Finished batch 300
>>> 2014-07-01 17:40:01,333 :: __main__ :: Finished batch 400
>>> 2014-07-01 17:42:04,855 :: __main__ :: Finished batch 500
>>> 2014-07-01 17:44:11,104 :: __main__ :: Finished batch 600
>>> 2014-07-01 17:46:17,261 :: __main__ :: Finished batch 700
>>> 2014-07-01 17:48:21,778 :: __main__ :: Finished batch 800
>>> 2014-07-01 17:50:28,206 :: __main__ :: Finished batch 900
>>> 2014-07-01 17:52:39,424 :: __main__ :: Finished batch 1000
>>> 2014-07-01 17:54:56,618 :: __main__ :: Finished batch 1100
>>> 2014-07-01 17:57:22,797 :: __main__ :: Finished batch 1200
>>> 2014-07-01 18:02:27,327 :: __main__ :: Finished batch 1300
>>> 2014-07-01 18:12:35,143 :: __main__ :: Finished batch 1400
>>> 2014-07-01 18:24:16,126 :: __main__ :: Finished batch 1500
>>> 2014-07-01 18:38:25,835 :: __main__ :: Finished batch 1600
>>> 2014-07-01 18:56:18,826 :: __main__ :: Finished batch 1700
>>> 2014-07-01 19:22:00,779 :: __main__ :: Finished batch 1800
>>> 2014-07-01 20:17:18,317 :: __main__ :: Finished batch 1900
>>> Traceback (most recent call last):
>>>   File "neo4j_heap_stress.py", line 112, in <module>
>>>     main()
>>>   File "neo4j_heap_stress.py", line 34, in main
>>>     load_batch(fake, addresses, phones)
>>>   File "neo4j_heap_stress.py", line 68, in load_batch
>>>     tx.commit()
>>>   File "/Users/matt/.virtualenvs/dataenv/lib/python2.7/site-packages/py2neo/cypher.py", line 242, in commit
>>>     return self._post(self._commit or self._begin_commit)
>>>   File "/Users/matt/.virtualenvs/dataenv/lib/python2.7/site-packages/py2neo/cypher.py", line 217, in _post
>>>     raise TransactionError.new(error["code"], error["message"])
>>> py2neo.cypher.CouldNotCommit: commit threw exception
>>> python neo4j_heap_stress.py  1351.47s user 48.76s system 10% cpu 3:46:27.77 total
>>>
>>> Early in the load process, 100K Person nodes are inserted approximately
>>> every 2 minutes. However, after 1.2M Person nodes (batch 1200) have
>>> been inserted into the graph, the batch times slow dramatically. This
>>> corresponds with increased Java garbage-collection times: near the end,
>>> the GC suspends application threads for several seconds at a time. I
>>> have captured the GC logs for Neo4j if there is any interest in seeing
>>> them.
>>>
>>> Eventually a transaction fails, and this appears in console.log:
>>>
>>> Exception in thread "GC-Monitor"
>>> Exception: java.lang.OutOfMemoryError thrown from the
>>> UncaughtExceptionHandler in thread "GC-Monitor"
>>> Exception in thread "DateCache"
>>> Exception: java.lang.OutOfMemoryError thrown from the
>>> UncaughtExceptionHandler in thread "DateCache"
>>> Exception in thread "RMI RenewClean-[192.168.1.214:60253]"
>>> java.lang.OutOfMemoryError: Java heap space
>>>
>>> Of all the variables here, the heavy use of MERGE queries appears to me
>>> to be the culprit behind the performance degradation. Before changing
>>> my application's graph model to use ancillary nodes shared among the
>>> primary nodes, I was doing a simple CREATE for all nodes. That scaled
>>> nearly linearly to tens of millions of nodes in the graph, also using
>>> the same Cypher transaction method via py2neo. Only when I wanted to
>>> de-duplicate the shared ancillary nodes with MERGE did this behavior
>>> arise. Hopefully that's not a red herring, but something is almost
>>> certainly amiss here.
>>>
>>> Please let me know if I can provide any other relevant details to
>>> reproduce.
>>>
>>> Thanks,
>>> -Matt
>>>

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
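[For reference, the direct begin/commit-endpoint approach Matt describes (bypassing py2neo) can be sketched as below. Only the endpoint URL comes from the thread; the function names and error handling are illustrative assumptions, not the gist's actual code.]

```python
import json
import urllib.request

# Transactional Cypher endpoint that begins and commits a transaction in a
# single request (URL taken from the thread; Neo4j 2.x layout).
COMMIT_URL = "http://localhost:7474/db/data/transaction/commit"

def build_request_body(statements):
    """Encode a list of {"statement": ..., "parameters": ...} dicts into
    the JSON request body the transactional endpoint expects."""
    return json.dumps({"statements": statements}).encode("utf-8")

def commit_statements(statements, url=COMMIT_URL):
    """POST the statements and return the decoded JSON response, raising
    if Neo4j reports any errors rather than ignoring them."""
    request = urllib.request.Request(
        url,
        data=build_request_body(statements),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read().decode("utf-8"))
    if result.get("errors"):
        raise RuntimeError("transaction failed: %r" % (result["errors"],))
    return result
```

Each call opens and commits a transaction in one round trip, which mirrors the thread's test setup of committing each batch without any intermediate executions of the open transaction.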
