Yeah, it's a known bug: https://github.com/neo4j/neo4j/pull/2690

--
Chris Vest
System Engineer, Neo Technology
[ skype: mr.chrisvest, twitter: chvest ]


On 16 Jul 2014, at 16:51, Matt Aldridge <[email protected]> wrote:

> Following up on this, my initial hypothesis of there being a problem with 
> MERGE queries appears incorrect, or at least overly specific. In another 
> application, I needed to add relationships among millions of nodes in a 
> pre-existing graph, essentially using Cypher queries following the pattern of 
> "MATCH (a), (b) CREATE (a)-[r]->(b)". Running with a 10GB heap (unnecessarily 
> large, I thought, but just in case), I can make it through about 16M queries 
> before GC churn takes over and Jetty eventually times out.
> 
> To eliminate the possibility of any bugs in py2neo's transaction/HTTP 
> connection handling, I altered my code to remove the py2neo dependency and 
> communicate directly with Neo4j's begin/commit endpoint 
> (http://localhost:7474/db/data/transaction/commit). ... Same result.
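> For anyone who wants to reproduce this without py2neo, here is a minimal 
> sketch of the request body the transactional commit endpoint expects; the 
> query, labels, and parameter names are illustrative stand-ins, not my actual 
> code:

```python
import json

def build_commit_payload(statements):
    """Wrap (query, params) pairs in the JSON format that Neo4j's
    transactional endpoint expects in a POST body."""
    return {
        "statements": [
            {"statement": query, "parameters": params}
            for query, params in statements
        ]
    }

# Illustrative statement; Neo4j 2.x uses {braces} for parameters.
payload = build_commit_payload([
    ("MATCH (a:Person {pid: {a_id}}), (b:Person {pid: {b_id}}) "
     "CREATE (a)-[:KNOWS]->(b)",
     {"a_id": 1, "b_id": 2}),
])
body = json.dumps(payload)
# `body` would be POSTed with Content-Type: application/json to
# http://localhost:7474/db/data/transaction/commit
```

> Posting to /transaction/commit runs and commits everything in one request, so 
> no server-side transaction is held open between calls.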
> 
> I updated my test code gist 
> (https://gist.github.com/mlaldrid/85a03fc022170561b807) similarly to 
> eliminate py2neo and use the begin/commit endpoint directly. Running on the 
> Neo4j community edition with a 512MB heap I get GC churn at about 350K 
> queries. I've downloaded a trial copy of the enterprise edition and run the 
> same test code with the same heap size (512MB) for an hour/3.5M queries with 
> no signs of memory leak or GC churn.
> 
> Why is this memory leak behavior different in the community and enterprise 
> editions? Is it something that the enterprise edition's "advanced caching" 
> feature solves? Is it a known but opaque limitation of the community edition?
> 
> Thanks,
> -Matt
> 
> On Monday, July 7, 2014 8:47:43 AM UTC-4, Matt Aldridge wrote:
> FWIW, I have replicated this issue on 2.0.3 as well. While GC churn does kick 
> in at approximately the same point as with 2.1.2, it is interesting to note 
> how much faster the test case's Cypher queries perform in 2.1.2--something 
> like 50% faster! :)
> 
> Nonetheless, the memory leak does continue to be an issue for me. AFAICT, the 
> py2neo API is properly opening, submitting, and closing the cypher 
> transactions according to spec. I'd greatly appreciate any assistance in 
> determining whether this is indeed a bug in Neo4j.
> 
> Thanks,
> -Matt
> 
> 
> On Wednesday, July 2, 2014 2:20:10 PM UTC-4, Matt Aldridge wrote:
> Hi everyone,
> 
> I have a use case that appears to expose a memory leak in Neo4j. I've been 
> testing this with Neo4j 2.1.2 on OS X.
> 
> I've created a test case that reproduces the issue consistently and mimics 
> the behavior of my real-world application: 
> https://gist.github.com/mlaldrid/85a03fc022170561b807. It uses py2neo to 
> interface with Neo4j's transactional Cypher HTTP endpoint. To force the 
> suspected memory leak to surface more quickly, I limit Neo4j's max heap to 
> 1GB. In practice I tend to use an 8GB heap, but the misbehavior still occurs 
> (albeit delayed).
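> (For reference, the heap cap is set in conf/neo4j-wrapper.conf; the values 
> below are in MB and cap it at 1GB:)

```
wrapper.java.initmemory=1024
wrapper.java.maxmemory=1024
```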
> 
> In my real-world application, we need to CREATE millions of primary nodes of 
> interest and MERGE ancillary nodes into the graph, as they can be shared by 
> any number of other primary nodes. In the test case here we give the primary 
> nodes the Person label, and the ancillary nodes are labeled Address and 
> Phone. A fixed set of Address and Phone nodes are generated and randomly 
> attached to Person nodes.
> 
> Each Cypher transaction CREATEs 1000 Person nodes and MERGEs in 2 Address and 
> 1 Phone node for each Person. The transactions are created and then committed 
> without any intermediate executions of the open transaction.
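> Roughly, each batch is assembled like this (the labels match the test case, 
> but the property names and relationship types here are stand-ins, not the 
> gist's actual ones):

```python
import random

# Illustrative Cypher; Neo4j 2.x {param} placeholders.
CREATE_PERSON = "CREATE (p:Person {name: {name}})"
MERGE_ADDRESS = ("MATCH (p:Person {name: {name}}) "
                 "MERGE (a:Address {addr: {addr}}) "
                 "CREATE (p)-[:LIVES_AT]->(a)")
MERGE_PHONE = ("MATCH (p:Person {name: {name}}) "
               "MERGE (ph:Phone {num: {num}}) "
               "CREATE (p)-[:HAS_PHONE]->(ph)")

def batch_statements(people, addresses, phones):
    """Build the (query, params) list for one transaction: a CREATE per
    person, plus MERGEs attaching 2 random addresses and 1 random phone."""
    stmts = []
    for name in people:
        stmts.append((CREATE_PERSON, {"name": name}))
        for addr in random.sample(addresses, 2):
            stmts.append((MERGE_ADDRESS, {"name": name, "addr": addr}))
        stmts.append((MERGE_PHONE, {"name": name,
                                    "num": random.choice(phones)}))
    return stmts  # 4 statements per person, committed in one request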
> 
> This log demonstrates increasingly poor load performance until finally Neo4j 
> runs out of heap space and fails a transaction:
> 
> % time python neo4j_heap_stress.py
> 2014-07-01 17:30:07,596 :: __main__ :: Generating fake data ...
> 2014-07-01 17:30:31,430 :: __main__ :: Creating label indices ...
> 2014-07-01 17:30:31,992 :: __main__ :: Beginning load ...
> 2014-07-01 17:33:33,949 :: __main__ :: Finished batch 100
> 2014-07-01 17:35:49,346 :: __main__ :: Finished batch 200
> 2014-07-01 17:37:56,856 :: __main__ :: Finished batch 300
> 2014-07-01 17:40:01,333 :: __main__ :: Finished batch 400
> 2014-07-01 17:42:04,855 :: __main__ :: Finished batch 500
> 2014-07-01 17:44:11,104 :: __main__ :: Finished batch 600
> 2014-07-01 17:46:17,261 :: __main__ :: Finished batch 700
> 2014-07-01 17:48:21,778 :: __main__ :: Finished batch 800
> 2014-07-01 17:50:28,206 :: __main__ :: Finished batch 900
> 2014-07-01 17:52:39,424 :: __main__ :: Finished batch 1000
> 2014-07-01 17:54:56,618 :: __main__ :: Finished batch 1100
> 2014-07-01 17:57:22,797 :: __main__ :: Finished batch 1200
> 2014-07-01 18:02:27,327 :: __main__ :: Finished batch 1300
> 2014-07-01 18:12:35,143 :: __main__ :: Finished batch 1400
> 2014-07-01 18:24:16,126 :: __main__ :: Finished batch 1500
> 2014-07-01 18:38:25,835 :: __main__ :: Finished batch 1600
> 2014-07-01 18:56:18,826 :: __main__ :: Finished batch 1700
> 2014-07-01 19:22:00,779 :: __main__ :: Finished batch 1800
> 2014-07-01 20:17:18,317 :: __main__ :: Finished batch 1900
> Traceback (most recent call last):
>   File "neo4j_heap_stress.py", line 112, in <module>
>     main()
>   File "neo4j_heap_stress.py", line 34, in main
>     load_batch(fake, addresses, phones)
>   File "neo4j_heap_stress.py", line 68, in load_batch
>     tx.commit()
>   File "/Users/matt/.virtualenvs/dataenv/lib/python2.7/site-packages/py2neo/cypher.py", line 242, in commit
>     return self._post(self._commit or self._begin_commit)
>   File "/Users/matt/.virtualenvs/dataenv/lib/python2.7/site-packages/py2neo/cypher.py", line 217, in _post
>     raise TransactionError.new(error["code"], error["message"])
>     raise TransactionError.new(error["code"], error["message"])
> py2neo.cypher.CouldNotCommit: commit threw exception
> python neo4j_heap_stress.py  1351.47s user 48.76s system 10% cpu 3:46:27.77 total
> 
> Early in the load process, 100K Person nodes are inserted approximately every 
> 2 minutes. However, after 1.2M Person nodes (batch 1200) have been inserted 
> into the graph, the batch times slow dramatically. This corresponds with 
> increased Java garbage collection times--near the end, GC suspends the 
> application threads for several seconds at a time. I have captured the GC 
> logs for Neo4j if there is any interest in seeing them.
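> (GC logging of this sort can be enabled with the standard HotSpot flags in 
> conf/neo4j-wrapper.conf, along these lines; the log path is just an example:)

```
wrapper.java.additional=-Xloggc:data/log/gc.log
wrapper.java.additional=-XX:+PrintGCDetails
wrapper.java.additional=-XX:+PrintGCDateStamps
wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
```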
> 
> Eventually a transaction fails, and this is in console.log:
> 
> Exception in thread "GC-Monitor"
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "GC-Monitor"
> Exception in thread "DateCache"
> Exception: java.lang.OutOfMemoryError thrown from the 
> UncaughtExceptionHandler in thread "DateCache"
> Exception in thread "RMI RenewClean-[192.168.1.214:60253]" 
> java.lang.OutOfMemoryError: Java heap space
> 
> Of all the variables here, the heavy use of MERGE queries appears to me to be 
> the culprit behind the degradation in performance. Before I changed my 
> application's graph model to share ancillary nodes among the primary nodes, I 
> was doing a simple CREATE for all nodes, and that scaled nearly linearly to 
> tens of millions of nodes in the graph, using the same Cypher transaction 
> method via py2neo. Only when I wanted to de-duplicate the shared ancillary 
> nodes with MERGE did this behavior arise. Hopefully that's not a red herring, 
> but something is almost certainly amiss here.
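> One thing worth noting for anyone comparing: MERGE lookups are only 
> index-backed if an index or uniqueness constraint exists on the merged 
> property. I do create label indices up front (see the log above); uniqueness 
> constraints would be the stronger alternative. A sketch, with property names 
> that are guesses at the data model rather than the gist's actual ones:

```python
# Uniqueness constraints (which also create indexes) on the merged
# properties; property names here are illustrative.
CONSTRAINT_STATEMENTS = [
    "CREATE CONSTRAINT ON (a:Address) ASSERT a.addr IS UNIQUE",
    "CREATE CONSTRAINT ON (ph:Phone) ASSERT ph.num IS UNIQUE",
]
# Each would be sent once, via the same transactional endpoint,
# before the bulk load begins.
```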
> 
> Please let me know if I can provide any other relevant details to reproduce.
> 
> Thanks,
> -Matt
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.
