Re: [Neo4j] Question about REST interface concurrency
Do your disk benchmark tests flush the data to disk, or do they just write to it and let the file system / OS flush whenever it feels like it (which is much faster, of course)?

2011/4/25 Stephen Roos sr...@careerarcgroup.com:

Hi Jim,

I took a look at my disk utilization and I'm only getting up to about 9379 KBps (write). My disk benchmarking tests show max write rates of around 220 MBps, so I shouldn't be maxed out there. Interestingly, I don't see that much data in the graph.db directory (about 15 MB after creating 150k empty nodes, no relationships, no index). The largest file is nioneo_logical.log.1 (14 MB); the next largest is neostore.nodestore.db (1.3 MB).

I don't know if that information is helpful, but I thought it was a bit strange that I'm sustaining disk write rates of 9 MBps for over 40 secs, yet I don't have anywhere close to 9 * 40 MB of data.

I do wonder about the flush operation, though. Flush is a blocking operation, so maybe that's the bottleneck even though the disk isn't over-utilized. I'll look into that. Let me know if you have any other ideas.

Thanks!
Stephen

-----Original Message-----
From: Jim Webber [mailto:j...@neotechnology.com]
Sent: Friday, April 22, 2011 3:34 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Question about REST interface concurrency

Hi Stephen,

I think the network IO you've measured is consistent with the rest of the behaviour you've described. What I'm thinking is that you're simply reaching the limits of create transaction -> create a node -> complete transaction -> flush to filesystem (that is, you're basically testing disk write speed/seek time/etc.). Can you check how busy your IO to disk is? I expect it'll be relatively high.

Jim

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
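The distinction Mattias raises (writing versus writing and flushing) can be checked with a small experiment. The following is only a sketch in Python, not the benchmark Stephen ran: the record size, the record count, and the function names are assumptions chosen for illustration. It times the same sequence of small writes twice, once letting the OS buffer them and once forcing each record to the device with os.fsync.

```python
import os
import tempfile
import time


def timed_writes(path, record, count, sync_each):
    """Write `count` records; optionally fsync after every record.

    Returns elapsed seconds. With sync_each=False the OS is free to
    buffer the data and flush it whenever it likes.
    """
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(record)
            if sync_each:
                f.flush()             # push Python's buffer to the OS
                os.fsync(f.fileno())  # ask the OS to push it to the device
    return time.perf_counter() - start


def compare(count=200, record_size=100):
    """Time buffered vs. synced writes of small, hypothetical records."""
    record = b"\0" * record_size
    with tempfile.TemporaryDirectory() as d:
        buffered_path = os.path.join(d, "buffered.bin")
        synced_path = os.path.join(d, "synced.bin")
        buffered = timed_writes(buffered_path, record, count, False)
        synced = timed_writes(synced_path, record, count, True)
        size = os.path.getsize(synced_path)
    return buffered, synced, size


if __name__ == "__main__":
    buffered, synced, size = compare()
    print(f"buffered: {buffered:.4f}s  synced: {synced:.4f}s  bytes: {size}")
```

Both runs write the same number of bytes, so any gap between the two timings comes from the per-record flush rather than from raw disk bandwidth. How large that gap is depends on the disk, the file system, and any write cache in between.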
Re: [Neo4j] Question about REST interface concurrency
Hi Jim,

From what I understand, the disk benchmark flushes with various granularities, though I suspect it's not flushing after writes as small as an empty node, so this is certainly a possible bottleneck. I've been looking through the code and don't see exactly where the flush takes place. Can you point me at the right class?

I did come across the PersistenceWindowPool class, which seems to come into play when the underlying node record is updated during the transaction commit. It looks as if the windows are mapped over contiguous blocks of the primitive's ID space and, because new node IDs are typically sequential, each of my create-node operations is likely to target the same window. These windows are locked, and waiting threads are queued up until the locking thread notifies them on unlock. Am I reading the code correctly? If so, do you have any thoughts on how we might remove that bottleneck?

Thanks again for your help,
Stephen

-----Original Message-----
From: Mattias Persson [mailto:matt...@neotechnology.com]
Sent: Tuesday, April 26, 2011 12:19 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Question about REST interface concurrency

Do your disk benchmark tests flush the data to disk, or do they just write to it and let the file system / OS flush whenever it feels like it (which is much faster, of course)?
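Stephen's reading of PersistenceWindowPool can be illustrated with a toy model. This is a sketch in Python under stated assumptions, not Neo4j's implementation: the window size, the one-lock-per-window design, and the names WINDOW_SIZE, window_for, and WindowPool are all hypothetical. It shows only the arithmetic of the concern: when windows cover contiguous ID ranges, sequential IDs land in the same window and therefore contend for the same lock.

```python
import threading
from collections import Counter

WINDOW_SIZE = 1024  # hypothetical number of records covered by one window


def window_for(record_id, window_size=WINDOW_SIZE):
    """Map a record ID to the index of the window covering it."""
    return record_id // window_size


class WindowPool:
    """Toy pool with one lock per window (an assumption, not Neo4j's code)."""

    def __init__(self, window_size=WINDOW_SIZE):
        self.window_size = window_size
        self._locks = {}
        self._guard = threading.Lock()  # protects the lock table itself
        self.hits = Counter()

    def _lock_for(self, index):
        with self._guard:
            return self._locks.setdefault(index, threading.Lock())

    def write(self, record_id):
        index = window_for(record_id, self.window_size)
        with self._lock_for(index):  # writers to the same window serialize here
            self.hits[index] += 1
        return index


pool = WindowPool()
# Sequential IDs, as produced by back-to-back node creation.
sequential = {pool.write(i) for i in range(500)}
# IDs spaced one window apart, for contrast.
scattered = {pool.write(i * WINDOW_SIZE) for i in range(500)}
```

In this model the 500 sequential IDs all map to window 0, while the 500 spaced-out IDs map to 500 different windows. Whether the real class behaves this way is exactly the question Stephen is asking.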
Re: [Neo4j] Question about REST interface concurrency
Hi Stephen,

I think the network IO you've measured is consistent with the rest of the behaviour you've described. What I'm thinking is that you're simply reaching the limits of create transaction -> create a node -> complete transaction -> flush to filesystem (that is, you're basically testing disk write speed/seek time/etc.). Can you check how busy your IO to disk is? I expect it'll be relatively high.

Jim
Re: [Neo4j] Question about REST interface concurrency
Hi Peter,

I'd be glad to share the code. I'll commit soon and share it with the users list.

I've run some more load/concurrency tests and am seeing some strange results. Maybe someone can help explain this to me:

I run a load test where I fire off 100K create-empty-node REST requests to Neo as quickly as possible. With my code updates to allow configuration of the Jetty thread pool size, I can effectively reduce or increase the maximum concurrent transaction limit on the server. If I limit the thread pool so that there is only 1 thread available for requests, I see (as expected) that the PeakNumberOfConcurrentTransactions reported by the Neo4j Transactions MBean is 1. If I scale the thread pool up so that there are 800 available request threads, I can throw enough load at the server to cause 800 concurrent transactions. From what I have read, node creation causes a node-local lock, not a global node lock, so there shouldn't be a lock-imposed concurrency bottleneck.

The strange thing is, no matter whether I have 1 or 800 concurrent transactions, my total node creation throughput is always the same (~1600 nodes/sec). Even with 800 concurrent transactions, my server is only using ~15% CPU and ~25% memory (JVM Xms/Xmx = 1024m/2048m), so server load wouldn't appear to be an issue. I've followed all the recommendations I could find, including sysctl limits and JVM settings, but the rate doesn't change. I have also tried running the load test from multiple clients simultaneously (just to be sure I'm not running into any limits on the client machine), and indeed as soon as I add a second load test client, the throughput on each client gets cut in half.

If I'm talking to Neo in a way that is unrestricted by things like thread pool size and concurrency limits, I'd expect to be able to scale up my load tests and see at least some level of throughput improvement until I start to saturate/overload the box. The fact that increasing concurrency doesn't increase throughput makes me think that there's some internal bottleneck or synchronization point that's limiting it.

Any thoughts? I'm glad to look through the code and investigate; any ideas you have would be a big help.

Thanks, and sorry for the long question!
Stephen

-----Original Message-----
From: Peter Neubauer [mailto:peter.neuba...@neotechnology.com]
Sent: Monday, April 18, 2011 12:50 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Question about REST interface concurrency

Stephen, did you fork the code? It would be good to merge in the changes, or at least take a look at them!

Cheers,
/peter neubauer
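The pattern Stephen reports (peak concurrency follows the thread pool size, yet throughput stays flat) is what one would expect if every request passes through a single serialized step. The sketch below is a Python simulation under that assumption, not a test against a Neo4j server: commit_lock, fake_create_node, run_load, and the 1 ms "commit" time are all hypothetical stand-ins.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class Tracker:
    """Counts in-flight tasks and remembers the peak."""

    def __init__(self):
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0
        self.done = 0

    def __enter__(self):
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)
        return self

    def __exit__(self, *exc):
        with self._lock:
            self.active -= 1
            self.done += 1
        return False


commit_lock = threading.Lock()  # stand-in for a hypothetical serialized step


def fake_create_node(tracker, commit_seconds=0.001):
    """Simulated request: many may be in flight, one commits at a time."""
    with tracker:
        with commit_lock:
            time.sleep(commit_seconds)


def run_load(total, workers, commit_seconds=0.001):
    """Run `total` simulated requests on `workers` threads.

    Returns the tracker and the measured requests per second.
    """
    tracker = Tracker()
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(total):
            pool.submit(fake_create_node, tracker, commit_seconds)
    elapsed = time.perf_counter() - start
    return tracker, total / elapsed
```

Calling run_load(1000, 1) and run_load(1000, 8) and comparing the returned rates shows the effect: the peak number of in-flight requests rises with the worker count, but because the sleep happens inside the lock, the extra workers mostly wait. This does not show where such a step sits in Neo4j, only that a flat throughput curve is consistent with one.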
Re: [Neo4j] Question about REST interface concurrency
I'm running on Linux (2.6.18). Watching network utilization, I never see rates higher than ~2.5 MBps on the server. I've also set net.core.rmem_min/max and net.ipv4.tcp_rmem/wmem in sysctl to be quite high, based on some recommendations I've found.

Is this contrary to your own load tests? Are you able to hit the server with enough load that the system is maxed out? I was considering adding some instrumentation around transactions so that I can see the average internal transaction time span during a load test. If you have any other thoughts on what to look for or test, I'd be very appreciative.

Thanks again,
Stephen

-----Original Message-----
From: Jim Webber [mailto:j...@neotechnology.com]
Sent: Thursday, April 21, 2011 12:24 PM
To: Neo4j user discussions
Subject: Re: [Neo4j] Question about REST interface concurrency

Hi Stephen,

Are you running on Linux (or Windows) by any chance? I wonder whether the asymptotic performance you're seeing is because you've gotten to a point where you're exercising the IO channel and file system.

Jim
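The instrumentation Stephen mentions (measuring the average transaction time span during a load test) could look roughly like the following. This is a generic Python sketch, not code from Neo4j or from Stephen's patch; the TxTimer class and its method names are hypothetical.

```python
import threading
import time
from contextlib import contextmanager


class TxTimer:
    """Thread-safe accumulator for transaction durations."""

    def __init__(self):
        self._lock = threading.Lock()
        self.count = 0
        self.total = 0.0
        self.longest = 0.0

    def record(self, seconds):
        """Add one measured duration."""
        with self._lock:
            self.count += 1
            self.total += seconds
            self.longest = max(self.longest, seconds)

    @property
    def average(self):
        """Mean duration in seconds, or 0.0 if nothing was recorded."""
        with self._lock:
            return self.total / self.count if self.count else 0.0

    @contextmanager
    def span(self):
        """Time the enclosed block and record it, even if it raises."""
        start = time.perf_counter()
        try:
            yield
        finally:
            self.record(time.perf_counter() - start)
```

Wrapping the begin/create/commit sequence in `with timer.span():` on every request thread gives an average to compare against the observed throughput. For reference, if the work were fully serialized, ~1600 nodes/sec would correspond to about 1/1600 s, or roughly 0.6 ms, per transaction.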
Re: [Neo4j] Question about REST interface concurrency
Stephen, did you fork the code? It would be good to merge in the changes, or at least take a look at them!

Cheers,
/peter neubauer

GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Mon, Apr 18, 2011 at 4:08 AM, Stephen Roos sr...@careerarcgroup.com wrote:

Hi Jim,

Thanks for the quick reply. I tried the configuration mentioned here (rest_max_jetty_threads):

https://trac.neo4j.org/changeset/6157/laboratory/components/rest

But it doesn't seem to have changed anything. I took a look through the code and didn't see any configuration settings exposed in Jetty6WebServer. I added the changes myself and am starting to see some good results (I've exposed settings for min/max thread pool size, number of acceptor threads, acceptor queue size, and request buffer size). Is there anything else that you'd recommend tweaking to improve throughput?

Thanks again for your help!
Re: [Neo4j] Question about REST interface concurrency
Hi Jim,

Thanks for the quick reply. I tried the configuration mentioned here (rest_max_jetty_threads):

https://trac.neo4j.org/changeset/6157/laboratory/components/rest

But it doesn't seem to have changed anything. I took a look through the code and didn't see any configuration settings exposed in Jetty6WebServer. I added the changes myself and am starting to see some good results (I've exposed settings for min/max thread pool size, number of acceptor threads, acceptor queue size, and request buffer size). Is there anything else that you'd recommend tweaking to improve throughput?

Thanks again for your help!

-----Original Message-----
From: Jim Webber [mailto:j...@neotechnology.com]
Sent: Friday, April 15, 2011 1:57 AM
To: Neo4j user discussions
Subject: Re: [Neo4j] Question about REST interface concurrency

Hi Stephen,

The same Jetty tweaks that worked in previous versions will work with 1.3. We haven't changed any of the Jetty stuff under the covers.

Jim
Re: [Neo4j] Question about REST interface concurrency
Hi Stephen,

The same Jetty tweaks that worked in previous versions will work with 1.3. We haven't changed any of the Jetty stuff under the covers.

Jim
[Neo4j] Question about REST interface concurrency
Hello Neo Team!

Congrats on the recent release! I'm using 1.3 enterprise in my development environment. I noticed that in earlier versions there were some patches to allow setting the min/max thread pool size for the REST servlet container. Are there any similar options now? Under load tests, it seems like I would benefit from at least having a higher initial thread pool size. Are there any other configuration changes or strategies that would help with overall throughput under heavy load?

Thanks for your help!

Stephen Roos
Software Engineer
CareerArc Group
The Social Exceleration Network
www.careerarcgroup.com