Re: insert performance (1.2.8)
That sounds interesting, but I'm not sure exactly what you mean. My key is like this: ((f1, f2, day), timeuuid), and f1/f2 are roughly well-distributed, so my inserts are spread fairly evenly across about 22k combinations of f1+f2 each day. Are you saying that you get better performance by keeping the wide rows less wide, or by spreading inserts into a single row out over time? I just don't know what you mean by shuffling.

On 08/26/2013 03:06 PM, Jake Luciani wrote: How are you inserting the data? Is it all one partition at once? We've had the experience that shuffling the inserts across rows, for wide rows, gave us normal insert rates. When you mutate an entire wide row at once, it hits a bottleneck.

On Mon, Aug 26, 2013 at 4:49 PM, Keith Freeman 8fo...@gmail.com wrote: I can believe that I'm I/O-bound with the current disk configuration, but that doesn't explain the CPU load, does it? If I'm hitting a limit of disk performance, I should see a slowdown but not the jump in CPU, right?

On 08/22/2013 11:52 AM, Nate McCall wrote: Given the backups in the flushing stages, I think you are I/O-bound. SSDs will work best for the data volume. Use rotational media for the commitlog, as it is largely sequential. Quick experiment: disable the commit log on the keyspace and see if your test goes faster (WITH DURABLE_WRITES = false on keyspace creation).

On Wed, Aug 21, 2013 at 5:41 PM, Keith Freeman 8fo...@gmail.com wrote: We have 2 partitions on the same physical disk for the commit log and data. Definitely non-optimal; we're planning to install SSDs for the commit-log partition but don't have them yet. Can this explain the high server loads?

On 08/21/2013 04:24 PM, Nate McCall wrote:
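As I read Jake's "shuffling" suggestion, it means interleaving inserts across many partitions rather than mutating one wide row in a single burst. A minimal sketch of that reordering, assuming rows are already grouped by partition key (the generic row type and the grouping are hypothetical stand-ins, not anyone's actual client code):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Round-robin interleave: given rows grouped by partition key, emit an
// ordering where consecutive inserts target different partitions, so no
// single wide row receives a long uninterrupted burst of writes.
public class InsertShuffler {

    public static <T> List<T> interleaveByPartition(Map<String, List<T>> byPartition) {
        List<T> out = new ArrayList<>();
        List<Iterator<T>> iters = new ArrayList<>();
        for (List<T> rows : byPartition.values()) {
            iters.add(rows.iterator());
        }
        boolean progress = true;
        while (progress) {
            progress = false;
            // Take one row from each partition per pass until all are drained.
            for (Iterator<T> it : iters) {
                if (it.hasNext()) {
                    out.add(it.next());
                    progress = true;
                }
            }
        }
        return out;
    }
}
```

Feeding the interleaved list to the existing insert threads keeps the per-partition write pattern spread out over time without changing the schema.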
Re: insert performance (1.2.8)
Building the giant batch string wasn't as bad as I thought, and at first I had great(!) results (using unlogged batches): 2500 rows/sec (batches of 100 in 48 threads) ran very smoothly, and the load on the Cassandra server nodes averaged about 1.0 or less continuously. But then I upped it to 5000 rows/sec, and the load on the server nodes jumped to a continuous 8-10 on all 3, with peaks over 14. I also tried running 2 separate clients at 2500 rows/sec each, with the same results. I don't see any compactions while at this load, so would this likely be the result of GC thrashing? It seems like I'm spending a lot of effort and am still not getting very close to being able to insert 10k rows (about 10MB of data) per second, which is pretty disappointing.

On 08/20/2013 07:16 PM, Nate McCall wrote: Thrift will allow for more large, free-form batch construction; the increase will come from doing a lot more in the same payload message. Otherwise CQL is more efficient. If you do build those giant strings, yes, you should see a performance improvement.

On Tue, Aug 20, 2013 at 8:03 PM, Keith Freeman 8fo...@gmail.com wrote: Thanks. Can you tell me why using Thrift would improve performance? Also, if I do try to build those giant strings for a prepared batch statement, should I expect another performance improvement?

On 08/20/2013 05:06 PM, Nate McCall wrote: Ugh - sorry, I knew Sylvain and Michaël had worked on this recently, but it is only in 2.0 - I could have sworn it got marked for inclusion back into 1.2, but I was wrong: https://issues.apache.org/jira/browse/CASSANDRA-4693 This is indeed an issue if you don't know the column count beforehand (or have a very large number of columns, like in your case). Again, apologies; I would not have recommended that route if I had known it was only in 2.0. I would be willing to bet you could hit those insert numbers pretty easily with Thrift, given the shape of your mutation.
On Tue, Aug 20, 2013 at 5:00 PM, Keith Freeman 8fo...@gmail.com wrote: So I tried inserting prepared statements separately (no batch), and my server nodes' load definitely dropped significantly. Throughput from my client improved a bit, but only by a few percent. I was able to *almost* get 5000 rows/sec (sort of) by also reducing the rows per insert thread to 20-50 and eliminating all overhead from the timing, i.e. timing only the tight for loop of inserts. But that's still a lot slower than I expected. I couldn't do batches because the driver doesn't allow prepared statements in a batch (QueryBuilder API). It appears the batch itself could possibly be a prepared statement, but since I have 40+ columns on each insert, that would take some ugly code to build, so I haven't tried it yet. I'm using CL ONE on the inserts and RF 2 in my schema.
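The "giant batch string" approach Keith describes above can be sketched as simple string assembly: concatenate the individual INSERT statements into one unlogged CQL batch. This is only an illustration of the text format; real code would also need to serialize/escape values correctly (or, better, use bound parameters):

```java
import java.util.List;
import java.util.StringJoiner;

// Build a single unlogged CQL batch from pre-rendered INSERT statements.
public class BatchStringBuilder {

    public static String buildUnloggedBatch(List<String> insertStatements) {
        StringJoiner sj = new StringJoiner("\n", "BEGIN UNLOGGED BATCH\n", "\nAPPLY BATCH;");
        for (String stmt : insertStatements) {
            // Each statement inside the batch must be terminated with a semicolon.
            sj.add(stmt.endsWith(";") ? stmt : stmt + ";");
        }
        return sj.toString();
    }
}
```

The resulting string is executed as one statement, which is what produced the initial 2500 rows/sec result in the thread.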
Re: insert performance (1.2.8)
The only thing I can think to suggest at this point is upping that batch size - say, to 500 - and seeing what happens. Do you have any monitoring on this cluster? If not, what do you see as the output of 'nodetool tpstats' while you run this test?
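Keith's concern that a prepared insert over 40+ columns "would take some ugly code to build" mostly comes down to generating the parameterized statement text. A small helper keeps that manageable (table and column names here are made up for illustration):

```java
import java.util.Collections;
import java.util.List;

// Generate "INSERT INTO t (c1, ..., cN) VALUES (?, ..., ?)" from a column
// list, so a wide prepared statement doesn't have to be written by hand.
public class PreparedInsertText {

    public static String insertWithPlaceholders(String table, List<String> columns) {
        String cols = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.size(), "?"));
        return "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ")";
    }
}
```

The same generator scales to 40+ columns unchanged; the string would then be passed once to the driver's prepare call and bound per row.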
Re: insert performance (1.2.8)
What's the disk setup like on these systems? You have some pending tasks in MemtablePostFlusher and FlushWriter, which may mean there is contention on flushing discarded segments from the commit log.

On Wed, Aug 21, 2013 at 5:14 PM, Keith Freeman 8fo...@gmail.com wrote: Ok, I tried batching 500 at a time; it made no noticeable difference in the server loads. I have been monitoring JMX via jconsole, if that's what you mean. I also ran tpstats on all 3 nodes while they were under load (the 5000 rows/sec test). The attached file contains a screenshot of the JMX output and the output of the 3 tpstats commands.
Re: insert performance (1.2.8)
Ok, I'll try prepared statements. But while sending my statements async might speed up my client, it wouldn't improve throughput on the Cassandra nodes, would it? They're running at pretty high loads and only about 10% idle, so my concern is that they can't handle the data any faster, i.e. that something's wrong on the server side. I don't really think there's anything on the client side that matters for this problem. Of course I know there are obvious hardware things I can do to improve server performance: SSDs, more RAM, more cores, etc. But I thought the servers I have would be able to handle more rows/sec than, say, MySQL, since write speed is supposed to be one of Cassandra's strengths.

On 08/19/2013 09:03 PM, John Sanda wrote: I'd suggest using prepared statements that you initialize at application start-up, and switching to Session.executeAsync coupled with Google Guava's Futures API to get better throughput on the client side.

On Mon, Aug 19, 2013 at 10:14 PM, Keith Freeman 8fo...@gmail.com wrote: Sure. I've tried different numbers of batches and threads, but generally I'm running 10-30 threads at a time on the client, each sending a batch of 100 insert statements in every call, using the QueryBuilder.batch() API from the latest DataStax Java driver, then calling the Session.execute() function (synchronous) on the batch. I can't post my code, but my client does this on each iteration:
-- divides up the set of inserts by the number of threads
-- stores the current time
-- tells all the threads to send their inserts
-- then, when they've all returned, checks the elapsed time
At about 2000 rows per iteration, 20 threads with 100 inserts each finish in about 1 second. For 4000 rows, 40 threads with 100 inserts each finish in about 1.5-2 seconds, and as I said, all 3 Cassandra nodes have a heavy CPU load while the client is hardly loaded.
I've tried with 10 threads and more inserts per batch, or up to 60 threads with fewer; it doesn't seem to make a lot of difference.

On 08/19/2013 05:00 PM, Nate McCall wrote: How big are the batch sizes? In other words, how many rows are you sending per insert operation? Other than the above, not much else to suggest without seeing some example code (on pastebin, gist or similar, ideally).

On Mon, Aug 19, 2013 at 5:49 PM, Keith Freeman 8fo...@gmail.com wrote: I've got a 3-node Cassandra cluster (16GB/4-core VMs, ESXi v5, on 2.5GHz machines not shared with any other VMs). I'm inserting time-series data into a single column family using wide rows (timeuuids) and a 3-part partition key, so my primary key is something like PRIMARY KEY ((a, b, day), in_time_uuid), with additional columns x, y, z. My Java client is feeding rows (about 1k of raw data each) in batches using multiple threads, and the fastest I can get it to run reliably is about 2000 rows/second. Even at that speed, all 3 Cassandra nodes are very CPU-bound, with loads of 6-9 each (and the client machine is hardly breaking a sweat). I've tried turning off compression in my table, which reduced the loads slightly but not much. There are no other updates or reads occurring, except from the DataStax OpsCenter. I was expecting to be able to insert at least 10k rows/second with this configuration, and after a lot of reading of docs, blogs, and Google, I can't really figure out what's slowing my client down. When I increase the insert speed of my client beyond 2000/second, the server responses are just too slow and the client falls behind. I had a single-node MySQL database that can handle 10k of these data rows/second, so I really feel like I'm missing something in Cassandra. Any ideas?
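The client loop Keith describes (divide the inserts across N threads, fire them all, time the whole iteration) can be sketched with a plain ExecutorService. The actual session.execute call is replaced by a caller-supplied action so this compiles and runs without a cluster; everything else is an assumption about the shape of the unposted client code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntConsumer;

// Time one iteration of `totalRows` inserts spread over `threads` workers,
// each invoking `insertBatch` once with its share of rows. Returns elapsed ms.
public class InsertTimer {

    public static long timeIteration(int totalRows, int threads, IntConsumer insertBatch)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int perThread = totalRows / threads;
        long start = System.nanoTime();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            futures.add(pool.submit(() -> insertBatch.accept(perThread)));
        }
        // Wait for every worker before stopping the clock.
        for (Future<?> f : futures) {
            try {
                f.get();
            } catch (ExecutionException e) {
                throw new RuntimeException(e);
            }
        }
        pool.shutdown();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

In the real client, `insertBatch` would synchronously execute a batch of 100 CQL inserts; here it is just a hook, which also makes the harness handy for measuring client overhead separately from server latency.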
Re: insert performance (1.2.8)
John makes a good point re: prepared statements (I'd increase batch sizes again once you've done this as well - in separate, incremental runs, of course, so you can gauge the effect of each). That should take out some of the processing overhead of statement validation in the server (some - that load spike still seems high, though). I'd actually be really interested in your results after doing so - I've not tried any A/B testing here for prepared statements on inserts. Given that your load is on the server, I'm not sure adding more async indirection on the client would buy you too much, though. Also, at what RF and consistency level are you writing?
Re: insert performance (1.2.8)
I had similar issues (I sent a note to the list a few weeks ago, but nobody responded). I think there's a serious bottleneck when using wide rows and composite keys. I made a trivial benchmark, which you can check here: http://pastebin.com/qAcRcqbF - it's written in cql-rb, but I ran the test using astyanax with cql3 enabled and the results were the same. In my case, inserting 10 000 entries took the following times (seconds):

Using composite keys - Separately: 12.892867, Batch: 189.731306. This means I got about 1000 rows/s when inserting them separately and 52 (!!!) when inserting them in one huge batch.

Using just a partition key and a wide row - Separately: 11.292507, Batch: 0.093355. Again, about 1000 rows/s when inserting them one by one, but here the batch obviously improves things and I easily got over 100 000 rows/s.

Anyone else with similar experiences? Thanks, Przemek
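A quick sanity check on the rates in Przemek's benchmark: rows per second is just entries divided by elapsed seconds, so his 10 000-entry timings can be verified directly (the ~100 000 rows/s wide-row batch figure follows from the 0.093355 s timing):

```java
// Rows-per-second from a row count and an elapsed time in seconds,
// truncated to a whole number as in the figures quoted in the thread.
public class Throughput {
    public static long rowsPerSecond(int rows, double seconds) {
        return (long) (rows / seconds);
    }
}
```

For example, 10 000 rows in 189.731306 s works out to 52 rows/s (the composite-key batch case), while 10 000 rows in 0.093355 s is over 100 000 rows/s.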
Re: insert performance (1.2.8)
Thanks for putting this up - sorry I missed your post the other week. I would be really curious about your results if you added a prepared statement for those inserts.
Re: insert performance (1.2.8)
AFAIK, batch prepared statements were added just recently: https://issues.apache.org/jira/browse/CASSANDRA-4693 and many client libraries do not support it yet. (And I believe that the problem is related to batch operations). On Tue, Aug 20, 2013 at 4:43 PM, Nate McCall n...@thelastpickle.com wrote: Thanks for putting this up - sorry I missed your post the other week. I would be really curious as to your results if you added a prepared statement for those inserts. On Tue, Aug 20, 2013 at 9:14 AM, Przemek Maciolek pmacio...@gmail.com wrote: I had similar issues (sent a note on the list a few weeks ago but nobody responded). I think there's a serious bottleneck with using wide rows and composite keys. I made a trivial benchmark, which you can check here: http://pastebin.com/qAcRcqbF - it's written in cql-rb, but I ran the test using astyanax/cql3 enabled and the results were the same. In my case, inserting 10 000 entries took the following time (seconds): Using composite keys Separately: 12.892867 Batch: 189.731306 This means I got 1000 rows/s when inserting them separately and 52 (!!!) when inserting them in a huge batch. Using just partition key and wide row Separately: 11.292507 Batch: 0.093355 Again, 1000 rows/s when inserting them one by one. But batch obviously improves things and I easily got 1 rows/s. Anyone else with similar experiences? Thanks, Przemek On Tue, Aug 20, 2013 at 4:04 PM, Nate McCall n...@thelastpickle.com wrote: John makes a good point re: prepared statements (I'd increase batch sizes again once you did this as well - separate, incremental runs of course so you can gauge the effect of each). That should take out some of the processing overhead of statement validation in the server (some - that load spike still seems high though). I'd actually be really interested as to what your results were after doing so - I've not tried any A/B testing here for prepared statements on inserts. 
Given your load is on the server, i'm not sure adding more async indirection on the client would buy you too much though. Also, at what RF and consistency level are you writing? On Tue, Aug 20, 2013 at 8:56 AM, Keith Freeman 8fo...@gmail.com wrote: Ok, I'll try prepared statements. But while sending my statements async might speed up my client, it wouldn't improve throughput on the cassandra nodes would it? They're running at pretty high loads and only about 10% idle, so my concern is that they can't handle the data any faster, so something's wrong on the server side. I don't really think there's anything on the client side that matters for this problem. Of course I know there are obvious h/w things I can do to improve server performance: SSDs, more RAM, more cores, etc. But I thought the servers I have would be able to handle more rows/sec than say Mysql, since write speed is supposed to be one of Cassandra's strengths. On 08/19/2013 09:03 PM, John Sanda wrote: I'd suggest using prepared statements that you initialize at application start up and switching to use Session.executeAsync coupled with Google Guava Futures API to get better throughput on the client side. On Mon, Aug 19, 2013 at 10:14 PM, Keith Freeman 8fo...@gmail.comwrote: Sure, I've tried different numbers for batches and threads, but generally I'm running 10-30 threads at a time on the client, each sending a batch of 100 insert statements in every call, using the QueryBuilder.batch() API from the latest datastax java driver, then calling the Session.execute() function (synchronous) on the Batch. I can't post my code, but my client does this on each iteration: -- divides up the set of inserts by the number of threads -- stores the current time -- tells all the threads to send their inserts -- then when they've all returned checks the elapsed time At about 2000 rows for each iteration, 20 threads with 100 inserts each finish in about 1 second. 
For 4000 rows, 40 threads with 100 inserts each finish in about 1.5 - 2 seconds, and as I said all 3 cassandra nodes have a heavy CPU load while the client is hardly loaded. I've tried with 10 threads and more inserts per batch, or up to 60 threads with fewer, doesn't seem to make a lot of difference. On 08/19/2013 05:00 PM, Nate McCall wrote: How big are the batch sizes? In other words, how many rows are you sending per insert operation? Other than the above, not much else to suggest without seeing some example code (on pastebin, gist or similar, ideally). On Mon, Aug 19, 2013 at 5:49 PM, Keith Freeman 8fo...@gmail.comwrote: I've got a 3-node cassandra cluster (16G/4-core VMs ESXi v5 on 2.5Ghz machines not shared with any other VMs). I'm inserting time-series data into a single column-family using wide rows (timeuuids) and have a 3-part partition key so my primary key is something like ((a, b, day), in-time-uuid), x, y, z). My java client is feeding rows (about 1k of raw data
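For concreteness, the two shapes Przemek's benchmark compares look roughly like this in CQL (hypothetical table and column names, not the actual schema from the pastebin):

```sql
-- Shape 1: composite (compound) partition key plus a clustering column.
-- A batch of 10 000 inserts scatters across many partitions.
CREATE TABLE test_composite (
    f1 text,
    f2 text,
    day text,
    ts timeuuid,
    val blob,
    PRIMARY KEY ((f1, f2, day), ts)
);

-- Shape 2: single partition key with a wide row.
-- The same batch lands in one partition as a single large mutation.
CREATE TABLE test_wide (
    key text,
    ts timeuuid,
    val blob,
    PRIMARY KEY (key, ts)
);
```

With the second shape a batch collapses into one mutation per partition, which is consistent with the dramatic batch speedup reported above.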
Re: insert performance (1.2.8)
So I tried inserting prepared statements separately (no batch), and my server nodes load definitely dropped significantly. Throughput from my client improved a bit, but only a few %. I was able to *almost* get 5000 rows/sec (sort of) by also reducing the rows/insert-thread to 20-50 and eliminating all overhead from the timing, i.e. timing only the tight for loop of inserts. But that's still a lot slower than I expected. I couldn't do batches because the driver doesn't allow prepared statements in a batch (QueryBuilder API). It appears the batch itself could possibly be a prepared statement, but since I have 40+ columns on each insert that would take some ugly code to build so I haven't tried it yet. I'm using CL ONE on the inserts and RF 2 in my schema. On 08/20/2013 08:04 AM, Nate McCall wrote: John makes a good point re:prepared statements (I'd increase batch sizes again once you did this as well - separate, incremental runs of course so you can gauge the effect of each). That should take out some of the processing overhead of statement validation in the server (some - that load spike still seems high though). I'd actually be really interested as to what your results were after doing so - i've not tried any A/B testing here for prepared statements on inserts. Given your load is on the server, i'm not sure adding more async indirection on the client would buy you too much though. Also, at what RF and consistency level are you writing? On Tue, Aug 20, 2013 at 8:56 AM, Keith Freeman 8fo...@gmail.com mailto:8fo...@gmail.com wrote: Ok, I'll try prepared statements. But while sending my statements async might speed up my client, it wouldn't improve throughput on the cassandra nodes would it? They're running at pretty high loads and only about 10% idle, so my concern is that they can't handle the data any faster, so something's wrong on the server side. I don't really think there's anything on the client side that matters for this problem. 
Of course I know there are obvious h/w things I can do to improve server performance: SSDs, more RAM, more cores, etc. But I thought the servers I have would be able to handle more rows/sec than say Mysql, since write speed is supposed to be one of Cassandra's strengths. On 08/19/2013 09:03 PM, John Sanda wrote: I'd suggest using prepared statements that you initialize at application start up and switching to use Session.executeAsync coupled with Google Guava Futures API to get better throughput on the client side. On Mon, Aug 19, 2013 at 10:14 PM, Keith Freeman 8fo...@gmail.com mailto:8fo...@gmail.com wrote: Sure, I've tried different numbers for batches and threads, but generally I'm running 10-30 threads at a time on the client, each sending a batch of 100 insert statements in every call, using the QueryBuilder.batch() API from the latest datastax java driver, then calling the Session.execute() function (synchronous) on the Batch. I can't post my code, but my client does this on each iteration: -- divides up the set of inserts by the number of threads -- stores the current time -- tells all the threads to send their inserts -- then when they've all returned checks the elapsed time At about 2000 rows for each iteration, 20 threads with 100 inserts each finish in about 1 second. For 4000 rows, 40 threads with 100 inserts each finish in about 1.5 - 2 seconds, and as I said all 3 cassandra nodes have a heavy CPU load while the client is hardly loaded. I've tried with 10 threads and more inserts per batch, or up to 60 threads with fewer, doesn't seem to make a lot of difference. On 08/19/2013 05:00 PM, Nate McCall wrote: How big are the batch sizes? In other words, how many rows are you sending per insert operation? Other than the above, not much else to suggest without seeing some example code (on pastebin, gist or similar, ideally). 
On Mon, Aug 19, 2013 at 5:49 PM, Keith Freeman 8fo...@gmail.com mailto:8fo...@gmail.com wrote: I've got a 3-node cassandra cluster (16G/4-core VMs ESXi v5 on 2.5Ghz machines not shared with any other VMs). I'm inserting time-series data into a single column-family using wide rows (timeuuids) and have a 3-part partition key so my primary key is something like ((a, b, day), in-time-uuid), x, y, z). My java client is feeding rows (about 1k of raw data size each) in batches using multiple threads, and the fastest I can get it run reliably is about 2000 rows/second. Even at that speed, all 3 cassandra nodes are very CPU bound, with loads of 6-9 each (and the client
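Generating the "ugly" prepared-batch text mentioned above is mostly mechanical string assembly. A hedged sketch (table and column names are placeholders; the real statement would list all 40+ columns, and the resulting string would be prepared once and bound with per-row values on each execution):

```java
import java.util.Collections;

public class BatchCqlBuilder {
    /**
     * Builds the text of an UNLOGGED BATCH containing `rows` identical
     * parameterized INSERTs; the whole string can then be prepared once
     * and bound with rows * columns.length values per execution.
     */
    static String buildBatch(String table, String[] columns, int rows) {
        String cols = String.join(", ", columns);
        String marks = String.join(", ", Collections.nCopies(columns.length, "?"));
        String insert = "INSERT INTO " + table + " (" + cols + ") VALUES (" + marks + ");\n";
        StringBuilder sb = new StringBuilder("BEGIN UNLOGGED BATCH\n");
        for (int i = 0; i < rows; i++) sb.append(insert);
        return sb.append("APPLY BATCH;").toString();
    }

    public static void main(String[] args) {
        // A 100-row batch over a hypothetical 3-column table.
        String cql = buildBatch("events", new String[] {"a", "b", "ts"}, 100);
        System.out.println(cql.split("\n")[1]);  // prints one of the INSERT lines
    }
}
```

The batch size is fixed at build time, so in practice you would prepare one statement per batch size you use (e.g. one for 100 rows, one for the leftover tail).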
Re: insert performance (1.2.8)
Ugh - sorry, I knew Sylvain and Michaël had worked on this recently but it is only in 2.0 - I could have sworn it got marked for inclusion back into 1.2 but I was wrong: https://issues.apache.org/jira/browse/CASSANDRA-4693 This is indeed an issue if you don't know the column count beforehand (or have a very large number of them, like in your case). Again, apologies, I would not have recommended that route had I known it was only in 2.0. I would be willing to bet you could hit those insert numbers pretty easily with thrift given the shape of your mutation. On Tue, Aug 20, 2013 at 5:00 PM, Keith Freeman 8fo...@gmail.com wrote: So I tried inserting prepared statements separately (no batch), and my server nodes load definitely dropped significantly. Throughput from my client improved a bit, but only a few %. I was able to *almost* get 5000 rows/sec (sort of) by also reducing the rows/insert-thread to 20-50 and eliminating all overhead from the timing, i.e. timing only the tight for loop of inserts. But that's still a lot slower than I expected. I couldn't do batches because the driver doesn't allow prepared statements in a batch (QueryBuilder API). It appears the batch itself could possibly be a prepared statement, but since I have 40+ columns on each insert that would take some ugly code to build so I haven't tried it yet. I'm using CL ONE on the inserts and RF 2 in my schema. On 08/20/2013 08:04 AM, Nate McCall wrote: John makes a good point re: prepared statements (I'd increase batch sizes again once you did this as well - separate, incremental runs of course so you can gauge the effect of each). That should take out some of the processing overhead of statement validation in the server (some - that load spike still seems high though). I'd actually be really interested as to what your results were after doing so - I've not tried any A/B testing here for prepared statements on inserts. 
Given your load is on the server, i'm not sure adding more async indirection on the client would buy you too much though. Also, at what RF and consistency level are you writing? On Tue, Aug 20, 2013 at 8:56 AM, Keith Freeman 8fo...@gmail.com wrote: Ok, I'll try prepared statements. But while sending my statements async might speed up my client, it wouldn't improve throughput on the cassandra nodes would it? They're running at pretty high loads and only about 10% idle, so my concern is that they can't handle the data any faster, so something's wrong on the server side. I don't really think there's anything on the client side that matters for this problem. Of course I know there are obvious h/w things I can do to improve server performance: SSDs, more RAM, more cores, etc. But I thought the servers I have would be able to handle more rows/sec than say Mysql, since write speed is supposed to be one of Cassandra's strengths. On 08/19/2013 09:03 PM, John Sanda wrote: I'd suggest using prepared statements that you initialize at application start up and switching to use Session.executeAsync coupled with Google Guava Futures API to get better throughput on the client side. On Mon, Aug 19, 2013 at 10:14 PM, Keith Freeman 8fo...@gmail.com wrote: Sure, I've tried different numbers for batches and threads, but generally I'm running 10-30 threads at a time on the client, each sending a batch of 100 insert statements in every call, using the QueryBuilder.batch() API from the latest datastax java driver, then calling the Session.execute() function (synchronous) on the Batch. I can't post my code, but my client does this on each iteration: -- divides up the set of inserts by the number of threads -- stores the current time -- tells all the threads to send their inserts -- then when they've all returned checks the elapsed time At about 2000 rows for each iteration, 20 threads with 100 inserts each finish in about 1 second. 
For 4000 rows, 40 threads with 100 inserts each finish in about 1.5 - 2 seconds, and as I said all 3 cassandra nodes have a heavy CPU load while the client is hardly loaded. I've tried with 10 threads and more inserts per batch, or up to 60 threads with fewer, doesn't seem to make a lot of difference. On 08/19/2013 05:00 PM, Nate McCall wrote: How big are the batch sizes? In other words, how many rows are you sending per insert operation? Other than the above, not much else to suggest without seeing some example code (on pastebin, gist or similar, ideally). On Mon, Aug 19, 2013 at 5:49 PM, Keith Freeman 8fo...@gmail.com wrote: I've got a 3-node cassandra cluster (16G/4-core VMs ESXi v5 on 2.5Ghz machines not shared with any other VMs). I'm inserting time-series data into a single column-family using wide rows (timeuuids) and have a 3-part partition key so my primary key is something like ((a, b, day), in-time-uuid), x, y, z). My java client is feeding rows (about 1k of raw data size each) in batches using
Re: insert performance (1.2.8)
Thanks. Can you tell me why using thrift would improve performance? Also, if I do try to build those giant strings for a prepared batch statement, should I expect another performance improvement? On 08/20/2013 05:06 PM, Nate McCall wrote: Ugh - sorry, I knew Sylvain and Michaël had worked on this recently but it is only in 2.0 - I could have sworn it got marked for inclusion back into 1.2 but I was wrong: https://issues.apache.org/jira/browse/CASSANDRA-4693 This is indeed an issue if you don't know the column count beforehand (or have a very large number of them, like in your case). Again, apologies, I would not have recommended that route had I known it was only in 2.0. I would be willing to bet you could hit those insert numbers pretty easily with thrift given the shape of your mutation. On Tue, Aug 20, 2013 at 5:00 PM, Keith Freeman 8fo...@gmail.com wrote: So I tried inserting prepared statements separately (no batch), and my server nodes load definitely dropped significantly. Throughput from my client improved a bit, but only a few %. I was able to *almost* get 5000 rows/sec (sort of) by also reducing the rows/insert-thread to 20-50 and eliminating all overhead from the timing, i.e. timing only the tight for loop of inserts. But that's still a lot slower than I expected. I couldn't do batches because the driver doesn't allow prepared statements in a batch (QueryBuilder API). It appears the batch itself could possibly be a prepared statement, but since I have 40+ columns on each insert that would take some ugly code to build so I haven't tried it yet. I'm using CL ONE on the inserts and RF 2 in my schema. On 08/20/2013 08:04 AM, Nate McCall wrote: John makes a good point re: prepared statements (I'd increase batch sizes again once you did this as well - separate, incremental runs of course so you can gauge the effect of each). 
That should take out some of the processing overhead of statement validation in the server (some - that load spike still seems high though). I'd actually be really interested as to what your results were after doing so - i've not tried any A/B testing here for prepared statements on inserts. Given your load is on the server, i'm not sure adding more async indirection on the client would buy you too much though. Also, at what RF and consistency level are you writing? On Tue, Aug 20, 2013 at 8:56 AM, Keith Freeman 8fo...@gmail.com mailto:8fo...@gmail.com wrote: Ok, I'll try prepared statements. But while sending my statements async might speed up my client, it wouldn't improve throughput on the cassandra nodes would it? They're running at pretty high loads and only about 10% idle, so my concern is that they can't handle the data any faster, so something's wrong on the server side. I don't really think there's anything on the client side that matters for this problem. Of course I know there are obvious h/w things I can do to improve server performance: SSDs, more RAM, more cores, etc. But I thought the servers I have would be able to handle more rows/sec than say Mysql, since write speed is supposed to be one of Cassandra's strengths. On 08/19/2013 09:03 PM, John Sanda wrote: I'd suggest using prepared statements that you initialize at application start up and switching to use Session.executeAsync coupled with Google Guava Futures API to get better throughput on the client side. On Mon, Aug 19, 2013 at 10:14 PM, Keith Freeman 8fo...@gmail.com mailto:8fo...@gmail.com wrote: Sure, I've tried different numbers for batches and threads, but generally I'm running 10-30 threads at a time on the client, each sending a batch of 100 insert statements in every call, using the QueryBuilder.batch() API from the latest datastax java driver, then calling the Session.execute() function (synchronous) on the Batch. 
I can't post my code, but my client does this on each iteration: -- divides up the set of inserts by the number of threads -- stores the current time -- tells all the threads to send their inserts -- then when they've all returned checks the elapsed time At about 2000 rows for each iteration, 20 threads with 100 inserts each finish in about 1 second. For 4000 rows, 40 threads with 100 inserts each finish in about 1.5 - 2 seconds, and as I said all 3 cassandra nodes have a heavy CPU load while the client is hardly loaded. I've tried with 10 threads and more inserts per batch, or up to 60 threads with fewer, doesn't seem
Re: insert performance (1.2.8)
Thrift allows larger, free-form batch construction. The gain comes from doing a lot more in the same payload message. Otherwise CQL is more efficient. If you do build those giant strings, yes, you should see a performance improvement. On Tue, Aug 20, 2013 at 8:03 PM, Keith Freeman 8fo...@gmail.com wrote: Thanks. Can you tell me why using thrift would improve performance? Also, if I do try to build those giant strings for a prepared batch statement, should I expect another performance improvement? On 08/20/2013 05:06 PM, Nate McCall wrote: Ugh - sorry, I knew Sylvain and Michaël had worked on this recently but it is only in 2.0 - I could have sworn it got marked for inclusion back into 1.2 but I was wrong: https://issues.apache.org/jira/browse/CASSANDRA-4693 This is indeed an issue if you don't know the column count beforehand (or have a very large number of them, like in your case). Again, apologies, I would not have recommended that route had I known it was only in 2.0. I would be willing to bet you could hit those insert numbers pretty easily with thrift given the shape of your mutation. On Tue, Aug 20, 2013 at 5:00 PM, Keith Freeman 8fo...@gmail.com wrote: So I tried inserting prepared statements separately (no batch), and my server nodes load definitely dropped significantly. Throughput from my client improved a bit, but only a few %. I was able to *almost* get 5000 rows/sec (sort of) by also reducing the rows/insert-thread to 20-50 and eliminating all overhead from the timing, i.e. timing only the tight for loop of inserts. But that's still a lot slower than I expected. I couldn't do batches because the driver doesn't allow prepared statements in a batch (QueryBuilder API). It appears the batch itself could possibly be a prepared statement, but since I have 40+ columns on each insert that would take some ugly code to build so I haven't tried it yet. I'm using CL ONE on the inserts and RF 2 in my schema. 
On 08/20/2013 08:04 AM, Nate McCall wrote: John makes a good point re:prepared statements (I'd increase batch sizes again once you did this as well - separate, incremental runs of course so you can gauge the effect of each). That should take out some of the processing overhead of statement validation in the server (some - that load spike still seems high though). I'd actually be really interested as to what your results were after doing so - i've not tried any A/B testing here for prepared statements on inserts. Given your load is on the server, i'm not sure adding more async indirection on the client would buy you too much though. Also, at what RF and consistency level are you writing? On Tue, Aug 20, 2013 at 8:56 AM, Keith Freeman 8fo...@gmail.com wrote: Ok, I'll try prepared statements. But while sending my statements async might speed up my client, it wouldn't improve throughput on the cassandra nodes would it? They're running at pretty high loads and only about 10% idle, so my concern is that they can't handle the data any faster, so something's wrong on the server side. I don't really think there's anything on the client side that matters for this problem. Of course I know there are obvious h/w things I can do to improve server performance: SSDs, more RAM, more cores, etc. But I thought the servers I have would be able to handle more rows/sec than say Mysql, since write speed is supposed to be one of Cassandra's strengths. On 08/19/2013 09:03 PM, John Sanda wrote: I'd suggest using prepared statements that you initialize at application start up and switching to use Session.executeAsync coupled with Google Guava Futures API to get better throughput on the client side. 
On Mon, Aug 19, 2013 at 10:14 PM, Keith Freeman 8fo...@gmail.comwrote: Sure, I've tried different numbers for batches and threads, but generally I'm running 10-30 threads at a time on the client, each sending a batch of 100 insert statements in every call, using the QueryBuilder.batch() API from the latest datastax java driver, then calling the Session.execute() function (synchronous) on the Batch. I can't post my code, but my client does this on each iteration: -- divides up the set of inserts by the number of threads -- stores the current time -- tells all the threads to send their inserts -- then when they've all returned checks the elapsed time At about 2000 rows for each iteration, 20 threads with 100 inserts each finish in about 1 second. For 4000 rows, 40 threads with 100 inserts each finish in about 1.5 - 2 seconds, and as I said all 3 cassandra nodes have a heavy CPU load while the client is hardly loaded. I've tried with 10 threads and more inserts per batch, or up to 60 threads with fewer, doesn't seem to make a lot of difference. On 08/19/2013 05:00 PM, Nate McCall wrote: How big are the batch sizes? In other words, how many rows are you sending per insert operation? Other
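For reference, thrift's batch_mutate groups mutations per row key and per column family, which is what makes the large free-form batches possible. A shape-only sketch using stub types (real code would use the generated thrift Mutation/ColumnOrSuperColumn classes and ByteBuffer keys, not these stand-ins):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ThriftBatchShape {
    /** Stub standing in for the generated thrift Mutation type. */
    static class Mutation {
        final String name;
        final byte[] value;
        Mutation(String name, byte[] value) { this.name = name; this.value = value; }
    }

    /**
     * Groups column writes per row key and column family - the nested-map
     * shape that thrift's batch_mutate(mutation_map, consistency_level)
     * expects, letting one round trip carry many rows and columns.
     */
    static Map<String, Map<String, List<Mutation>>> buildMutationMap(
            String columnFamily, Map<String, Map<String, byte[]>> rows) {
        Map<String, Map<String, List<Mutation>>> mutationMap = new HashMap<>();
        rows.forEach((rowKey, columns) -> {
            List<Mutation> muts = new ArrayList<>();
            columns.forEach((name, value) -> muts.add(new Mutation(name, value)));
            Map<String, List<Mutation>> perCf = new HashMap<>();
            perCf.put(columnFamily, muts);
            mutationMap.put(rowKey, perCf);
        });
        return mutationMap;
    }

    public static void main(String[] args) {
        Map<String, Map<String, byte[]>> rows = new HashMap<>();
        Map<String, byte[]> cols = new HashMap<>();
        cols.put("x", new byte[] {1});
        cols.put("y", new byte[] {2});
        rows.put("row-1", cols);
        System.out.println(
            buildMutationMap("events", rows).get("row-1").get("events").size()
            + " mutations for row-1");
    }
}
```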
insert performance (1.2.8)
I've got a 3-node cassandra cluster (16G/4-core VMs, ESXi v5, on 2.5GHz machines not shared with any other VMs). I'm inserting time-series data into a single column-family using wide rows (timeuuids) and have a 3-part partition key, so my primary key is something like ((a, b, day), in-time-uuid), with columns x, y, z. My java client is feeding rows (about 1k of raw data size each) in batches using multiple threads, and the fastest I can get it to run reliably is about 2000 rows/second. Even at that speed, all 3 cassandra nodes are very CPU bound, with loads of 6-9 each (and the client machine is hardly breaking a sweat). I've tried turning off compression in my table, which reduced the loads slightly but not much. There are no other updates or reads occurring, except the datastax opscenter. I was expecting to be able to insert at least 10k rows/second with this configuration, and after a lot of reading of docs, blogs, and Google, can't really figure out what's slowing my client down. When I increase the insert speed of my client beyond 2000/second, the server responses are just too slow and the client falls behind. I had a single-node MySQL database that could handle 10k of these data rows/second, so I really feel like I'm missing something in Cassandra. Any ideas?
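Spelled out, a schema like the one described would look roughly as follows (x, y, z standing in for the real data columns; names and types are guesses, not the actual DDL):

```sql
CREATE TABLE events (
    a text,
    b text,
    day text,
    in_time_uuid timeuuid,
    x text,
    y text,
    z text,
    PRIMARY KEY ((a, b, day), in_time_uuid)
);
```

Each insert adds one clustered row under the (a, b, day) partition, so a day's worth of data for one (a, b) pair accumulates in a single wide partition.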
Re: insert performance (1.2.8)
How big are the batch sizes? In other words, how many rows are you sending per insert operation? Other than the above, not much else to suggest without seeing some example code (on pastebin, gist or similar, ideally). On Mon, Aug 19, 2013 at 5:49 PM, Keith Freeman 8fo...@gmail.com wrote: I've got a 3-node cassandra cluster (16G/4-core VMs ESXi v5 on 2.5Ghz machines not shared with any other VMs). I'm inserting time-series data into a single column-family using wide rows (timeuuids) and have a 3-part partition key so my primary key is something like ((a, b, day), in-time-uuid), x, y, z). My java client is feeding rows (about 1k of raw data size each) in batches using multiple threads, and the fastest I can get it run reliably is about 2000 rows/second. Even at that speed, all 3 cassandra nodes are very CPU bound, with loads of 6-9 each (and the client machine is hardly breaking a sweat). I've tried turning off compression in my table which reduced the loads slightly but not much. There are no other updates or reads occurring, except the datastax opscenter. I was expecting to be able to insert at least 10k rows/second with this configuration, and after a lot of reading of docs, blogs, and google, can't really figure out what's slowing my client down. When I increase the insert speed of my client beyond 2000/second, the server responses are just too slow and the client falls behind. I had a single-node Mysql database that can handle 10k of these data rows/second, so I really feel like I'm missing something in Cassandra. Any ideas?
Re: insert performance (1.2.8)
Sure, I've tried different numbers for batches and threads, but generally I'm running 10-30 threads at a time on the client, each sending a batch of 100 insert statements in every call, using the QueryBuilder.batch() API from the latest datastax java driver, then calling the Session.execute() function (synchronous) on the Batch. I can't post my code, but my client does this on each iteration: -- divides up the set of inserts by the number of threads -- stores the current time -- tells all the threads to send their inserts -- then when they've all returned checks the elapsed time At about 2000 rows for each iteration, 20 threads with 100 inserts each finish in about 1 second. For 4000 rows, 40 threads with 100 inserts each finish in about 1.5 - 2 seconds, and as I said all 3 cassandra nodes have a heavy CPU load while the client is hardly loaded. I've tried with 10 threads and more inserts per batch, or up to 60 threads with fewer, doesn't seem to make a lot of difference. On 08/19/2013 05:00 PM, Nate McCall wrote: How big are the batch sizes? In other words, how many rows are you sending per insert operation? Other than the above, not much else to suggest without seeing some example code (on pastebin, gist or similar, ideally). On Mon, Aug 19, 2013 at 5:49 PM, Keith Freeman 8fo...@gmail.com mailto:8fo...@gmail.com wrote: I've got a 3-node cassandra cluster (16G/4-core VMs ESXi v5 on 2.5Ghz machines not shared with any other VMs). I'm inserting time-series data into a single column-family using wide rows (timeuuids) and have a 3-part partition key so my primary key is something like ((a, b, day), in-time-uuid), x, y, z). My java client is feeding rows (about 1k of raw data size each) in batches using multiple threads, and the fastest I can get it run reliably is about 2000 rows/second. Even at that speed, all 3 cassandra nodes are very CPU bound, with loads of 6-9 each (and the client machine is hardly breaking a sweat). 
I've tried turning off compression in my table which reduced the loads slightly but not much. There are no other updates or reads occurring, except the datastax opscenter. I was expecting to be able to insert at least 10k rows/second with this configuration, and after a lot of reading of docs, blogs, and google, can't really figure out what's slowing my client down. When I increase the insert speed of my client beyond 2000/second, the server responses are just too slow and the client falls behind. I had a single-node Mysql database that can handle 10k of these data rows/second, so I really feel like I'm missing something in Cassandra. Any ideas?
Re: insert performance (1.2.8)
I'd suggest using prepared statements that you initialize at application start up and switching to use Session.executeAsync coupled with Google Guava Futures API to get better throughput on the client side. On Mon, Aug 19, 2013 at 10:14 PM, Keith Freeman 8fo...@gmail.com wrote: Sure, I've tried different numbers for batches and threads, but generally I'm running 10-30 threads at a time on the client, each sending a batch of 100 insert statements in every call, using the QueryBuilder.batch() API from the latest datastax java driver, then calling the Session.execute() function (synchronous) on the Batch. I can't post my code, but my client does this on each iteration: -- divides up the set of inserts by the number of threads -- stores the current time -- tells all the threads to send their inserts -- then when they've all returned checks the elapsed time At about 2000 rows for each iteration, 20 threads with 100 inserts each finish in about 1 second. For 4000 rows, 40 threads with 100 inserts each finish in about 1.5 - 2 seconds, and as I said all 3 cassandra nodes have a heavy CPU load while the client is hardly loaded. I've tried with 10 threads and more inserts per batch, or up to 60 threads with fewer, doesn't seem to make a lot of difference. On 08/19/2013 05:00 PM, Nate McCall wrote: How big are the batch sizes? In other words, how many rows are you sending per insert operation? Other than the above, not much else to suggest without seeing some example code (on pastebin, gist or similar, ideally). On Mon, Aug 19, 2013 at 5:49 PM, Keith Freeman 8fo...@gmail.com wrote: I've got a 3-node cassandra cluster (16G/4-core VMs ESXi v5 on 2.5Ghz machines not shared with any other VMs). I'm inserting time-series data into a single column-family using wide rows (timeuuids) and have a 3-part partition key so my primary key is something like ((a, b, day), in-time-uuid), x, y, z). 
My Java client is feeding rows (about 1k of raw data size each) in batches using multiple threads, and the fastest I can get it to run reliably is about 2000 rows/second. Even at that speed, all 3 Cassandra nodes are very CPU bound, with loads of 6-9 each (while the client machine is hardly breaking a sweat). I've tried turning off compression in my table, which reduced the loads slightly but not much. There are no other updates or reads occurring, except the DataStax OpsCenter. I was expecting to be able to insert at least 10k rows/second with this configuration, and after a lot of reading of docs, blogs, and Google, I can't really figure out what's slowing my client down. When I increase the insert speed of my client beyond 2000/second, the server responses are just too slow and the client falls behind. I had a single-node MySQL database that could handle 10k of these data rows/second, so I really feel like I'm missing something in Cassandra. Any ideas? -- John
Re: insert performance
Definitely multi-thread writes... probably with a little batching (10 or so). That's how I get my peak throughput. On 23 Feb 2012 04:48, Deno Vichas d...@syncopated.net wrote: all, would I be better off (I'm in Java land) with spawning a bunch of threads that each add a single item to a mutator, or a single thread that adds a bunch of items to a mutator? thanks, deno
insert performance
all, would I be better off (I'm in Java land) with spawning a bunch of threads that each add a single item to a mutator, or a single thread that adds a bunch of items to a mutator? thanks, deno
Re: Question about insert performance in multiple node cluster
Does your test client talk to a single node or to both?
Question about insert performance in multiple node cluster
Hi, We are trying to use Cassandra for high-performance insertion of simple key/value records. I have set up Cassandra on two of my machines in my local network (Windows 2008 server), using pretty much the default configuration. I created a test driver in java (using thrift) which inserts a single 1K data column (keys are unique strings of integer values) with multiple threads. On each machine I am able to achieve around 9,000 inserts/sec when running the test driver with the local Cassandra server. Then I set up a cluster with both machines, and ran the same test again (the test driver is still local to one of the Cassandra nodes). Surprisingly I did not see any improvement in the insert performance, I got the same 9000 inserts/sec as when running with a single node. I know that I shouldn't expect linear scaling to 18,000 operations/sec, but shouldn't I see at least some significant improvement? The CPU isn't fully loaded on either of the machines, and the network utilization is low too (1000 mbit network). Later on I also tested adding a third node, but that didn't improve anything either. I suspect I'm doing something wrong with setting up the cluster. The only changes I made on the second machine were: - AutoBootstrap=true - Setting 'Seed' to the IP of the other node Did I miss anything? Or am I simply wrong in expecting the throughput to scale when using multiple nodes? Thanks, Dirk
Re: Question about insert performance in multiple node cluster
On Mon, Feb 28, 2011 at 9:24 AM, Flachbart, Dirk (HP Software - TransactionVision) dirk.flachb...@hp.com wrote: Hi, We are trying to use Cassandra for high-performance insertion of simple key/value records. I have set up Cassandra on two of my machines in my local network (Windows 2008 server), using pretty much the default configuration. I created a test driver in java (using thrift) which inserts a single 1K data column (keys are unique strings of integer values) with multiple threads. On each machine I am able to achieve around 9,000 inserts/sec when running the test driver with the local Cassandra server. Then I set up a cluster with both machines, and ran the same test again (the test driver is still local to one of the Cassandra nodes). Surprisingly I did not see any improvement in the insert performance, I got the same 9000 inserts/sec as when running with a single node. I know that I shouldn’t expect linear scaling to 18,000 operations/sec, but shouldn’t I see at least some significant improvement? The CPU isn’t fully loaded on either of the machines, and the network utilization is low too (1000 mbit network). Later on I also tested adding a third node, but that didn’t improve anything either. I suspect I’m doing something wrong with setting up the cluster. The only changes I made on the second machine were: - AutoBootstrap=true - Setting ‘Seed’ to the IP of the other node Did I miss anything? Or am I simply wrong in expecting the throughput to scale when using multiple nodes? What's your replication factor? Which consistency level are you using? Is the ring evenly balanced? Did you double the number of client threads when you added the second server? -ryan
Re: Question about insert performance in multiple node cluster
What's your replication factor? Which consistency level are you using? Is the ring evenly balanced? Did you double the number of client threads when you added the second server? And are you on 100 Mbit networking? 9k requests/second inserting 1k each works out to roughly 72 Mbit/s, suspiciously close to saturating a 100 Mbit link. -- / Peter Schuller
RE: Question about insert performance in multiple node cluster
Nope, I'm on a Gigabit network. The Windows task manager on both machines shows a network utilization of around 12 percent. Regards, Dirk -----Original Message----- From: sc...@scode.org [mailto:sc...@scode.org] On Behalf Of Peter Schuller Sent: Monday, February 28, 2011 12:53 PM To: user@cassandra.apache.org Cc: Ryan King; Flachbart, Dirk (HP Software - TransactionVision) Subject: Re: Question about insert performance in multiple node cluster What's your replication factor? Which consistency level are you using? Is the ring evenly balanced? Did you double the number of client threads when you added the second server? And are you on 100 Mbit networking? 9k requests/second inserting 1k each works out to roughly 72 Mbit/s, suspiciously close to saturating a 100 Mbit link. -- / Peter Schuller
Re: Question about insert performance in multiple node cluster
> Replication factor is set to 1, and I'm using ConsistencyLevel.ANY. And yep, I tried doubling the threads from 16 to 32 when running with the second server, didn't make a difference.
Are you sure the client isn't the bottleneck? Have you tried running the client on independent (and perhaps multiple) machines? What does nodetool tpstats say while you run the test? (Try running it several times in a row and observe how it changes.)
> Regarding the ring balancing - I assume it should be balanced. I'm using RandomPartitioner, and the keys are generated by simply incrementing an Integer counter value, so they should be spread fairly evenly across the two servers (at least that is my understanding based on the Wiki documentation).
A 'nodetool compact' on all nodes (when not actively writing) followed by 'nodetool ring' should confirm that you're balanced across the nodes. -- / Peter Schuller
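Dirk's expectation about sequential keys is easy to sanity-check offline: under RandomPartitioner the placement token is just the MD5 hash of the key, so counter-style keys hash to an essentially uniform spread around the ring. A small stand-alone sketch (not driver code; the class and method names here are ours) that counts how many of 10,000 sequential keys would land in the first half of the token range, i.e. on one node of an evenly balanced two-node ring:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class TokenSpread {
    // RandomPartitioner derives a row's token from the MD5 hash of its key,
    // interpreted as a non-negative 128-bit integer.
    static BigInteger token(String key) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(key.getBytes(StandardCharsets.UTF_8));
        return new BigInteger(1, digest);
    }

    // Count how many of the keys "0" .. "n-1" fall below the ring midpoint.
    public static int countFirstHalf(int n) throws Exception {
        BigInteger midpoint = BigInteger.ONE.shiftLeft(127); // 2^127: halfway around the ring
        int firstHalf = 0;
        for (int i = 0; i < n; i++) {
            if (token(Integer.toString(i)).compareTo(midpoint) < 0) firstHalf++;
        }
        return firstHalf;
    }

    public static void main(String[] args) throws Exception {
        // Roughly 5000 of 10000 sequential keys hash below the midpoint,
        // confirming the even split Dirk expects.
        System.out.println(countFirstHalf(10000));
    }
}
```

So the key scheme is not the bottleneck here; the client or consistency settings are the more likely suspects, as Peter and Ryan's questions suggest.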
Re: Question about insert performance in multiple node cluster
On Mon, Feb 28, 2011 at 2:05 PM, Flachbart, Dirk (HP Software - TransactionVision) dirk.flachb...@hp.com wrote: Replication factor is set to 1, and I'm using ConsistencyLevel.ANY. And yep, I tried doubling the threads from 16 to 32 when running with the second server, didn't make a difference. Regarding the ring balancing - I assume it should be balanced. I'm using RandomPartitioner, and the keys are generated by simply incrementing an Integer counter value, so they should be spread fairly evenly across the two servers (at least that is my understanding based on the Wiki documentation). What does nodetool cfstats say? -ryan
0.6 insert performance .... Re: [RELEASE] 0.6.1
I wonder if anyone can use: * Add logging of GC activity (CASSANDRA-813) to confirm this: http://www.slideshare.net/schubertzhang/cassandra-060-insert-throughput - m. On Sun, Apr 18, 2010 at 6:58 PM, Eric Evans eev...@rackspace.com wrote: Hot on the trails of 0.6.0 comes our latest, 0.6.1. This stable point release contains a number of important bugfixes[1] and is a painless upgrade from 0.6.0. Enjoy! [1]: http://bit.ly/9NqwAb (changelog) -- Eric Evans eev...@rackspace.com
RE: 0.6 insert performance .... Re: [RELEASE] 0.6.1
I'm seeing some issues like this as well; in fact, I think seeing your graphs has helped me understand the dynamics of my cluster better. Using some ballpark figures for inserting single-column objects of ~500 bytes onto individual nodes (not when combined as a cluster):
Node1: Inserts 12000/s
Node2: Inserts 12000/s
Node3: Inserts 9000/s
Node4: Inserts 6000/s
When combined as a cluster, inserts are around 7000/s (replication factor of 2). When GC kicks in anywhere in the cluster, Quorum writes slow down for everyone associated with that node. And the fact that there are 4 nodes almost implies garbage collection will be going on somewhere almost all the time. So while I should be able to write more than 12,000/second, my slowest node in the cluster seems to overwhelm the faster nodes and drag everyone down. I'm still running tests of various combinations to see where things work out. From: Masood Mortazavi [mailto:masoodmortaz...@gmail.com] Sent: Monday, April 19, 2010 6:15 AM To: user@cassandra.apache.org; d...@cassandra.apache.org Subject: 0.6 insert performance Re: [RELEASE] 0.6.1 I wonder if anyone can use: * Add logging of GC activity (CASSANDRA-813) to confirm this: http://www.slideshare.net/schubertzhang/cassandra-060-insert-throughput - m. On Sun, Apr 18, 2010 at 6:58 PM, Eric Evans eev...@rackspace.com wrote: Hot on the trails of 0.6.0 comes our latest, 0.6.1. This stable point release contains a number of important bugfixes[1] and is a painless upgrade from 0.6.0. Enjoy! [1]: http://bit.ly/9NqwAb (changelog) -- Eric Evans eev...@rackspace.com
RE: 0.6 insert performance .... Re: [RELEASE] 0.6.1
We see this behavior as well with 0.6; heap usage graphs look almost identical. The GC is a noticeable bottleneck; we've tried JDK 6u19 and JRockit VMs. It basically kills any kind of soft real-time behavior. From: Masood Mortazavi [mailto:masoodmortaz...@gmail.com] Sent: Monday, April 19, 2010 4:15 AM To: user@cassandra.apache.org; d...@cassandra.apache.org Subject: 0.6 insert performance Re: [RELEASE] 0.6.1 I wonder if anyone can use: * Add logging of GC activity (CASSANDRA-813) to confirm this: http://www.slideshare.net/schubertzhang/cassandra-060-insert-throughput - m. On Sun, Apr 18, 2010 at 6:58 PM, Eric Evans eev...@rackspace.com wrote: Hot on the trails of 0.6.0 comes our latest, 0.6.1. This stable point release contains a number of important bugfixes[1] and is a painless upgrade from 0.6.0. Enjoy! [1]: http://bit.ly/9NqwAb (changelog) -- Eric Evans eev...@rackspace.com
Re: 0.6 insert performance .... Re: [RELEASE] 0.6.1
It's hard to tell from those slides, but it looks like the slowdown doesn't hit until after several GCs. Perhaps this is compaction kicking in, not GCs? Definitely the extra I/O + CPU load from compaction will cause a drop in throughput. On Mon, Apr 19, 2010 at 6:14 AM, Masood Mortazavi masoodmortaz...@gmail.com wrote: I wonder if anyone can use: * Add logging of GC activity (CASSANDRA-813) to confirm this: http://www.slideshare.net/schubertzhang/cassandra-060-insert-throughput - m. On Sun, Apr 18, 2010 at 6:58 PM, Eric Evans eev...@rackspace.com wrote: Hot on the trails of 0.6.0 comes our latest, 0.6.1. This stable point release contains a number of important bugfixes[1] and is a painless upgrade from 0.6.0. Enjoy! [1]: http://bit.ly/9NqwAb (changelog) -- Eric Evans eev...@rackspace.com
Re: 0.6 insert performance .... Re: [RELEASE] 0.6.1
Since the scale of the GC graph in the slides is different from the throughput ones, I will do another test for this issue. Thanks for your advice, Masood and Jonathan. --- Here, I just post my cassandra.in.sh:
JVM_OPTS=" \
 -ea \
 -Xms128M \
 -Xmx6G \
 -XX:TargetSurvivorRatio=90 \
 -XX:+AggressiveOpts \
 -XX:+UseParNewGC \
 -XX:+UseConcMarkSweepGC \
 -XX:+CMSParallelRemarkEnabled \
 -XX:SurvivorRatio=128 \
 -XX:MaxTenuringThreshold=0 \
 -Dcom.sun.management.jmxremote.port=8081 \
 -Dcom.sun.management.jmxremote.ssl=false \
 -Dcom.sun.management.jmxremote.authenticate=false"
On Tue, Apr 20, 2010 at 5:46 AM, Masood Mortazavi masoodmortaz...@gmail.com wrote: Minimizing GC pauses, or minimizing the time slots allocated to GC pauses -- either through configuration or re-implementation of garbage-collection bottlenecks (i.e. object-generation bottlenecks) -- seems to be the immediate approach. (Other approaches appear to be more intrusive.) At the code level, using the GC logs, one can investigate further. There may be places where some object recycling can make a larger difference. Trying this first will probably bear more immediate fruit. - m. On Mon, Apr 19, 2010 at 9:11 AM, Daniel Kluesing d...@bluekai.com wrote: We see this behavior as well with 0.6; heap usage graphs look almost identical. The GC is a noticeable bottleneck; we've tried JDK 6u19 and JRockit VMs. It basically kills any kind of soft real-time behavior. From: Masood Mortazavi [mailto:masoodmortaz...@gmail.com] Sent: Monday, April 19, 2010 4:15 AM To: user@cassandra.apache.org; d...@cassandra.apache.org Subject: 0.6 insert performance Re: [RELEASE] 0.6.1 I wonder if anyone can use: * Add logging of GC activity (CASSANDRA-813) to confirm this: http://www.slideshare.net/schubertzhang/cassandra-060-insert-throughput - m. On Sun, Apr 18, 2010 at 6:58 PM, Eric Evans eev...@rackspace.com wrote: Hot on the trails of 0.6.0 comes our latest, 0.6.1.
This stable point release contains a number of important bugfixes[1] and is a painless upgrade from 0.6.0. Enjoy! [1]: http://bit.ly/9NqwAb (changelog) -- Eric Evans eev...@rackspace.com
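For anyone who wants the GC visibility discussed above (CASSANDRA-813) without waiting on a release, HotSpot's own GC-logging flags can be appended to JVM_OPTS in cassandra.in.sh. This is a hedged sketch: the flags are standard JVM-6-era HotSpot options, and the log path is purely illustrative.

```sh
# Illustrative GC-activity logging for the Cassandra JVM; adjust the path
# to somewhere your cassandra process can write.
JVM_OPTS="$JVM_OPTS \
 -verbose:gc \
 -XX:+PrintGCDetails \
 -XX:+PrintGCTimeStamps \
 -Xloggc:/var/log/cassandra/gc.log"
```

Correlating timestamps in that log against the throughput graphs should show whether the periodic dips line up with full GCs or with compaction, per Jonathan's question.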