Re: To batch or not to batch: A question for fast inserts

2015-09-27 Thread Gerard Maas
Hi Eric, Ryan, Thanks a lot for your insights. I got more than I hoped for in this discussion. I'll further improve our code to include the replica-awareness and will compare that to the previous tests. That snippet of code is really helpful. Thanks. I have not been on the list long enough to

Re: To batch or not to batch: A question for fast inserts

2015-09-27 Thread Graham Sanderson
We are about to prototype upgrading our batch inserts, so I’m really glad about this thread… we are able to saturate our dedicated network links from Hadoop when inserting via the Thrift API (Astyanax) - at the time we wrote that code, CQL wasn’t there. Reasons to replace our current solution: 1)

Re: To batch or not to batch: A question for fast inserts

2015-09-25 Thread Ryan Svihla
I think my main point is still: unlogged token-aware batches are great, but if your writes are large enough, they may actually hurt rather than help, and likewise if your writes are too small, async only is likely only going to hurt. I’d say the average user I’ve had to help (with my

Re: To batch or not to batch: A question for fast inserts

2015-09-25 Thread Eric Stevens
Yep, my approach is definitely naive to hotspotting. If someone had that trouble, they could exhaust the iterator out of getReplicas() and distribute their writes more evenly (which might result in better statement distribution, but wouldn't change the workload on the cluster). In the end
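
As a rough illustration of the replica-aware grouping discussed above, here is a minimal Python sketch. It does not use a real Cassandra driver: `ToyRing`, `token_for`, and `group_by_first_replica` are hypothetical names, and the ring is a toy stand-in for whatever the driver's getReplicas() would return. It only shows the idea of bucketing keys by the first replica that owns them, so that each bucket could be sent as one unlogged batch.

```python
import bisect
import hashlib
from collections import defaultdict


def token_for(key: str) -> int:
    # Toy token function (assumption): a real cluster uses its partitioner.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class ToyRing:
    """Hypothetical stand-in for cluster metadata, not a driver API."""

    def __init__(self, nodes, rf=3):
        self.rf = rf
        self.ring = sorted((token_for(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def get_replicas(self, key):
        # Walk clockwise from the key's token and take rf distinct nodes.
        start = bisect.bisect_right(self.tokens, token_for(key)) % len(self.ring)
        count = min(self.rf, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


def group_by_first_replica(ring, keys):
    # Naive grouping: always pick the first replica, which can hotspot.
    groups = defaultdict(list)
    for key in keys:
        groups[ring.get_replicas(key)[0]].append(key)
    return dict(groups)


ring = ToyRing(["n1", "n2", "n3", "n4"], rf=3)
keys = [f"k{i}" for i in range(100)]
groups = group_by_first_replica(ring, keys)
```

Picking only the first replica is the naive choice mentioned in the message. Choosing among all replicas returned for a key would spread the batches more evenly without changing the total work done by the cluster.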

Re: To batch or not to batch: A question for fast inserts

2015-09-25 Thread Eric Stevens
> compaction usually is the limiter for most clusters, so the difference between async versus unlogged batch ends up being minor or, worse, nonexistent, because the hardware and data model combination results in compaction being the main throttle. If your number of records to load per second is

Re: To batch or not to batch: A question for fast inserts

2015-09-25 Thread Ryan Svihla
Generally this is all correct, but I cannot emphasize enough how much this “just depends”, and today I generally move people to async inserts first before trying to micro-optimize. Some things to keep in mind: compaction usually is the limiter for most clusters, so the difference between async

Re: To batch or not to batch: A question for fast inserts

2015-09-24 Thread Eric Stevens
> I side-tracked some punctual benchmarks and stumbled on the observations of unlogged inserts being *A LOT* faster than the async counterparts. My own testing agrees very strongly with this. When this topic came up on this list before, there was a concern that batch coordination produces GC

To batch or not to batch: A question for fast inserts

2015-09-22 Thread Gerard Maas
General advice advocates for individual async inserts as the fastest way to insert data into Cassandra. Our insertion mechanism is based on that model and recently we have been evaluating performance, looking to measure and optimize our ingestion rate. I side-tracked some punctual benchmarks and
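
For readers unfamiliar with the "individual async inserts" model mentioned above, the following is a minimal Python sketch of the pattern. `fake_insert` is a hypothetical stand-in for a driver's asynchronous execute call, and the cap on in-flight requests (`max_in_flight`) is an assumption added for illustration. It is not something the original message specifies.

```python
import asyncio


class InFlightTracker:
    """Records how many simulated inserts are outstanding at once."""

    def __init__(self):
        self.current = 0
        self.peak = 0


async def fake_insert(row, tracker):
    # Hypothetical stand-in for an async driver call; no database involved.
    tracker.current += 1
    tracker.peak = max(tracker.peak, tracker.current)
    await asyncio.sleep(0)  # yield control, as a network round trip would
    tracker.current -= 1
    return row


async def insert_all(rows, max_in_flight=32):
    tracker = InFlightTracker()
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(row):
        # Each row is its own request; the semaphore bounds concurrency.
        async with gate:
            return await fake_insert(row, tracker)

    results = await asyncio.gather(*(guarded(r) for r in rows))
    return results, tracker.peak


results, peak = asyncio.run(insert_all(list(range(100)), max_in_flight=8))
```

Each row is issued as a separate request and the client waits only for completion, not for the previous request to finish. The batching alternative discussed in the thread would instead group several rows into one request.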