Re: Trouble with using group commitlog_sync

2024-04-24 Bowen Song via user
Okay, that proves I was wrong about the client-side bottleneck.
On 24/04/2024 17:55, Nathan Marz wrote:
> I tried running two client processes in parallel and the numbers were unchanged. The maximum throughput is still achieved by a single client with 10 in-flight BatchStatements, each containing 100 inserts.

Re: Trouble with using group commitlog_sync

2024-04-24 Nathan Marz
I tried running two client processes in parallel and the numbers were unchanged. The maximum throughput is still achieved by a single client with 10 in-flight BatchStatements, each containing 100 inserts.
On Tue, Apr 23, 2024 at 10:24 PM Bowen Song via user <user@cassandra.apache.org> wrote:
> You might have run

Re: Trouble with using group commitlog_sync

2024-04-24 Bowen Song via user
You might have run into a bottleneck in the driver's I/O thread. Try increasing the driver's connections-per-server limit to 2 or 3 if you've only got 1 server in the cluster. Alternatively, run two client processes in parallel.
On 24/04/2024 07:19, Nathan Marz wrote:
> Tried it again with

Re: Trouble with using group commitlog_sync

2024-04-24 Nathan Marz
Tried it again with one more client thread, and that had no effect on performance. This is unsurprising, as there are only 2 CPUs on this node and they were already at 100%. These were good ideas, but I'm still unable to even match the performance of batch commit mode with group commit mode.

Re: Trouble with using group commitlog_sync

2024-04-23 Bowen Song via user
To achieve 10k loop iterations per second, each iteration must take 0.1 milliseconds or less. Considering that each iteration needs to lock and unlock the semaphore (two syscalls) and make network requests (more syscalls), that's a lot of context switches. It may be a bit too much to ask for a
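The per-iteration budget is simply the reciprocal of the target rate. For comparison, the 38k/second that Nathan reports with periodic commitlog_sync corresponds to roughly 26 µs per loop iteration:

```latex
t_{\text{iter}} \le \frac{1\,\text{s}}{10\,000} = 100\,\mu\text{s} = 0.1\,\text{ms},
\qquad
\frac{1\,\text{s}}{38\,000} \approx 26\,\mu\text{s}
```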

Re: Trouble with using group commitlog_sync

2024-04-23 Nathan Marz
It's using the async API, so why would it need multiple threads? Using the exact same approach, I'm able to get 38k / second with periodic commitlog_sync. For what it's worth, I do see 100% CPU utilization in every single one of these tests.
On Tue, Apr 23, 2024 at 11:01 AM Bowen Song via user <
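The "loop + semaphore" pattern with an async API can be sketched in plain Java. This is a minimal, self-contained sketch under stated assumptions, not Nathan's actual client: `BoundedAsyncLoop`, `run` and `simulatedInsert` are hypothetical names, and `simulatedInsert` stands in for an async driver call by completing a future on a scheduler thread after a fixed delay. A single thread drives the loop; the semaphore caps how many requests are outstanding, and each completion callback releases a permit.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedAsyncLoop {
    // Hypothetical stand-in for an async driver call (NOT a real driver API):
    // the returned future completes after delayMs on a scheduler thread.
    static CompletableFuture<Void> simulatedInsert(ScheduledExecutorService io, long delayMs) {
        CompletableFuture<Void> f = new CompletableFuture<>();
        io.schedule(() -> { f.complete(null); }, delayMs, TimeUnit.MILLISECONDS);
        return f;
    }

    // Hypothetical helper: issues `total` writes from one thread, never allowing more
    // than `maxInFlight` to be pending at once. Returns the peak observed concurrency.
    public static int run(int total, int maxInFlight, long delayMs) throws InterruptedException {
        ScheduledExecutorService io = Executors.newScheduledThreadPool(2);
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try {
            for (int i = 0; i < total; i++) {
                permits.acquire();                       // blocks once maxInFlight writes are pending
                int now = inFlight.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                simulatedInsert(io, delayMs).whenComplete((ok, err) -> {
                    inFlight.decrementAndGet();
                    permits.release();                   // free a slot for the next write
                });
            }
            permits.acquire(maxInFlight);                // wait for the last writes to finish
        } finally {
            io.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.nanoTime();
        int peak = run(200, 10, 5);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("peak in-flight = %d, elapsed = %.2f s%n", peak, secs);
    }
}
```

One thread is enough to keep the request pipeline full in this sketch; whether it is enough against a real cluster depends on how much CPU each iteration costs on the client, which is the question being debated here.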

Re: Trouble with using group commitlog_sync

2024-04-23 Bowen Song via user
Have you checked the per-thread CPU utilisation on the client side? You will likely need more than one thread doing inserts in a loop to achieve tens of thousands of inserts per second.
On 23/04/2024 21:55, Nathan Marz wrote:
> Thanks for the explanation. I tried again with

Re: Trouble with using group commitlog_sync

2024-04-23 Nathan Marz
Thanks for the explanation. I tried again with commitlog_sync_group_window at 2ms, concurrent_writes at 512, and doing 1000 individual inserts at a time with the same loop + semaphore approach. This only nets 9k / second. I got much higher throughput for the other modes with BatchStatement of
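One way to sanity-check these numbers is Little's law: in steady state, throughput equals the number of requests in flight divided by the mean per-request latency. The sketch below is a back-of-the-envelope model, not a measurement. It assumes, as a simplification, that in group mode a write may wait roughly one `commitlog_sync_group_window` before it is acknowledged, and it ignores CPU cost entirely. `LittlesLaw`, `throughputPerSec` and `impliedLatencyMs` are hypothetical helper names.

```java
public class LittlesLaw {
    // Hypothetical helper. Little's law: throughput = concurrency / latency.
    // Latency is in milliseconds, so scale by 1000 to get requests per second.
    public static double throughputPerSec(int inFlight, double latencyMs) {
        return inFlight * 1000.0 / latencyMs;
    }

    // Hypothetical helper. The same law rearranged: latency = concurrency / throughput.
    public static double impliedLatencyMs(int inFlight, double throughputPerSec) {
        return inFlight * 1000.0 / throughputPerSec;
    }

    public static void main(String[] args) {
        int inFlight = 1000;          // individual inserts kept in flight (from the thread)
        double groupWindowMs = 2.0;   // commitlog_sync_group_window used in that test
        double observedPerSec = 9000; // throughput reported for that test

        // Rate the window alone would allow if every write waited a full window (ignores CPU).
        System.out.printf("rate from window alone: %.0f writes/s%n",
                throughputPerSec(inFlight, groupWindowMs));
        // Mean latency implied by the observed rate at this concurrency.
        System.out.printf("implied mean latency: %.1f ms%n",
                impliedLatencyMs(inFlight, observedPerSec));
    }
}
```

Under these assumptions, a 2 ms window by itself would allow on the order of 500,000 writes/second with 1000 writes in flight, so it does not explain 9k/second. The implied mean latency of about 111 ms suggests requests are queuing somewhere else, which would be consistent with the 100% CPU reported in the thread. Applied to the 1000ms default window, the same formula gives roughly 1000 writes/second for 1000 in-flight writes.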

Re: Trouble with using group commitlog_sync

2024-04-23 Bowen Song via user
I suspect you are abusing batch statements. Batch statements should only be used where atomicity or isolation is needed. Using batch statements won't make inserting multiple partitions faster; in fact, it will often make it slower. Also, the linear relationship between

Re: Trouble with using group commitlog_sync

2024-04-23 Nathan Marz
Thanks. I raised concurrent_writes to 128 and set commitlog_sync_group_window to 20ms. This allows a single execute of a BatchStatement containing 100 inserts to succeed. However, the throughput I'm seeing is atrocious. With these settings, I'm executing 10 BatchStatements concurrently at a time

Re: Trouble with using group commitlog_sync

2024-04-23 Bowen Song via user
The default commitlog_sync_group_window is very long for SSDs. Try reducing it if you are using SSD-backed storage for the commit log; 10-15 ms is a good starting point. You may also want to increase the value of concurrent_writes: consider at least doubling or quadrupling it from the default.

Re: Trouble with using group commitlog_sync

2024-04-23 Nathan Marz
"batch" mode works fine. I'm having trouble with "group" mode. The only config for that is "commitlog_sync_group_window", and I have that set to the default 1000ms.
On Tue, Apr 23, 2024 at 8:15 AM Bowen Song via user <user@cassandra.apache.org> wrote:
> Why would you want to set

Re: Trouble with using group commitlog_sync

2024-04-23 Bowen Song via user
Why would you want to set commitlog_sync_batch_window to 1 second when commitlog_sync is set to batch mode? The documentation on this says: /This window should be kept short because the writer threads

Trouble with using group commitlog_sync

2024-04-23 Nathan Marz
I'm doing some benchmarking of Cassandra on a single m6gd.large instance. It works fine with the periodic or batch commitlog_sync options, but I'm having tons of issues when I change it to "group". I have "commitlog_sync_group_window" set to 1000ms. My client is doing writes like this (pseudocode):