Are you sure you’re blocked on internode and not commitlog? Batch is typically not what people expect (group commitlog in 4.0 is probably closer to what you think batch does).
-- Jeff Jirsa > On Nov 27, 2018, at 10:55 PM, Yuji Ito <y...@phact-columba.com> wrote: > > Hi, > > Thank you for the reply. > I've measured LWT throughput in 4.0. > > I used the cassandra-stress tool to insert rows with LWT for 3 minutes on > i3.xlarge and i3.4xlarge > For 3.11, I modified the tool to support LWT. > Before each measurement, I cleaned up all Cassandra data. > > The throughput in 4.0 is 5 % faster than 3.11. > The CPU load of i3.4xlarge (16 vCPUs) is only up to 75% in both versions. > And, the throughput was slower than 4 times that of i3.xlarge. > I think the throughput wasn't bounded by CPU also in 4.0. > > The CPU load of i3.4xlarge is up to 80 % with non-LWT write. > > I wonder what is the bottleneck for writes on a many-core machine if the > issue about messaging has been resolved in 4.0. > Can I use up CPU to insert rows by changing any parameter? > > # LWT insert > * Cassandra 3.11.3 > | instance type | # of threads | concurrent_writes | Throughput [op/s] | > | i3.xlarge | 64 | 32 | 2815 | > | i3.4xlarge | 256 | 128 | 9506 | > | i3.4xlarge | 512 | 256 | 10540 | > > * Cassandra 4.0 (trunk) > | instance type | # of threads | concurrent_writes | Throughput [op/s] | > | i3.xlarge | 64 | 32 | 2951 | > | i3.4xlarge | 256 | 128 | 9816 | > | i3.4xlarge | 512 | 256 | 11055 | > > * Environment > - 3 node cluster > - Replication factor: 3 > - Node instance: AWS EC2 i3.xlarge / i3.4xlarge > > * C* configuration > - Apache Cassandra 3.11.3 / 4.0 (trunk) > - commitlog_sync: batch > - concurrent_writes: 32, 256 > - native_transport_max_threads: 128(default), 256 (when concurrent_writes is > 256) > > Thanks, > Yuji > > > 2018年11月26日(月) 17:27 sankalp kohli <kohlisank...@gmail.com>: >> Inter-node messaging is rewritten using Netty in 4.0. It will be better to >> test it using that as potential changes will mostly land on top of that. >> >>> On Mon, Nov 26, 2018 at 7:39 AM Yuji Ito <y...@phact-columba.com> wrote: >>> Hi, >>> >>> I'm investigating LWT performance with C* 3.11.3. >>> It looks that the performance is bounded by messaging latency when many >>> requests are issued concurrently. >>> >>> According to the source code, the number of messaging threads per node is >>> only 1 thread for incoming and 1 thread for outbound "small" message to >>> another node. >>> >>> I guess these threads are frequently interrupted because many threads are >>> executed when many requests are issued. >>> Especially, I think it affects the LWT performance when many LWT requests >>> which need lots of inter-node messaging are issued. >>> >>> I measured that latency. It took 2.5 ms in average to enqueue a message at >>> a node and to receive the message at the **same** node with 96 concurrent >>> LWT writes. >>> Is it normal? I think it is too big latency, though a message was sent to >>> the same node. >>> >>> Decreasing numbers of other threads like `concurrent_counter_writes`, >>> `concurrent_materialized_view_writes` reduced a bit the latency. >>> Can I change any other parameter to reduce the latency? >>> I've tried using message coalescing, but they didn't reduce that. >>> >>> * Environment >>> - 3 node cluster >>> - Replication factor: 3 >>> - Node instance: AWS EC2 i3.xlarge >>> >>> * C* configuration >>> - Apache Cassandra 3.11.3 >>> - commitlog_sync: batch >>> - concurrent_reads: 32 (default) >>> - concurrent_writes: 32 (default) >>> >>> Thanks, >>> Yuji >>> >>> >>> --------------------------------------------------------------------- >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org >>> For additional commands, e-mail: dev-h...@cassandra.apache.org