Hi Jonathan -

It looks like these machines are configured to use CPU 0 for all I/O
interrupts.  I don't think I'm going to get the OK to compile a new kernel
for them to balance the interrupts across CPUs, but to mitigate the problem
I used taskset to pin the Cassandra process to every CPU except 0.  It
didn't change the performance, though.  Let me know if you think it's
crucial that we balance the interrupts across CPUs and I can try to lobby
for a new kernel.
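
For reference, here is the general shape of what I did to check the
affinity and pin the process (the IRQ number and the PID lookup are
illustrative, not exact):

    # Per-CPU interrupt counts; on these boxes the NIC and disk IRQs all hit CPU 0
    cat /proc/interrupts

    # Affinity mask for one IRQ (replace 30 with the IRQ number of interest)
    cat /proc/irq/30/smp_affinity

    # Pin the running Cassandra JVM to CPUs 1-47 (1-31 on the older box),
    # assuming a single Cassandra process per machine
    taskset -cp 1-47 $(pgrep -f CassandraDaemon | head -1)

    # Verify the new affinity
    taskset -cp $(pgrep -f CassandraDaemon | head -1)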

Here are flamegraphs from each node from a cassandra-stress ingest into a
table representative of what we are going to be using (a rough sketch of
its definition follows the links).  Rows in this table are also roughly 200
bytes, spread across 64 columns, with a primary key of (date,
sequence_number).  Cassandra-stress was run on 3 separate client machines.
Writing to this table with cassandra-stress I see the same thing: none of
disk, CPU, or network is fully utilized.

   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_sars.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_sars.svg
   - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_sars.svg
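
The table under test is shaped roughly like this; the keyspace, table, and
non-key column names and types are placeholders, and the real table has 64
small columns totaling about 200 bytes per row:

    -- Placeholder names; only the primary key structure matches the real table
    CREATE TABLE ks.ingest_test (
        date            date,
        sequence_number bigint,
        col_01          int,
        col_02          text,
        col_03          text,
        PRIMARY KEY ((date), sequence_number)
    );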

Re: GC: In the stress run with the parameters above, two of the three nodes
log zero or one GCInspector messages.  The 3rd machine, on the other hand,
logs a GCInspector message every 5 seconds or so, at 300-500ms each time.
It turns out the 3rd machine actually has different specs than the other
two.  It's an older box with the same RAM but fewer CPUs (32 instead of
48), a slower SSD, and slower memory.  The Cassandra configuration is
exactly the same.  I tried running Cassandra restricted to 32 CPUs on the
newer boxes to see if that would cause them to GC pause more, but it
didn't.
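
In case it helps, this is roughly how I've been comparing GC activity
across the three nodes (the log path is install-specific, so treat it as a
placeholder):

    # Count GCInspector pause messages on each node
    grep -c GCInspector /var/log/cassandra/system.log

    # Cumulative GC counts and pause times tracked by Cassandra itself
    nodetool gcstats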

On a separate topic: for this cassandra-stress run I reduced the batch
size to 2 in order to keep the logs clean.  That also reduced the
throughput from around 100k rows/sec to 32k rows/sec.  I've been doing
ingestion tests using cassandra-stress, cqlsh COPY FROM, and a custom C++
application.  In most of those tests I've been using a batch size of
around 20 (unlogged, with all rows in a batch sharing the same partition
key).  However, that fills the logs with batch size warnings.  I was going
to raise the batch size warning threshold, but the docs scared me away from
doing that.  Given that we're using unlogged, same-partition batches, is it
safe to raise the warning threshold?  Interestingly, cqlsh COPY FROM gets
very good throughput with a small batch size, but I can't get the same
throughput from cassandra-stress or my C++ app with a batch size of 2.
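
For concreteness, the batches look roughly like the sketch below (same
placeholder table as above; in the real tests each batch has around 20
inserts, all with the same date).  If I'm reading the docs right, the
setting I'd be raising is batch_size_warn_threshold_in_kb in cassandra.yaml
(default 5 KB), with batch_size_fail_threshold_in_kb as the hard limit.

    -- Illustrative only: the real batches carry ~20 rows for a single date
    BEGIN UNLOGGED BATCH
        INSERT INTO ks.ingest_test (date, sequence_number, col_01, col_02, col_03)
        VALUES ('2017-05-22', 1, 42, 'a', 'b');
        INSERT INTO ks.ingest_test (date, sequence_number, col_01, col_02, col_03)
        VALUES ('2017-05-22', 2, 43, 'c', 'd');
    APPLY BATCH;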

Thanks!



-- Eric

On Mon, May 22, 2017 at 5:08 PM, Jonathan Haddad <j...@jonhaddad.com> wrote:

> How many CPUs are you using for interrupts?
> http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
>
> Have you tried making a flame graph to see where Cassandra is spending its
> time? http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html
>
> Are you tracking GC pauses?
>
> Jon
>
> On Mon, May 22, 2017 at 2:03 PM Eric Pederson <eric...@gmail.com> wrote:
>
>> Hi all:
>>
>> I'm new to Cassandra and I'm doing some performance testing.  One of the
>> things that I'm testing is ingestion throughput.  My server setup is:
>>
>>    - 3 node cluster
>>    - SSD data (both commit log and sstables are on the same disk)
>>    - 64 GB RAM per server
>>    - 48 cores per server
>>    - Cassandra 3.0.11
>>    - 48 GB heap using G1GC
>>    - 1 Gbps NICs
>>
>> Since I'm using SSD I've tried tuning the following (one at a time) but
>> none seemed to make a lot of difference:
>>
>>    - concurrent_writes=384
>>    - memtable_flush_writers=8
>>    - concurrent_compactors=8
>>
>> I am currently doing ingestion tests sending data from 3 clients on the
>> same subnet.  I am using cassandra-stress to do some ingestion testing.
>> The tests are using CL=ONE and RF=2.
>>
>> Using cassandra-stress (3.10) I am able to saturate the disk using a
>> large enough column size and the standard five column cassandra-stress
>> schema.  For example, -col size=fixed(400) will saturate the disk and
>> compactions will start falling behind.
>>
>> One of our main tables has a row size that is approximately 200 bytes,
>> across 64 columns.  When ingesting this table I don't see any resource
>> saturation.  Disk utilization is around 10-15% per iostat.  Incoming
>> network traffic on the servers is around 100-300 Mbps.  CPU utilization is
>> around 20-70%.  nodetool tpstats shows mostly zeros with occasional
>> spikes around 500 in MutationStage.
>>
>> The stress run does 10,000,000 inserts per client, each with a separate
>> range of partition IDs.  The run with 200 byte rows takes about 4 minutes,
>> with mean Latency 4.5ms, Total GC time of 21 secs, Avg GC time 173 ms.
>>
>> The overall performance is good - around 120k rows/sec ingested.  But I'm
>> curious to know where the bottleneck is.  There's no resource saturation and
>> nodetool tpstats shows only occasional brief queueing.  Is the rest just
>> expected latency inside of Cassandra?
>>
>> Thanks,
>>
>> -- Eric
>>
>
