You shouldn't need a kernel recompile.  Check out the section "Simple
solution for the problem" in
http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux.
You can balance the interrupt requests (IRQs) across up to 8 CPUs.
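
If memory serves, the "simple solution" there amounts to writing a CPU
mask into each interrupt's smp_affinity file by hand (the IRQ number
below is illustrative - take yours from /proc/interrupts):

    # move IRQ 24 onto CPU 1 (mask 0x2) instead of CPU 0
    echo 2 > /proc/irq/24/smp_affinity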

I'll check out the flame graphs in a little bit - I'm in the middle of
something and my brain doesn't multitask well :)

On Thu, May 25, 2017 at 1:06 PM Eric Pederson <eric...@gmail.com> wrote:

> Hi Jonathan -
>
> It looks like these machines are configured to use CPU 0 for all I/O
> interrupts.  I don't think I'm going to get the OK to compile a new kernel
> for them to balance the interrupts across CPUs, but to mitigate the problem
> I used taskset to run the Cassandra process on all CPUs except 0.  It didn't
> change the performance though.  Let me know if you think it's crucial that
> we balance the interrupts across CPUs and I can try to lobby for a new
> kernel.
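>
> For reference, the pinning amounts to something like this (the PID is
> whatever the Cassandra JVM is running as):
>
>     # run Cassandra on CPUs 1-47, leaving CPU 0 free for interrupts
>     taskset -cp 1-47 <cassandra_pid>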
>
> Here are flame graphs from each node for a cassandra-stress ingest into a
> table representative of what we are going to be using.  The rows in this
> table are also roughly 200 bytes, across 64 columns, with primary key
> (date, sequence_number).  cassandra-stress was run on 3 separate client
> machines.  Writing to this table with cassandra-stress I see the same
> thing: none of disk, CPU, or network is fully utilized.
>
>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva01_sars.svg
>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva02_sars.svg
>    - http://sourcedelica.com/wordpress/wp-content/uploads/2017/05/flamegraph_ultva03_sars.svg
>
> Re: GC: In the stress run with the parameters above, two of the three
> nodes log zero or one GCInspector messages.  On the other hand, the 3rd
> machine logs a GCInspector every 5 seconds or so, 300-500ms each time.  I
> found out that the 3rd machine actually has different specs from the other
> two.  It's an older box with the same RAM but fewer CPUs (32 instead of
> 48), a slower SSD, and slower memory.  The Cassandra configuration is
> exactly the same.  I tried running Cassandra on only 32 CPUs on the newer
> boxes to see if that would cause them to GC pause more, but it didn't.
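>
> For reference, I'm counting those by grepping the node logs (default
> package log location; adjust as needed):
>
>     grep -c GCInspector /var/log/cassandra/system.log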
>
> On a separate topic - for this cassandra-stress run I reduced the batch
> size to 2 in order to keep the logs clean.  That also reduced the
> throughput from around 100k rows/sec to 32k rows/sec.  I've been doing
> ingestion tests using cassandra-stress, cqlsh COPY FROM and a custom C++
> application.  In most of these tests I've been using a batch size of
> around 20 (unlogged, with all rows in a batch sharing the same partition
> key).  However, that fills the logs with batch size warnings.  I was going
> to raise the batch size warning threshold but the docs scared me away from
> doing that.  Given that we're using unlogged, same-partition batches, is
> it safe to raise the batch size warning limit?  Interestingly, cqlsh COPY
> FROM gets very good throughput using a small batch size, but I can't get
> the same throughput from cassandra-stress or my C++ app with a batch size
> of 2.
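>
> For context, the warning threshold I'm referring to is
> batch_size_warn_threshold_in_kb in cassandra.yaml (5 KB by default, with a
> hard batch_size_fail_threshold_in_kb above it).
>
> And in case it helps, here's a stripped-down sketch of how the C++ app
> builds its batches with the DataStax C++ driver - the keyspace, table,
> columns, and contact point are placeholders, not our real schema:
>
>     #include <cassandra.h>
>
>     int main() {
>         // Connection setup (contact point is illustrative)
>         CassCluster* cluster = cass_cluster_new();
>         cass_cluster_set_contact_points(cluster, "10.0.0.1");
>         CassSession* session = cass_session_new();
>         CassFuture* connect = cass_session_connect(session, cluster);
>         cass_future_wait(connect);
>         cass_future_free(connect);
>
>         // UNLOGGED batch where every row shares one partition key, so
>         // the whole batch lands on a single replica set.
>         CassBatch* batch = cass_batch_new(CASS_BATCH_TYPE_UNLOGGED);
>         for (cass_int64_t seq = 0; seq < 20; ++seq) {
>             CassStatement* stmt = cass_statement_new(
>                 "INSERT INTO ks.t (date, sequence_number, payload)"
>                 " VALUES (?, ?, ?)", 3);
>             cass_statement_bind_string(stmt, 0, "2017-05-25");  // partition key
>             cass_statement_bind_int64(stmt, 1, seq);            // clustering key
>             cass_statement_bind_string(stmt, 2, "~200 bytes of data...");
>             cass_batch_add_statement(batch, stmt);
>             cass_statement_free(stmt);  // the batch keeps its own copy
>         }
>         CassFuture* fut = cass_session_execute_batch(session, batch);
>         cass_future_wait(fut);  // real code would check the error code here
>         cass_future_free(fut);
>         cass_batch_free(batch);
>
>         cass_session_free(session);
>         cass_cluster_free(cluster);
>         return 0;
>     }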
>
> Thanks!
>
>
>
> -- Eric
>
> On Mon, May 22, 2017 at 5:08 PM, Jonathan Haddad <j...@jonhaddad.com>
> wrote:
>
>> How many CPUs are you using for interrupts?
>> http://www.alexonlinux.com/smp-affinity-and-proper-interrupt-handling-in-linux
>>
>> Have you tried making a flame graph to see where Cassandra is spending
>> its time?
>> http://www.brendangregg.com/blog/2014-06-12/java-flame-graphs.html
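>>
>> Roughly, per that post (flags from memory): record with perf, then fold
>> and render with the FlameGraph scripts.  For Java you also want the JVM
>> started with -XX:+PreserveFramePointer, plus perf-map-agent so perf can
>> resolve JIT-compiled frames.
>>
>>     perf record -F 99 -a -g -- sleep 30
>>     perf script | ./stackcollapse-perf.pl | ./flamegraph.pl > cassandra.svg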
>>
>> Are you tracking GC pauses?
>>
>> Jon
>>
>> On Mon, May 22, 2017 at 2:03 PM Eric Pederson <eric...@gmail.com> wrote:
>>
>>> Hi all:
>>>
>>> I'm new to Cassandra and I'm doing some performance testing.  One of the
>>> things that I'm testing is ingestion throughput.  My server setup is:
>>>
>>>    - 3 node cluster
>>>    - SSD data (both commit log and sstables are on the same disk)
>>>    - 64 GB RAM per server
>>>    - 48 cores per server
>>>    - Cassandra 3.0.11
>>>    - 48 GB heap using G1GC (jvm.options sketch below)
>>>    - 1 Gbps NICs
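>>>
>>> The heap/GC settings amount to roughly this in conf/jvm.options (3.0.x;
>>> exact file layout varies by install):
>>>
>>>     -Xms48G
>>>     -Xmx48G
>>>     -XX:+UseG1GC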
>>>
>>> Since I'm using SSDs I've tried tuning the following (one at a time), but
>>> none seemed to make much difference (cassandra.yaml snippet below):
>>>
>>>    - concurrent_writes=384
>>>    - memtable_flush_writers=8
>>>    - concurrent_compactors=8
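>>>
>>> That is, in cassandra.yaml:
>>>
>>>     concurrent_writes: 384
>>>     memtable_flush_writers: 8
>>>     concurrent_compactors: 8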
>>>
>>> I am currently doing ingestion tests, sending data from 3 clients on the
>>> same subnet using cassandra-stress.  The tests use CL=ONE and RF=2.
>>>
>>> Using cassandra-stress (3.10) I am able to saturate the disk using a
>>> large enough column size and the standard five-column cassandra-stress
>>> schema.  For example, -col size=fixed(400) will saturate the disk and
>>> compactions will start falling behind (full command sketched below).
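>>>
>>> The invocation is along these lines (node address and thread count are
>>> illustrative):
>>>
>>>     cassandra-stress write n=10000000 cl=ONE -col size=fixed(400) \
>>>         -node 10.0.0.1 -rate threads=200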
>>>
>>> One of our main tables has a row size of approximately 200 bytes across
>>> 64 columns.  When ingesting into this table I don't see any resource
>>> saturation.  Disk utilization is around 10-15% per iostat.  Incoming
>>> network traffic on the servers is around 100-300 Mbps.  CPU utilization
>>> is around 20-70%.  nodetool tpstats shows mostly zeros, with occasional
>>> spikes around 500 in MutationStage.
>>>
>>> The stress run does 10,000,000 inserts per client, each client with a
>>> separate range of partition IDs.  The run with 200-byte rows takes about
>>> 4 minutes, with a mean latency of 4.5 ms, a total GC time of 21 sec, and
>>> an average GC time of 173 ms.
>>>
>>> The overall performance is good - around 120k rows/sec ingested.  But
>>> I'm curious to know where the bottleneck is.  There's no resource
>>> saturation, and nodetool tpstats shows only occasional brief queueing.
>>> Is the rest just expected latency inside Cassandra?
>>>
>>> Thanks,
>>>
>>> -- Eric
>>>
>>
>
