Hi Garo,

Did you put the commit log on its own drive? Spiking CPU during stalls
is a symptom of not doing that. The commitlog is very latency
sensitive, even under low load. Do be sure you're using the deadline
or noop scheduler for that reason, too.
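
You can check (and switch) the scheduler per device with something along
these lines (the device name will of course differ on your instances):

    cat /sys/block/xvdb/queue/scheduler
    echo deadline > /sys/block/xvdb/queue/scheduler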

-Mark

On Fri, Jul 22, 2016 at 4:44 PM, Juho Mäkinen <juho.maki...@gmail.com> wrote:
>> Are you using XFS or Ext4 for data?
>
>
> We are using XFS. Many nodes have a couple of large SSTables (on the order
> of 20-50 GiB), but I haven't cross-checked whether the load spikes happen
> only on the machines which have these tables.
>
>>
>> As an aside, for the amount of reads/writes you're doing, I've found
>> using c3/m3 instances with the commit log on the ephemeral storage and
>> data on st1 EBS volumes to be much more cost effective. It's something
>> to look into if you haven't already.
>
>
> Thanks for the idea! I previously used c4.4xlarge instances with two 1500 GB
> GP2 volumes, but I found that we maxed out their bandwidth too easily, which
> is why my newest cluster is based on i2.4xlarge instances.
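>
> If I end up testing that split, I assume the cassandra.yaml side of it is
> just pointing the two directories at different mounts, roughly like this
> (the paths here are only placeholders):
>
>     commitlog_directory: /mnt/ephemeral/cassandra/commitlog
>     data_file_directories:
>         - /mnt/ebs/cassandra/data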
>
> And to answer Ryan: No, we are not using counters.
>
> I was wondering whether the large amount (100+ GiB) of mmap'ed files could
> somehow cause inefficiencies on the kernel side. That's why I started to
> read up on kernel huge pages and came up with the idea of disabling the
> huge page defrag, but nothing I've found indicates that this can be a real
> problem. After all, the Linux fs cache is a really old feature, so I expect
> it to be pretty bug-free.
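>
> One thing I could still check is whether the kernel actually does a lot of
> THP/compaction work during a spike; as far as I understand the counters in
> /proc/vmstat should show that, something like:
>
>     grep -E 'thp|compact_stall' /proc/vmstat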
>
> I guess I next have to learn how the load value itself is calculated. I
> know the basic idea that when the load is below the number of CPUs the
> system should still be fine, but at least processes waiting on I/O also
> count towards the load. So since I am not seeing any excessive iowait, and
> my userland CPU usage is well below what my 16 cores should handle, what
> else contributes to the system load? Can I somehow make an educated guess
> about what the high load is telling me if it's not iowait and it's not
> purely userland CPU usage? This is starting to get really deep really
> fast :/
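>
> Maybe the next time a spike hits I should just sample which threads are
> runnable or stuck in uninterruptible sleep, since as far as I understand
> both count towards the load average. Something along the lines of:
>
>     ps -eLo state,comm | awk '$1 == "R" || $1 == "D"' | sort | uniq -c | sort -rn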
>
>  - Garo
>
>
>>
>>
>> -Mark
>>
>> On Fri, Jul 22, 2016 at 8:10 AM, Juho Mäkinen <juho.maki...@gmail.com>
>> wrote:
>> > After a few days I've also tried disabling Linux kernel huge page
>> > defragmentation (echo never > /sys/kernel/mm/transparent_hugepage/defrag)
>> > and turning coalescing off (otc_coalescing_strategy: DISABLED), but
>> > neither did any good. I'm using LCS, there are no big GC pauses, and I
>> > have set "concurrent_compactors: 5" (the machines have 16 CPUs), but
>> > there are usually no compactions running when the load spike comes.
>> > "nodetool tpstats" shows no active thread pools except
>> > Native-Transport-Requests (usually 0-4) and perhaps ReadStage (usually
>> > 0-1).
>> >
>> > The symptoms are the same: after about 12-24 hours an increasing number
>> > of nodes start to show short CPU load spikes, and this affects the median
>> > read latencies. I ran dstat while a load spike was already under way (see
>> > screenshot http://i.imgur.com/B0S5Zki.png), but no column other than the
>> > load itself shows any major change, except for the system/kernel CPU
>> > usage.
>> >
>> > All further ideas how to debug this are greatly appreciated.
>> >
>> >
>> > On Wed, Jul 20, 2016 at 7:13 PM, Juho Mäkinen <juho.maki...@gmail.com>
>> > wrote:
>> >>
>> >> I just recently upgraded our cluster to 2.2.7, and after putting the
>> >> cluster under production load the instances started to show high load
>> >> (as shown by uptime) without any apparent reason, and I'm not quite sure
>> >> what could be causing it.
>> >>
>> >> We are running on i2.4xlarge, so we have 16 cores, 120 GB of RAM and
>> >> four 800 GB SSDs (set up as an LVM stripe into one big lvol), running
>> >> 3.13.0-87-generic on HVM virtualisation. The cluster has 26 TiB of data
>> >> stored in two tables.
>> >>
>> >> Symptoms:
>> >>  - High load, sometimes up to 30 for a short duration of a few minutes,
>> >> then the load drops back to the cluster average: 3-4
>> >>  - Instances might have one compaction running, or no compactions at all.
>> >>  - Each node is serving around 250-300 reads per second and around 200
>> >> writes per second.
>> >>  - Restarting a node fixes the problem for around 18-24 hours.
>> >>  - No or very little IO-wait.
>> >>  - top shows around 3-10 threads running at high CPU, but that alone
>> >> should not cause a load of 20-30.
>> >>  - Doesn't seem to be GC load: a system starts to show symptoms when it
>> >> has run only one CMS sweep, so it's not doing constant stop-the-world
>> >> GCs.
>> >>  - top shows that the C* process uses 100 GB of RSS memory. I assume
>> >> this is because Cassandra opens all SSTables with mmap(), so the files
>> >> show up in the RSS count (a quick way to confirm this is sketched
>> >> below).
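>> >>
>> >> A quick way to confirm the mmap theory would probably be to sum up how
>> >> much of the RSS comes from the SSTable mappings in smaps, something like
>> >> this (the pgrep pattern is just a guess for how the daemon shows up):
>> >>
>> >>     awk '/Data.db/ {f=1} f && /^Rss:/ {s+=$2; f=0} END {print s " kB"}' \
>> >>         /proc/$(pgrep -f CassandraDaemon)/smaps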
>> >>
>> >> What I've done so far:
>> >>  - Rolling restart. Helped for about one day.
>> >>  - Tried triggering a manual GC on the cluster.
>> >>  - Increased heap from 8 GiB with CMS to 16 GiB with G1GC.
>> >>  - sjk-plus shows a bunch of SharedPool workers. Not sure what to make
>> >> of this.
>> >>  - Browsed over
>> >> https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html but
>> >> didn't find anything apparent.
>> >>
>> >> I know that the general symptom of "system shows high load" is not very
>> >> specific or informative, but I don't know how to better describe what's
>> >> going on. I appreciate all ideas on what to try and how to debug this
>> >> further.
>> >>
>> >>  - Garo
>> >>
>> >
>
>
