Hi Garo,

Did you put the commit log on its own drive? Spiking CPU during stalls is a symptom of not doing that. The commit log is very latency-sensitive, even under low load. Be sure you're using the deadline or noop I/O scheduler for that reason, too.
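For reference, sysfs shows all available schedulers with the active one in brackets; a minimal sketch of checking and switching it (xvdb is a stand-in device name, substitute whatever disk actually holds the commitlog):

```shell
# xvdb is a hypothetical device name; use the disk that holds the commitlog.
dev=xvdb
# On a live box you would read the real file:
#   line=$(cat "/sys/block/$dev/queue/scheduler")
line="noop [deadline] cfq"   # sample sysfs output, for illustration
# The bracketed entry is the active scheduler; extract it.
active=$(printf '%s\n' "$line" | sed 's/.*\[\(.*\)\].*/\1/')
echo "active scheduler: $active"
# To switch (takes effect immediately, but is lost on reboot):
#   echo deadline | sudo tee "/sys/block/$dev/queue/scheduler"
```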
-Mark

On Fri, Jul 22, 2016 at 4:44 PM, Juho Mäkinen <juho.maki...@gmail.com> wrote:
>> Are you using XFS or Ext4 for data?
>
> We are using XFS. Many nodes have a couple of large SSTables (on the order of 20-50 GiB), but I haven't cross-checked whether the load spikes happen only on the machines which have these tables.
>
>> As an aside, for the amount of reads/writes you're doing, I've found using c3/m3 instances with the commit log on the ephemeral storage and data on st1 EBS volumes to be much more cost-effective. It's something to look into if you haven't already.
>
> Thanks for the idea! I previously used c4.4xlarge instances with two 1500 GB GP2 volumes, but I found out that we maxed out their bandwidth too easily, so my newest cluster is based on i2.4xlarge instances.
>
> And to answer Ryan: no, we are not using counters.
>
> I was wondering whether the large amount (100+ GiB) of mmap'ed files could somehow cause inefficiencies on the kernel side. That's why I started reading up on kernel huge pages and came up with the idea of disabling huge page defrag, but nothing I've found indicates that this can be a real problem. After all, the Linux fs cache is a really old feature, so I expect it to be pretty bug-free.
>
> I guess I next have to learn how the load value itself is calculated. I know the basic idea that when load is below the number of CPUs the system should still be fine, and that iowait also feeds into the load. But since I am not seeing any extensive iowait, and my userland CPU usage is well below what my 16 cores should handle, what else contributes to the system load? Can I make an educated guess about what the high load is telling me if it's not iowait and not purely userland process CPU usage?
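(On that question: the Linux load average counts both runnable tasks and tasks in uninterruptible D-state sleep, so threads blocked inside the kernel, for example on mmap page faults, add to load without showing up as iowait or user CPU. A rough sketch that tallies D-state threads from /proc:)

```shell
# Count threads currently in uninterruptible (D) sleep; these contribute
# to the load average even when iowait and user CPU both look low.
count=0
for stat in /proc/[0-9]*/task/[0-9]*/stat; do
    [ -r "$stat" ] || continue   # thread may exit mid-scan
    # The state letter follows the comm field, which is wrapped in parens
    # and may contain spaces, so strip through the closing paren first.
    state=$(sed 's/.*) //; s/ .*//' "$stat" 2>/dev/null)
    [ "$state" = "D" ] && count=$((count + 1))
done
echo "D-state threads: $count"
```

Catching this counter spiking alongside the load, with iowait flat, would point at threads blocked in the kernel rather than on disk I/O.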
This is starting to get really deep really fast :/

- Garo

>> -Mark
>>
>> On Fri, Jul 22, 2016 at 8:10 AM, Juho Mäkinen <juho.maki...@gmail.com> wrote:
>> > After a few days I've also tried disabling Linux kernel huge page defragmentation (echo never > /sys/kernel/mm/transparent_hugepage/defrag) and turning coalescing off (otc_coalescing_strategy: DISABLED), but neither did any good. I'm using LCS, there are no big GC pauses, and I have set "concurrent_compactors: 5" (the machines have 16 CPUs), but there are usually no compactions running when a load spike comes. "nodetool tpstats" shows no running thread pools except Native-Transport-Requests (usually 0-4) and perhaps ReadStage (usually 0-1).
>> >
>> > The symptoms are the same: after about 12-24 hours an increasing number of nodes start to show short CPU load spikes, and this affects the median read latencies. I ran dstat while a load spike was already under way (see screenshot http://i.imgur.com/B0S5Zki.png), but no column other than the load itself showed any major change, except the system/kernel CPU usage.
>> >
>> > All further ideas on how to debug this are greatly appreciated.
>> >
>> > On Wed, Jul 20, 2016 at 7:13 PM, Juho Mäkinen <juho.maki...@gmail.com> wrote:
>> >> I just recently upgraded our cluster to 2.2.7, and after putting the cluster under production load the instances started to show high load (as shown by uptime) without any apparent reason, and I'm not quite sure what could be causing it.
>> >>
>> >> We are running on i2.4xlarge, so we have 16 cores, 120 GB of RAM and four 800 GB SSDs (set up as an LVM stripe into one big lvol), with kernel 3.13.0-87-generic on HVM virtualisation. The cluster has 26 TiB of data stored in two tables.
>> >>
>> >> Symptoms:
>> >> - High load, sometimes up to 30 for a short duration of a few minutes, then the load drops back to the cluster average of 3-4.
>> >> - Instances might have one compaction running, or none at all.
>> >> - Each node is serving around 250-300 reads per second and around 200 writes per second.
>> >> - Restarting a node fixes the problem for around 18-24 hours.
>> >> - No or very little IO wait.
>> >> - top shows that around 3-10 threads are running at high CPU, but that alone should not cause a load of 20-30.
>> >> - It doesn't seem to be GC load: a system starts to show symptoms after having run only one CMS sweep. It's not doing constant stop-the-world GCs.
>> >> - top shows that the C* process uses 100 GB of RSS memory. I assume this is because Cassandra opens all SSTables with mmap(), so they show up in the RSS count.
>> >>
>> >> What I've done so far:
>> >> - Rolling restart. Helped for about one day.
>> >> - Tried triggering a manual GC on the cluster.
>> >> - Increased the heap from 8 GiB with CMS to 16 GiB with G1GC.
>> >> - sjk-plus shows a bunch of SharedPool workers. Not sure what to make of this.
>> >> - Browsed over https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html but didn't find anything apparent.
>> >>
>> >> I know that the general symptom of "system shows high load" is not very informative, but I don't know how to better describe what's going on. I appreciate all ideas on what to try and how to debug this further.
>> >>
>> >> - Garo
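The mmap/RSS point above can be checked directly: per-mapping resident sizes are listed in /proc/<pid>/smaps, and file-backed mappings (such as Cassandra's mmap'ed SSTables) count toward RSS once their pages have been touched. A minimal sketch, using the current shell's PID as a stand-in for the Cassandra process:

```shell
# Sum the resident (Rss) size of every mapping for a process. For a
# Cassandra node this is dominated by mmap'ed SSTable pages, which is why
# top's RSS column approaches the size of the data files on disk.
pid=$$   # stand-in; substitute the Cassandra PID on a real node
rss_kib=$(awk '/^Rss:/ {sum += $2} END {print sum}' "/proc/$pid/smaps")
echo "total RSS from mappings: ${rss_kib} KiB"
```

These pages are clean page cache: they are evictable under memory pressure, so a large RSS from mmap alone is not a leak.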