Back to the futex()? :(

2016-02-06 Thread Will Hayworth
*tl;dr: other than CAS operations, what are the potential sources of lock
contention in C*?*

Hi all! :) I'm a novice Cassandra and Linux admin who's been preparing a
small cluster for production, and I've been seeing something weird. For
background: I'm running 3.2.1 on a cluster of 12 EC2 m4.2xlarges (32 GB
RAM, 8 HT cores) backed by 3.5 TB GP2 EBS volumes. Until late yesterday,
that was a cluster of 12 m4.xlarges with 3 TB volumes. I bumped it because
while backloading historical data I had been seeing awful throughput (20K
op/s at CL.ONE). I'd read through Al Tobey's *amazing* C* tuning guide once or
twice before, but this time I was careful and fixed a bunch of defaults that
just weren't right in cassandra.yaml, JVM options, and block device parameters.
Folks on IRC were super helpful as always (hat tip to Jeff Jirsa in
particular) and pointed out, for example, that I shouldn't be using DTCS
for loading historical data--heh. After changing to LTCS, unbatching my
writes,* reserving a CPU core for interrupts, and fixing the clocksource to
TSC, I finally hit 80K op/s early this morning. Hooray! :)
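
(For the curious: pinning the clocksource was just sysfs pokes--the paths
below are standard, though double-check the available sources on your own
kernel. The IRQ number in the last line is made up; yours will differ per box:)

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
$ echo tsc | sudo tee /sys/devices/system/clocksource/clocksource0/current_clocksource
$ echo 1 | sudo tee /proc/irq/24/smp_affinity   # pin one (example) IRQ to CPU 0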

Now, my question: I'm still seeing a *ton* of blocked processes in the
vmstat output, anything from 2 to 9 per 10-second sample period--and this is
before EBS is even being hit! I've been trying in vain to figure out what
this could be--GC seems very quiet, after all. Per the advice on Al's page,
I've been running strace and, indeed, I've been seeing *tens of thousands of
futex() calls* in periods of 10 or 20 seconds. What eludes me is *where* this
lock contention is coming from. I'm not using LWTs or performing any CAS
operations that I'm aware of. Assuming this isn't a red herring, what
gives?
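
(For concreteness, this is the sort of thing I've been looking at--the TID
in the last two lines is a made-up example:)

$ vmstat 10                      # "b" column = blocked processes
$ CASS_PID=$(pgrep -f CassandraDaemon)
$ sudo strace -f -c -e trace=futex -p "$CASS_PID"   # Ctrl-C for the summary
$ top -b -H -n 1 -p "$CASS_PID" | head -20          # find the hottest LWPs
$ printf '%x\n' 12345                               # TID -> hex (here: 3039)
$ jstack "$CASS_PID" | grep -A 3 'nid=0x3039'       # name the Java thread

In theory that last step at least tells me which thread pools are busiest,
even if it doesn't say what they're contending on.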

Sorry for the essay--I just wanted to err on the side of more context--and
*thank you* for any advice you'd like to offer,
Will

P.S. More background if you'd like--I'm running on Amazon Linux 2015.09,
using jemalloc 3.6, JDK 1.8.0_65-b17. Here is my cassandra.yaml and here
are my JVM args.
I realized I neglected to adjust memtable_flush_writers as I was writing
this--so I'll get on that. Aside from that, I'm not sure what to do.
(Thanks, again, for reading.)
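
(The plan there--untested, and 8 is just my guess of one flush writer per
core--is something like:)

$ sed -i 's/^#\? *memtable_flush_writers:.*/memtable_flush_writers: 8/' \
    /path/to/cassandra.yaml    # path is a placeholder; adjust for your install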

* They were batched for consistency--I'm hoping to return to using them
when I'm back at normal load, which is tiny compared to backloading, but
the impact on performance was eye-opening.
___
Will Hayworth
Developer, Engagement Engine
Atlassian

My pronoun is "they". 


Re: Back to the futex()? :(

2016-02-06 Thread Will Hayworth
Additionally: this isn't the futex_wait bug (or at least it shouldn't be?).
Amazon says that was fixed several kernel versions before mine, which is
4.1.10-17.31.amzn1.x86_64. And the reason my heap is so large is that, per
CASSANDRA-9472, we can't use offheap memtables until 3.4 is released.
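
(Concretely, the knob I'm waiting on is memtable_allocation_type in
cassandra.yaml--for now we're stuck with on-heap buffers:)

$ grep memtable_allocation_type /path/to/cassandra.yaml    # placeholder path
memtable_allocation_type: heap_buffers    # hoping for offheap_objects in 3.4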

Will

___
Will Hayworth
Developer, Engagement Engine
Atlassian

My pronoun is "they". 


