[Cassandra 2.1.5] I'm trying to explore my options for increasing read throughput with token scans (SELECT * FROM x WHERE token(y) > L AND token(y) < L). So far I've started by reading an entire virtual token range from a single node.
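For concreteness, here is a minimal sketch of how a token scan like the one above could be cut into smaller contiguous sub-ranges, so that several queries can be issued in parallel instead of one. It only builds the CQL strings and does not talk to a cluster. The helper names `split_token_range` and `scan_queries` are hypothetical, the range bounds in the usage lines are made-up numbers, and each sub-range uses `>` and `<=` (rather than the strict `<` in my query) so that adjacent sub-ranges neither overlap nor skip a token.

```python
def split_token_range(start, end, parts):
    """Split the token range (start, end] into `parts` contiguous sub-ranges.

    Returns a list of (lo, hi) pairs; each pair is meant to be read as
    token > lo AND token <= hi.
    """
    width = end - start
    if parts < 1 or width < parts:
        raise ValueError("need 1 <= parts <= (end - start)")
    # Integer arithmetic keeps the bounds exact even for 64-bit tokens.
    bounds = [start + (width * i) // parts for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_queries(table, key, start, end, parts):
    """Build one CQL token-scan statement per sub-range (hypothetical helper)."""
    template = "SELECT * FROM {t} WHERE token({k}) > {lo} AND token({k}) <= {hi}"
    return [
        template.format(t=table, k=key, lo=lo, hi=hi)
        for lo, hi in split_token_range(start, end, parts)
    ]


if __name__ == "__main__":
    # Made-up bounds standing in for one virtual token range.
    for query in scan_queries("x", "y", -1000, 1000, 4):
        print(query)
```

Whether running the sub-range queries concurrently actually helps would depend on where the bottleneck is, which is the open question in the rest of this post.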
Currently, on a single query I can read about 57,286.03 rows/s, which translates to 5.5 MiB/s. However, under load (even heavy load), my disk utilization never gets that high (SSDs, less than 10%), and neither does my network utilization (1 Gbit).

So far I've tried:
- Moving to the G1 collector (starting with the cassandra-env that was linked from CASSANDRA-7486), which reduced timeouts that I think were caused by longish pauses
- Enabling TIMEHORIZON message coalescing

I'm still very new to JVM tuning, but I used jstack to inspect what was going on in the threads with high CPU usage. It's almost always either an OutboundTcpConnection thread or a SEPWorker thread, and judging by what the SEPWorker does (I mostly see compares like https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/db/composites/AbstractCType.java#L185), I think I might be CPU bound? (I'm still new to the actual Cassandra source code, so apologies if that doesn't make sense.)

Given this information, does anyone have any pointers on what levers I could pull next, or other things I could measure?

Thanks for any help,
Nimi