[ https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16415846#comment-16415846 ]

Alexander Dejanovski commented on CASSANDRA-14318:
--------------------------------------------------

[~jjirsa]: apparently the ReadCallback class already logs at TRACE rather than 
DEBUG on the latest 2.2.

I've created a fix that downgrades debug logging to trace logging in the 
query pager classes; the results are below.
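The change itself is essentially a log-level downgrade on the hot read path. A 
minimal self-contained sketch of the pattern (hypothetical class and method 
names; Cassandra itself uses SLF4J, while this example uses java.util.logging 
to stay runnable on its own):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class PagerLoggingSketch {
    private static final Logger logger =
            Logger.getLogger(PagerLoggingSketch.class.getName());

    // Hypothetical stand-in for a pager method that logs on every page fetch.
    static int pageFetched(int fetched, int limit) {
        // Before the fix: this fired at DEBUG, which 2.2 enables by default,
        // so every read paid for message formatting and appender I/O.
        // After the fix: downgraded to the TRACE-equivalent level (FINER here),
        // which is off by default, so the guarded call is a cheap no-op.
        if (logger.isLoggable(Level.FINER))
            logger.finer("Fetched " + fetched + " of " + limit + " requested rows");
        return Math.min(fetched, limit);
    }

    public static void main(String[] args) {
        System.out.println(pageFetched(100, 5000)); // prints 100
    }
}
```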

debug on - no fix : 
{noformat}
Results:
op rate : 6681 [read_event_1:1109, read_event_2:1119, read_event_3:4452]
partition rate : 6681 [read_event_1:1109, read_event_2:1119, read_event_3:4452]
row rate : 6681 [read_event_1:1109, read_event_2:1119, read_event_3:4452]
latency mean : 19,1 [read_event_1:15,4, read_event_2:15,4, read_event_3:21,0]
latency median : 15,6 [read_event_1:14,2, read_event_2:14,0, read_event_3:16,3]
latency 95th percentile : 39,1 [read_event_1:28,4, read_event_2:28,6, 
read_event_3:44,2]
latency 99th percentile : 75,6 [read_event_1:52,9, read_event_2:53,6, 
read_event_3:87,7]
latency 99.9th percentile : 315,7 [read_event_1:101,0, read_event_2:110,1, 
read_event_3:361,1]
latency max : 609,1 [read_event_1:319,6, read_event_2:315,9, read_event_3:609,1]
Total partitions : 993050 [read_event_1:164882, read_event_2:166381, 
read_event_3:661787]
Total errors : 0 [read_event_1:0, read_event_2:0, read_event_3:0]
total gc count : 189
total gc mb : 56464
total gc time (s) : 7
avg gc time(ms) : 37
stdev gc time(ms) : 8
Total operation time : 00:02:28{noformat}

debug off - no fix : 
{noformat}
Results:
op rate : 12655 [read_event_1:2141, read_event_2:2093, read_event_3:8422]
partition rate : 12655 [read_event_1:2141, read_event_2:2093, read_event_3:8422]
row rate : 12655 [read_event_1:2141, read_event_2:2093, read_event_3:8422]
latency mean : 10,1 [read_event_1:10,1, read_event_2:10,1, read_event_3:10,1]
latency median : 9,2 [read_event_1:9,2, read_event_2:9,2, read_event_3:9,3]
latency 95th percentile : 15,2 [read_event_1:15,8, read_event_2:15,9, 
read_event_3:15,7]
latency 99th percentile : 29,3 [read_event_1:44,5, read_event_2:45,1, 
read_event_3:41,3]
latency 99.9th percentile : 52,7 [read_event_1:67,9, read_event_2:66,9, 
read_event_3:67,1]
latency max : 268,0 [read_event_1:257,1, read_event_2:263,3, read_event_3:268,0]
Total partitions : 983056 [read_event_1:166311, read_event_2:162570, 
read_event_3:654175]
Total errors : 0 [read_event_1:0, read_event_2:0, read_event_3:0]
total gc count : 100
total gc mb : 31529
total gc time (s) : 4
avg gc time(ms) : 37
stdev gc time(ms) : 5
Total operation time : 00:01:17{noformat}

debug on - with fix : 
{noformat}
Results:
op rate : 12289 [read_event_1:2058, read_event_2:2051, read_event_3:8181]
partition rate : 12289 [read_event_1:2058, read_event_2:2051, read_event_3:8181]
row rate : 12289 [read_event_1:2058, read_event_2:2051, read_event_3:8181]
latency mean : 10,4 [read_event_1:10,4, read_event_2:10,4, read_event_3:10,4]
latency median : 9,4 [read_event_1:9,4, read_event_2:9,4, read_event_3:9,4]
latency 95th percentile : 16,3 [read_event_1:16,8, read_event_2:17,3, 
read_event_3:16,2]
latency 99th percentile : 36,6 [read_event_1:44,3, read_event_2:46,6, 
read_event_3:37,2]
latency 99.9th percentile : 62,2 [read_event_1:78,0, read_event_2:77,1, 
read_event_3:80,8]
latency max : 251,2 [read_event_1:246,9, read_event_2:249,9, read_event_3:251,2]
Total partitions : 1000000 [read_event_1:167422, read_event_2:166861, 
read_event_3:665717]
Total errors : 0 [read_event_1:0, read_event_2:0, read_event_3:0]
total gc count : 102
total gc mb : 31843
total gc time (s) : 4
avg gc time(ms) : 38
stdev gc time(ms) : 6
Total operation time : 00:01:21{noformat}

So with the fix applied, performance with debug logging on matches performance 
with debug logging off and no fix.
The difference in throughput is massive: the fix roughly yields *twice the 
read throughput* (op rate ~6700 -> ~12300).

Latencies without the fix and with the fix : 

p95 : 35ms -> 16ms
p99 : 75ms -> 36ms

I ran all tests several times, alternating runs with and without the fix to 
make sure caches were not skewing the results, and the numbers were consistent 
with what's pasted above.
Everything ran on a single node, using an i3.xlarge instance for Cassandra and 
an i3.large instance for running cassandra-stress.

*One pretty interesting thing to note*: when I tested with the predefined mode 
of cassandra-stress, no paging occurred and the performance difference was not 
noticeable. That's because the predefined mode generates COMPACT STORAGE 
tables, which apparently exercise a different read path. Anyone benchmarking 
Cassandra changes should be aware that the predefined mode isn't 
representative here, and that a user defined test should be used instead 
(maybe we should create one to serve as a standard benchmark).
Here's the one I used : [^cassandra-2.2-debug.yaml]
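For illustration, a user profile of that general shape might look like the 
following (a hypothetical sketch, not the attached file; the point is the 
absence of COMPACT STORAGE, so the regular read path and paging are exercised):

{noformat}
# Hypothetical cassandra-stress user profile (illustrative only)
keyspace: stresscql
keyspace_definition: |
  CREATE KEYSPACE stresscql WITH replication =
    {'class': 'SimpleStrategy', 'replication_factor': 1};
table: events
table_definition: |
  CREATE TABLE events (
    id uuid,
    ts timeuuid,
    payload text,
    PRIMARY KEY (id, ts)
  );
columnspec:
  - name: payload
    size: fixed(200)
queries:
  read_event:
    cql: SELECT * FROM events WHERE id = ?
    fields: samerow{noformat}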

With the following command for writing:

{noformat}
/usr/bin/cassandra-stress user profile=/home/ec2-user/cassandra-2.2-debug.yaml 
n=1000000 'ops(insert=1)' cl=LOCAL_ONE no-warmup -node 172.31.31.42 -mode 
native cql3 compression=lz4 -rate threads=128{noformat}

And for reading:

{noformat}
/usr/bin/cassandra-stress user profile=/home/ec2-user/cassandra-2.2-debug.yaml 
n=1000000 'ops(read_event_1=1,read_event_2=1,read_event_3=4)' cl=LOCAL_ONE 
no-warmup -node 172.31.31.42 -mode native cql3 compression=lz4 -rate 
threads=128{noformat}

[~pauloricardomg] : here's the 
[patch|https://github.com/apache/cassandra/compare/trunk...thelastpickle:CASSANDRA-14318]
 if you're willing to review/commit it, and the unit test results in 
[CircleCI|https://circleci.com/gh/thelastpickle/cassandra/178].
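
As a stop-gap for operators who can't patch, the overhead can also be avoided 
by raising the log level in conf/logback.xml (or at runtime via nodetool 
setlogginglevel); a sketch, as exact logger/appender names depend on the 
shipped config:

{noformat}
<!-- conf/logback.xml: silence hot-path logger.debug() calls by default -->
<logger name="org.apache.cassandra" level="INFO"/>{noformat}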

> Debug logging can create massive performance issues
> ---------------------------------------------------
>
>                 Key: CASSANDRA-14318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14318
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alexander Dejanovski
>            Assignee: Alexander Dejanovski
>            Priority: Major
>              Labels: lhf, performance
>             Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>         Attachments: cassandra-2.2-debug.yaml, debuglogging.png, flame22 
> nodebug sjk svg.png, flame22-nodebug-sjk.svg, flame22-sjk.svg, 
> flame_graph_snapshot.png
>
>
> Debug logging can, in many cases (especially very low latency ones), add 
> significant overhead on the read path in 2.2, as we've seen when upgrading 
> clusters from 2.0 to 2.2.
> The performance impact was especially noticeable on the client side metrics, 
> where p99 could go up to 10 times higher, while ClientRequest metrics 
> recorded by Cassandra didn't show any overhead.
> Below shows latencies recorded on the client side with debug logging on 
> first, and then without it :
> !debuglogging.png!  
> We generated a flame graph before turning off debug logging that shows the 
> read call stack is dominated by debug logging : 
> !flame_graph_snapshot.png!
> I've attached the original flame graph for exploration.
> Once disabled, the new flame graph shows that the read call stack gets 
> extremely thin, which is further confirmed by client recorded metrics : 
> !flame22 nodebug sjk svg.png!
> The query pager code has been reworked since 3.0 and it looks like the 
> log.debug() calls are gone there, but for 2.2 users, and to prevent such 
> issues from appearing with default settings, I really think debug logging 
> should be disabled by default.


