[jira] [Commented] (CASSANDRA-9619) Read performance regression in tables with many columns on trunk and 2.2 vs. 2.1

Jim Witschey (JIRA) Thu, 25 Jun 2015 20:55:42 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-9619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14602365#comment-14602365 ]


Jim Witschey commented on CASSANDRA-9619:
-----------------------------------------

Ryan's made some improvements to the {{cstar_perf}} backend that make it 
possible to run write workloads without deleting data afterwards, then run read 
workloads over those datasets. In that environment, I've finished the initial 
writes and the first read step is now running. I'll review the results in the 
morning.
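
A rough local approximation of that write-once, read-twice flow, reusing the stress commands from the issue description, might look like the sketch below. It assumes {{cassandra-stress}} is on the PATH and pointed at a local node; the smaller default {{N}} is an arbitrary illustrative value, not one from the cstar_perf runs, and the {{run}} helper is hypothetical. By default it only prints the commands; set {{DRY_RUN=0}} to execute them.

```shell
#!/bin/sh
# Sketch: write once, then read the same data twice, mirroring the
# cstar_perf job sequence. Assumptions (not from the original runs):
# cassandra-stress is on PATH and targets a local node, and N defaults
# to a smaller, hypothetical operation count for quicker bisecting.
N=${N:-1000000}
STRESS=${STRESS:-cassandra-stress}

# Hypothetical helper: print each command by default; set DRY_RUN=0
# to actually execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# N write operations with 50 columns per row; nothing is deleted afterwards.
run "$STRESS" write n="$N" -rate threads=300 -col 'n=FIXED(50)'

# Two read passes over the same data set, as in the original job.
run "$STRESS" read n="$N" -rate threads=300
run "$STRESS" read n="$N" -rate threads=300
```

Keeping the data between steps is the point: both read passes hit the same on-disk data set, so any drift between the first and second read reflects the read path rather than a freshly written table.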

> Read performance regression in tables with many columns on trunk and 2.2 vs. 2.1
> --------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9619
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9619
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jim Witschey
>            Assignee: T Jake Luciani
>              Labels: perfomance
>             Fix For: 2.2.0 rc2
>
>
> There seems to be a read performance regression in 2.2 and trunk compared 
> to 2.1 and 2.0. I found it running cstar_perf jobs with 50-column tables. 2.2 may be 
> worse than trunk, though my results on that aren't consistent. The relevant 
> cstar_perf jobs are here:
> http://cstar.datastax.com/tests/id/273e2ea8-0fc8-11e5-816c-42010af0688f
> http://cstar.datastax.com/tests/id/3a8002d6-1480-11e5-97ff-42010af0688f
> http://cstar.datastax.com/tests/id/40ff2766-1248-11e5-bac8-42010af0688f
> The sequence of commands for these jobs is
> {code}
> stress write n=65000000 -rate threads=300 -col n=FIXED\(50\)
> stress read n=65000000 -rate threads=300
> stress read n=65000000 -rate threads=300
> {code}
> Have a look at the operations per second going from
> [the first read operation|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=729.08&ymin=0&ymax=174379.7]
> to [the second read operation|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=729.08&ymin=0&ymax=174379.7].
> Throughput falls from ~135K op/s on 2.1 and 2.0 to ~100K op/s on trunk. It's 
> slightly worse on 2.2, where operations per second fall continuously from 
> the first read operation to the second.
> There's a corresponding increase in read latency: it's noticeable on trunk 
> and pretty bad on 2.2. Again, latency on 2.2 keeps climbing as the read 
> operations progress (see the 95th-percentile latency graphs for
> [the first read|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=95th_latency&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=729.08&ymin=0&ymax=17.27]
> and [the second read|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=95th_latency&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=928.62&ymin=0&ymax=14.52]).
> I see a similar regression in a
> [more recent test|http://cstar.datastax.com/graph?stats=40ff2766-1248-11e5-bac8-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=752.62&ymin=0&ymax=171799.1],
> though in this one trunk performed worse than 2.2. This run also didn't 
> show the increasing latency on 2.2.
> The regression may also show up with fewer columns, but less prominently, as 
> seen [in the results of this test with the stress default of 5 columns|http://cstar.datastax.com/graph?stats=227cb89e-0fc8-11e5-9f14-42010af0688f&metric=99.9th_latency&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=498.19&ymin=0&ymax=334.29].
> There's an increase in latency variability on trunk and 2.2, but I don't see 
> a regression in the summary statistics.
> My measurements aren't confounded by [the recent regression in 
> cassandra-stress|https://issues.apache.org/jira/browse/CASSANDRA-9558]; 
> cstar_perf uses the same stress program (from trunk) on all versions on the 
> cluster.
> I'm currently working to:
> - reproduce with a smaller workload so this is easier to bisect and debug.
> - get results with larger numbers of columns, since we've seen the regression 
> on 50 columns but not the stress default of 5.
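
For a sense of scale, the drop quoted in the description, from ~135K op/s on 2.1 and 2.0 to ~100K op/s on trunk, is roughly a quarter of read throughput. A quick check using only those approximate figures (the variable names are illustrative):

```shell
#!/bin/sh
# Relative throughput loss computed from the approximate op rates in the description.
baseline=135000   # ~135K op/s (2.1 and 2.0)
regressed=100000  # ~100K op/s (trunk)

# awk does the floating-point division that POSIX sh arithmetic cannot.
drop=$(awk -v b="$baseline" -v r="$regressed" 'BEGIN { printf "%.1f", (b - r) / b * 100 }')
echo "read throughput drop: ${drop}%"
```

This prints {{read throughput drop: 25.9%}}. Since both inputs are rounded, treat the result as "about 25%" rather than a precise figure.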



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

