Jim Witschey created CASSANDRA-9619:
---------------------------------------

             Summary: Read performance regression on trunk and 2.2 vs. 2.1
                 Key: CASSANDRA-9619
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9619
             Project: Cassandra
          Issue Type: Bug
            Reporter: Jim Witschey


There seems to be a read performance regression in 2.2 and trunk, as compared to 2.1 and 
2.0. I found it running cstar_perf jobs with 50-column tables. 2.2 may be worse 
than trunk, though my results on that aren't consistent. The relevant 
cstar_perf jobs are here:

http://cstar.datastax.com/tests/id/273e2ea8-0fc8-11e5-816c-42010af0688f

http://cstar.datastax.com/tests/id/3a8002d6-1480-11e5-97ff-42010af0688f

http://cstar.datastax.com/tests/id/40ff2766-1248-11e5-bac8-42010af0688f

The sequence of commands for these jobs is

{code}
stress write n=65000000 -rate threads=300 -col n=FIXED\(50\)
stress read n=65000000 -rate threads=300
stress read n=65000000 -rate threads=300
{code}

Have a look at the operations per second going from [the first read 
operation|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=729.08&ymin=0&ymax=174379.7]
 to [the second read 
operation|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=729.08&ymin=0&ymax=174379.7].
 On trunk they've fallen to ~100K, from ~135K on 2.1 and 2.0. It's 
slightly worse for 2.2, where operations per second fall continuously from 
the first to the second read operation.
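
To put a rough number on the drop, here is a minimal sketch that assumes the approximate figures quoted above (~135K ops/s on 2.1 and 2.0, ~100K on trunk). These are values read off the graphs, not exact measurements, and the helper name `relative_drop` is only for illustration; it is not part of cstar_perf or cassandra-stress.

```python
def relative_drop(baseline_ops: float, regressed_ops: float) -> float:
    """Return the throughput drop as a percentage of the baseline."""
    return (baseline_ops - regressed_ops) / baseline_ops * 100.0


# Approximate op rates from the report: ~135K on 2.1/2.0, ~100K on trunk.
drop = relative_drop(135_000, 100_000)
print(f"read throughput drop: {drop:.1f}%")  # prints: read throughput drop: 25.9%
```

So with these rounded figures, trunk loses roughly a quarter of the read throughput seen on 2.1 and 2.0.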

There's a corresponding increase in read latency -- it's noticeable on trunk and 
pretty bad on 2.2. Again, the latency gets higher and higher on 2.2 as the read 
operations progress (see the graphs 
[here|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=95th_latency&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=729.08&ymin=0&ymax=17.27]
 and 
[here|http://cstar.datastax.com/graph?stats=273e2ea8-0fc8-11e5-816c-42010af0688f&metric=95th_latency&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=928.62&ymin=0&ymax=14.52]).

I see a similar regression in a [more recent 
test|http://cstar.datastax.com/graph?stats=40ff2766-1248-11e5-bac8-42010af0688f&metric=op_rate&operation=2_read&smoothing=1&show_aggregates=true&xmin=0&xmax=752.62&ymin=0&ymax=171799.1],
 though in this one trunk performed worse than 2.2. This run also didn't 
display the increasing latency in 2.2.

This regression may show for smaller numbers of columns, but not as 
prominently, as shown [in the results to this test with the stress default of 5 
columns|http://cstar.datastax.com/graph?stats=227cb89e-0fc8-11e5-9f14-42010af0688f&metric=99.9th_latency&operation=3_read&smoothing=1&show_aggregates=true&xmin=0&xmax=498.19&ymin=0&ymax=334.29].
 There's an increase in latency variability on trunk and 2.2, but I don't see a 
regression in summary statistics.

My measurements aren't confounded by [the recent regression in 
cassandra-stress|https://issues.apache.org/jira/browse/CASSANDRA-9558]; 
cstar_perf uses the same stress program (from trunk) on all versions on the 
cluster.

I'm currently working to

- reproduce with a smaller workload so this is easier to bisect and debug.
- get results with larger numbers of columns, since we've seen the regression 
on 50 columns but not the stress default of 5.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
