[ https://issues.apache.org/jira/browse/CASSANDRA-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096524#comment-15096524 ]

Sylvain Lebresne edited comment on CASSANDRA-10657 at 1/15/16 10:12 AM:
------------------------------------------------------------------------

Pushing a patch for this with 4 commits:
# the first commit ensures cells with skipped values are properly handled (that 
is, ignored) by {{PartitionUpdate.fromIterator}}.
# the second commit re-enables value skipping and also skips values in read-repair.
# the third commit implements the additional optimization I discussed above: we 
completely ignore cells when we know we can. It also includes a bit of 
refactoring; the naming and explanations around value skipping weren't 
terribly good, especially after that 3rd commit, so this hopefully makes things 
cleaner. I apologize for not having those 2 parts (optimization and 
refactoring) separated; they were, but I screwed up my history at some point.

|| [patch|https://github.com/pcmanus/cassandra/commits/10657] || [utests|http://cassci.datastax.com/view/Dev/view/pcmanus/job/pcmanus-10657-testall/11/] || [dtests|http://cassci.datastax.com/view/Dev/view/pcmanus/job/pcmanus-10657-dtest/6/] ||

I've also run some simple numbers to quantify how much this helps. The test is 
pretty simple: each row has 4 columns, 2 simple int ones and 2 others with fixed 
100K values, and the test only queries the 2 small ones. The results are here:
http://cstar.datastax.com/graph?command=one_job&stats=fade71da-ba01-11e5-8c22-0256e416528f&metric=op_rate&operation=2_user&smoothing=1&show_aggregates=true&xmin=0&xmax=55.44&ymin=0&ymax=29004.8
On that specific test, the version with the patch is ~17% faster than trunk.
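
For readers unfamiliar with the idea, here is a minimal sketch of what skipping values means for a row shaped like the one in this test. It is not Cassandra's code or its on-disk format: the length-prefixed layout, the column names and the helpers {{write_row}} and {{read_row}} are all hypothetical, and "100K" is assumed to mean 100,000 bytes. The point is only that a reader can step over the bytes of a value it does not need instead of materializing it.

```python
import io
import struct


def write_row(values):
    """Serialize a row as consecutive length-prefixed values (hypothetical layout)."""
    out = io.BytesIO()
    for value in values:
        out.write(struct.pack(">I", len(value)))  # 4-byte big-endian length
        out.write(value)
    return out.getvalue()


def read_row(data, columns, requested):
    """Decode only the requested columns; seek past the values of the others."""
    stream = io.BytesIO(data)
    result = {}
    skipped = 0
    for name in columns:
        (length,) = struct.unpack(">I", stream.read(4))
        if name in requested:
            result[name] = stream.read(length)
        else:
            # Value skipping: advance the position without copying the bytes.
            stream.seek(length, io.SEEK_CUR)
            skipped += length
    return result, skipped


# Row shaped like the benchmark: 2 small ints and 2 large values.
# Assumption: "100K" is taken to mean 100,000 bytes.
columns = ["a", "b", "big1", "big2"]
row = write_row([
    struct.pack(">i", 1),
    struct.pack(">i", 2),
    b"x" * 100_000,
    b"y" * 100_000,
])

values, skipped = read_row(row, columns, requested={"a", "b"})
```

In this toy layout, a query for only the two int columns never copies the two large values, which is the kind of work the patch avoids on the real read path.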




> Re-enable/improve value skipping
> --------------------------------
>
>                 Key: CASSANDRA-10657
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10657
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 3.x
>
>
> This is a follow-up to CASSANDRA-10655, to re-enable the optimization of 
> skipping values for the columns that are not requested by users in a CQL 
> query. See CASSANDRA-10655 for why it was disabled; the goal here is to 
> re-enable it minus the bugs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
