[
https://issues.apache.org/jira/browse/CASSANDRA-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15096524#comment-15096524
]
Sylvain Lebresne edited comment on CASSANDRA-10657 at 1/15/16 10:12 AM:
------------------------------------------------------------------------
Pushing a patch for this with 4 commits:
# the first commit ensures cells with skipped values are properly handled (that
is, ignored) by {{PartitionUpdate.fromIterator}}.
# the second commit re-enables value skipping and also skips values in read-repair.
# the third commit implements the additional optimization I discussed above: we
completely ignore cells when we know we can. It also includes a bit of
refactoring; the naming and explanations around this value skipping weren't
terribly good, especially after that 3rd commit, so this hopefully makes things
cleaner. I apologize for not having those 2 parts separated (optimization and
refactoring); they were, but I screwed up my history at some point.
|| [patch|https://github.com/pcmanus/cassandra/commits/10657] || [utests|http://cassci.datastax.com/view/Dev/view/pcmanus/job/pcmanus-10657-testall/11/] || [dtests|http://cassci.datastax.com/view/Dev/view/pcmanus/job/pcmanus-10657-dtest/6/] ||
I've also run some simple numbers to quantify how much this helps. The test is
pretty simple: each row has 4 columns, 2 simple int ones and 2 others with fixed
100K values, and the test only queries the 2 small ones. The results are here:
http://cstar.datastax.com/graph?command=one_job&stats=fade71da-ba01-11e5-8c22-0256e416528f&metric=op_rate&operation=2_user&smoothing=1&show_aggregates=true&xmin=0&xmax=55.44&ymin=0&ymax=29004.8
On that specific test, the version with that patch is ~17% faster than trunk.
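As a rough illustration of why skipping helps on a row shaped like the one in this test, here is a minimal Python sketch. It is not the Cassandra code or its on-disk format: the length-prefixed cell layout, the column names, the helper names ({{serialize_row}}, {{read_row}}) and the reading of "100K" as 100,000 bytes are all assumptions made for the example.

```python
import io
import struct


def serialize_row(row):
    """Encode a row as length-prefixed (name, value) cells (hypothetical layout)."""
    out = io.BytesIO()
    for name, value in row.items():
        name_bytes = name.encode("utf-8")
        out.write(struct.pack(">H", len(name_bytes)))  # 2-byte name length
        out.write(name_bytes)
        out.write(struct.pack(">I", len(value)))       # 4-byte value length
        out.write(value)
    return out.getvalue()


def read_row(data, queried):
    """Decode only the queried columns; seek past the values of all others."""
    stream = io.BytesIO(data)
    result = {}
    skipped_bytes = 0
    while stream.tell() < len(data):
        (name_len,) = struct.unpack(">H", stream.read(2))
        name = stream.read(name_len).decode("utf-8")
        (value_len,) = struct.unpack(">I", stream.read(4))
        if name in queried:
            result[name] = stream.read(value_len)
        else:
            # Value skipping: move past the value without materializing it.
            stream.seek(value_len, io.SEEK_CUR)
            skipped_bytes += value_len
    return result, skipped_bytes


# Row shaped like the test above: 2 int columns and 2 large fixed-size values.
row = {
    "a": struct.pack(">i", 1),
    "b": struct.pack(">i", 2),
    "big1": b"x" * 100_000,  # assumption: "100K" taken as 100,000 bytes
    "big2": b"y" * 100_000,
}
data = serialize_row(row)
values, skipped = read_row(data, queried={"a", "b"})
```

In this sketch, a query for the two small columns decodes 8 bytes of values and never copies the two large ones; the real patch works on Cassandra's own serialization and also covers read-repair, which this toy example does not model.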
> Re-enable/improve value skipping
> --------------------------------
>
> Key: CASSANDRA-10657
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10657
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Sylvain Lebresne
> Assignee: Sylvain Lebresne
> Fix For: 3.x
>
>
> This is a followup to CASSANDRA-10655, to re-enable the optimization of
> skipping values for the columns that are not requested by users in a CQL
> query. See CASSANDRA-10655 for why it was disabled; the goal here is to
> re-enable it minus the bugs.