[
https://issues.apache.org/jira/browse/CASSANDRA-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646840#comment-14646840
]
Benedict edited comment on CASSANDRA-9894 at 7/29/15 10:27 PM:
---------------------------------------------------------------
I've pushed an initial version
[here|https://github.com/belliottsmith/cassandra/tree/9894]. This is based on
my patch for CASSANDRA-9471.
I tried starting from Sylvain's patch, and then starting from scratch, and
ultimately I didn't like where either led. So this attacks the problem a
little differently: it uses the column filter sent to the replica to help
encode the response, knowing that the response columns must be a subset. With a
normal number of columns this translates to a presence bitmap (otherwise it is
a sequence of ints either adding or removing from the set, but these codepaths
should rarely be taken). If the columns are identical, a single 0 byte is sent
for all the columns.
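To make the idea concrete, here is a minimal sketch of the subset encoding described above. It is not the code in the branch: the class and method names are hypothetical, column names are plain strings, the bitmap is held in a single long (so at most 64 requested columns), and it assumes the response columns appear in the same order as the requested ones. Because 0 is reserved to mean "identical to the request", the sketch also assumes an empty, non-identical response is never encoded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; names and limits are assumptions, not the patch's API.
final class ColumnSubsetSketch {

    // Encodes `returned` relative to `requested`: 0 means "same columns",
    // otherwise bit i is set when requested column i is present in the response.
    static long encode(List<String> requested, List<String> returned) {
        if (requested.size() > 64)
            throw new IllegalArgumentException("sketch only handles up to 64 columns");
        if (returned.equals(requested))
            return 0L; // identical sets collapse to a single zero
        if (returned.isEmpty())
            throw new IllegalArgumentException("empty subset would collide with the 'identical' marker");

        long bitmap = 0L;
        int matched = 0;
        for (int i = 0; i < requested.size() && matched < returned.size(); i++) {
            if (requested.get(i).equals(returned.get(matched))) {
                bitmap |= 1L << i;
                matched++;
            }
        }
        if (matched != returned.size())
            throw new IllegalArgumentException("response columns must be an ordered subset of the request");
        return bitmap;
    }

    // Rebuilds the response columns from the request and the encoded value.
    static List<String> decode(List<String> requested, long encoded) {
        if (encoded == 0L)
            return new ArrayList<>(requested);
        List<String> columns = new ArrayList<>();
        for (int i = 0; i < requested.size(); i++) {
            if ((encoded & (1L << i)) != 0)
                columns.add(requested.get(i));
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> requested = List.of("a", "b", "c", "d");
        long encoded = encode(requested, List.of("a", "c"));
        System.out.println(encoded);                    // prints 5 (bits 0 and 2)
        System.out.println(decode(requested, encoded)); // prints [a, c]
        System.out.println(encode(requested, requested)); // prints 0
    }
}
```

The receiver already knows the requested columns, so only the long (or the zero marker) has to travel with the response.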
This permits us to save work when serializing even single partitions, and also
permits us to encode per-partition encoding stats, so that our timestamps can
most likely be more efficiently encoded. It also touches far less code.
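As a rough illustration of why per-partition stats can help timestamps, the sketch below compares the size of timestamps written as raw varints against the same timestamps written as deltas from a per-partition minimum. Everything here is an assumption for illustration: the class name is hypothetical, the varint is a generic 7-bits-per-byte (LEB128-style) scheme rather than Cassandra's own vint format, and the minimum timestamp is assumed to be carried once in the partition's stats.

```java
// Hypothetical illustration only; not Cassandra's vint format or EncodingStats API.
final class TimestampDeltaSketch {

    // Bytes needed by a generic 7-bits-per-byte varint for an unsigned value.
    static int varintSize(long value) {
        int size = 1;
        while ((value >>>= 7) != 0)
            size++;
        return size;
    }

    // Total bytes if every timestamp is written as-is.
    static int rawSize(long[] timestamps) {
        int total = 0;
        for (long ts : timestamps)
            total += varintSize(ts);
        return total;
    }

    // Total bytes if every timestamp is written as an offset from minTimestamp.
    static int deltaSize(long[] timestamps, long minTimestamp) {
        int total = 0;
        for (long ts : timestamps)
            total += varintSize(ts - minTimestamp);
        return total;
    }

    public static void main(String[] args) {
        long base = 1438208820000000L; // microseconds, roughly the date of this comment
        long[] timestamps = { base, base + 5, base + 130 };
        System.out.println(rawSize(timestamps));         // prints 24
        System.out.println(deltaSize(timestamps, base)); // prints 4
    }
}
```

The tighter the minimum is to the actual values in a partition, the smaller the deltas, which is the benefit of keeping the stats per partition rather than per message.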
I am not 100% certain I haven't broken things, as dtests are a little tricky to
read right now, but nothing jumps out at me. I still need to introduce some
unit tests, and also want to invert the bitmap to make it more efficiently vint
encoded. But the patch is generally ready for a first round of review, as it
will change only minimally.
was (Author: benedict):
I've pushed an initial version
[here|https://github.com/belliottsmith/cassandra/tree/9894]. This is based on
my patch for CASSANDRA-9471.
I tried starting from Sylvain's patch, and then starting from scratch, and
ultimately I didn't like where either led. So this attacks the problem a
little differently: it uses the column filter sent to the coordinator for a
query to encode the response, knowing that the columns must be a subset. With a
normal number of columns this translates to a bitmap of presence in the
response for each column in the request (otherwise it is a sequence of vint
encoded ints, but these codepaths should rarely be taken), and if the columns
are identical (what we should expect), a single 0 byte is sent for all the
columns.
This permits us to save work when serializing even single partitions, and also
permits us to encode per-partition encoding stats, so that our timestamps can
most likely be more efficiently encoded. It also touches far less code.
I am not 100% certain I haven't broken things, as dtests are a little tricky to
read right now, but nothing jumps out at me. I still need to introduce some
unit tests, and also want to invert the bitmap to make it more efficiently vint
encoded. But the patch is generally ready for a first round of review, as it
will change only minimally.
> Serialize the header only once per message
> ------------------------------------------
>
> Key: CASSANDRA-9894
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9894
> Project: Cassandra
> Issue Type: Sub-task
> Components: Core
> Reporter: Sylvain Lebresne
> Assignee: Benedict
> Fix For: 3.0 beta 1
>
>
> One last improvement I'd like to do on the serialization side is that we
> currently serialize the {{SerializationHeader}} for each partition. That
> header contains the serialized columns in particular and for range queries,
> serializing that for every partition is wasted (note that it's only a problem
> for the messaging protocol as for sstable we only write the header once per
> sstable).