[ https://issues.apache.org/jira/browse/CASSANDRA-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626326#comment-14626326 ]
Ariel Weisberg commented on CASSANDRA-9708:
-------------------------------------------
OK, looks good to me. Maybe we should just keep it simple for the case where
fewer than 9 bytes remain, if it isn't common, and make the copy so the if
ladder is less noisy?
Also, it may be worth mentioning in the constructor that a copy may be made,
so changes to the underlying buffer won't be reflected in the stream.
> Serialize ClusteringPrefixes in batches
> ---------------------------------------
>
> Key: CASSANDRA-9708
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9708
> Project: Cassandra
> Issue Type: Sub-task
> Components: Core
> Reporter: Benedict
> Assignee: Benedict
> Fix For: 3.0.0 rc1
>
>
> Typically we will have very few clustering prefixes to serialize; however,
> in theory they are not constrained (or are they, just to a very large
> number?). Currently we encode a fat header for all values up front (two bits
> per value); however, those bits will typically be zero, and typically we will
> have only a handful (perhaps 1 or 2) of values.
> This patch modifies the encoding to batch the prefixes in groups of up to 32,
> along with a header that is vint encoded. Typically this will result in a
> single byte per batch, but will consume up to 9 bytes if some of the values
> have their flags set. If we have more than 32 columns, we just read another
> header. This means we incur no garbage, and compress the data on disk in many
> cases where we have more than 4 clustering components.
> I do wonder if we shouldn't impose a limit on clustering columns, though: if
> you have more than a handful, merge performance is going to disintegrate. 32
> is probably well in excess of what we should be seeing in the wild anyway.
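
To make the batching described in the issue concrete, here is a minimal sketch,
not the code from the patch. It packs two flag bits per value into a 64-bit
header for a batch of up to 32 values, then vint-encodes that header. The class
and method names are hypothetical, the flag values are treated as opaque
numbers 0..3, and the vint layout (leading 1-bits of the first byte give the
number of extra bytes) is an assumption chosen to match the 1-to-9-byte range
the description mentions.

```java
// Illustrative sketch only; all names here are hypothetical.
final class ClusteringHeaderSketch {
    static final int BATCH = 32; // values covered by one header

    // Pack the 2-bit flags of up to 32 values, starting at offset, into one long.
    static long makeHeader(int[] flags, int offset) {
        long header = 0;
        int limit = Math.min(flags.length, offset + BATCH);
        for (int i = offset; i < limit; i++)
            header |= ((long) flags[i] & 3L) << (2 * (i - offset));
        return header;
    }

    // Read back the 2-bit flag of the value at indexInBatch (0..31).
    static int flagAt(long header, int indexInBatch) {
        return (int) ((header >>> (2 * indexInBatch)) & 3L);
    }

    // Bytes needed: n bytes carry 7*n payload bits for n <= 8, 9 bytes carry 64.
    static int vintSize(long v) {
        int bits = 64 - Long.numberOfLeadingZeros(v | 1);
        return bits > 56 ? 9 : (bits + 6) / 7;
    }

    // Assumed layout: big-endian payload, first byte prefixed with (size - 1) one-bits.
    static byte[] encodeVInt(long v) {
        int size = vintSize(v);
        byte[] out = new byte[size];
        for (int i = size - 1; i >= 0; i--) {
            out[i] = (byte) v;
            v >>>= 8;
        }
        out[0] |= (byte) ~(0xFF >> (size - 1));
        return out;
    }

    // Inverse of encodeVInt: count leading one-bits, then read that many extra bytes.
    static long decodeVInt(byte[] in) {
        int first = in[0] & 0xFF;
        int extra = 0;
        while (extra < 8 && (first & (0x80 >> extra)) != 0)
            extra++;
        long v = first & (0xFF >> extra);
        for (int i = 1; i <= extra; i++)
            v = (v << 8) | (in[i] & 0xFF);
        return v;
    }

    public static void main(String[] args) {
        int[] flags = new int[40];          // 40 values: two batches (32 + 8)
        flags[31] = 1;                      // one flag set at the end of batch 0
        for (int offset = 0; offset < flags.length; offset += BATCH) {
            long header = makeHeader(flags, offset);
            System.out.println("batch at " + offset + ": header bytes = "
                               + encodeVInt(header).length);
        }
    }
}
```

In this sketch an all-zero header costs a single byte, while a flag set on one
of the last values in a batch pushes the header to the full 9 bytes, which is
the trade-off the description points at.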