[ https://issues.apache.org/jira/browse/CASSANDRA-9708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14624429#comment-14624429 ]

Benedict edited comment on CASSANDRA-9708 at 7/13/15 9:35 AM:
--------------------------------------------------------------

Yeah, that's pretty much my position. I'll note it doesn't make the code a _whole_ 
lot simpler, and since the logic is pretty isolated anyway, I don't think it's 
super important.

I'm personally on the fence, if not slightly leaning in favour of retaining the 
lack of limit. But we do have a tendency to try to stop users making terrible 
decisions, and they don't come more terrible than 33+ clustering columns.

(Also, from a testing POV, we can test serialization in isolation with 33+ columns, 
but it's kind of difficult to do full, extensive testing with that many, so we may 
miss some coverage.)


was (Author: benedict):
Yeah, that's pretty much my position. I'll note it doesn't make it a _whole_ 
lot simpler, so I don't think it's super important, since it's also pretty 
isolated.

I'm personally on the fence, if not slightly leaning in favour of retaining the 
lack of limit. But we do have a tendency to try to stop users making terrible 
decisions, and they don't come more terrible than 33+ clustering columns.

> Serialize ClusteringPrefixes in batches
> ---------------------------------------
>
>                 Key: CASSANDRA-9708
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9708
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>             Fix For: 3.0.0 rc1
>
>
> Typically we will have very few clustering prefixes to serialize; in theory, 
> though, their number is unconstrained (or is it, just bounded by a very large 
> number?). Currently we encode a fat header for all values up front (two bits per 
> value), but those bits will typically be zero, and we will typically have only a 
> handful (perhaps 1 or 2) of values.
> This patch modifies the encoding to batch the prefixes in groups of up to 32, each 
> group preceded by a vint-encoded header. Typically this results in a single byte 
> per batch, but it can consume up to 9 bytes if some of the values have their flags 
> set. If we have more than 32 columns, we simply read another header. This means we 
> incur no garbage, and we compress the data on disk in many cases where we have 
> more than 4 clustering components.
> I do wonder if we shouldn't impose a limit on clustering columns, though: if you 
> have more than a handful, merge performance is going to disintegrate. 32 is 
> probably well in excess of what we should be seeing in the wild anyway.
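
For illustration, here is a minimal sketch of the batched header scheme described in
the quoted description above: two flag bits per clustering value, packed into one
vint-encoded header per group of up to 32 values, with a further header for each
additional group. All names here are hypothetical (they are not the actual 3.0
ClusteringPrefix serializer), the flag layout is an assumption, and the varint is a
plain LEB128-style encoding standing in for Cassandra's real vint format.

    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;

    public final class BatchedClusteringSketch
    {
        static final int BATCH_SIZE = 32; // clustering values covered by one header

        // Two flag bits per value: bit 2i marks a null value, bit 2i+1 an empty one
        // (assumed layout for the sketch).
        static long headerFor(ByteBuffer[] values, int offset, int count)
        {
            long header = 0;
            for (int i = 0; i < count; i++)
            {
                ByteBuffer v = values[offset + i];
                if (v == null)
                    header |= 1L << (i * 2);
                else if (!v.hasRemaining())
                    header |= 1L << (i * 2 + 1);
            }
            return header;
        }

        static void serialize(ByteBuffer[] values, DataOutputStream out) throws IOException
        {
            for (int offset = 0; offset < values.length; offset += BATCH_SIZE)
            {
                int count = Math.min(BATCH_SIZE, values.length - offset);
                // Usually a single byte (all flags zero); larger only if flags are set.
                writeUnsignedVInt(headerFor(values, offset, count), out);
                for (int i = 0; i < count; i++)
                {
                    ByteBuffer v = values[offset + i];
                    if (v == null || !v.hasRemaining())
                        continue; // already flagged in the header, no bytes written
                    byte[] bytes = new byte[v.remaining()];
                    v.duplicate().get(bytes);
                    writeUnsignedVInt(bytes.length, out);
                    out.write(bytes);
                }
            }
        }

        // Plain LEB128-style varint; Cassandra's actual vint encoding differs.
        static void writeUnsignedVInt(long value, DataOutputStream out) throws IOException
        {
            while ((value & ~0x7FL) != 0)
            {
                out.write((int) ((value & 0x7FL) | 0x80L));
                value >>>= 7;
            }
            out.write((int) value);
        }
    }

In the common case (a handful of non-null, non-empty clustering values) each batch
header is a single byte, versus two bits reserved per value up front in the old
fat-header format.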



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
