[jira] [Commented] (CASSANDRA-6060) Remove internal use of Strings for ks/cf names

Ariel Weisberg (JIRA) Wed, 10 Dec 2014 08:49:39 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241364#comment-14241364
 ]


Ariel Weisberg commented on CASSANDRA-6060:
-------------------------------------------

I am still digging but I am not sure there is much value here.

For prepared statements between client and server there are no ks/cf names.

Here is the breakdown for a minimum-size mutation sent inside the cluster:

Size of Ethernet frame - 24 bytes
Size of IPv4 header (without any options) - 20 bytes
Size of TCP header (without any options) - 20 bytes

4-byte protocol magic
4-byte version
4-byte timestamp
4-byte verb
4-byte parameter count
4-byte payload length prefix
No keyspace name in current versions
2-byte key length
key, say 10 bytes
4-byte mutation count

1-byte boolean
16-byte cf id
4-byte count of columns

Per column:
2-byte column name length prefix
column name, say 8 bytes
1-byte serialization flags
8-byte timestamp
4-byte length prefix
column value, say 8 bytes

Total is 158 bytes. Saving 12 bytes on the CF uuid (a 16-byte uuid replaced by 
a 4-byte int id) would be about 7.6%.
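
The arithmetic can be checked with a short script. This is only a sketch: the 
byte counts are copied from the breakdown above, the dictionary keys and 
variable names are made up for illustration, and the 4-byte int id is an 
assumption implied by the 12-byte saving (16 - 4). Summing the items exactly as 
listed gives 156 bytes rather than 158, so the quoted total may count something 
that is not itemized; the saving is between 7.5% and 7.7% either way.

```python
# Byte counts copied from the breakdown above; all names are illustrative only.
FRAMING = {"ethernet": 24, "ipv4_header": 20, "tcp_header": 20}
MESSAGE = {
    "protocol_magic": 4, "version": 4, "timestamp": 4,
    "verb": 4, "parameter_count": 4, "payload_length": 4,
}
MUTATION = {"key_length": 2, "key": 10, "mutation_count": 4}
PER_CF = {"boolean": 1, "cf_id": 16, "column_count": 4}
PER_COLUMN = {
    "name_length": 2, "name": 8, "flags": 1,
    "timestamp": 8, "value_length": 4, "value": 8,
}

UUID_BYTES = 16
INT_ID_BYTES = 4  # assumption: a 4-byte int replaces the 16-byte uuid

groups = (FRAMING, MESSAGE, MUTATION, PER_CF, PER_COLUMN)
listed_total = sum(sum(group.values()) for group in groups)
saving = UUID_BYTES - INT_ID_BYTES

print(f"sum of listed items: {listed_total} bytes")
print(f"saving: {saving} bytes = {saving / listed_total:.1%} of the listed total")
```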

For single CF mutations this is not a win. Loading data points 16 bytes at a 
time is not going to perform well anyway, so people would likely look into 
batching at that point.

The UUID is not repeated for each cell, so it is a one-time cost for workloads 
that modify multiple cells per CF. The one case where the 12 bytes become 
significant is single-cell updates to multiple CFs in one mutation. There the 
12-byte overhead converges on 23%.
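
To see where the 23% comes from, the saving can be written as a function of 
the number of CFs and the number of cells per CF. A minimal sketch, assuming 
the per-item sizes from the breakdown (64 bytes of Ethernet/IP/TCP framing, 24 
bytes of message header, 16 bytes for the key and mutation count, 21 bytes per 
CF, 31 bytes per cell), a 4-byte int id, and a message that still fits in one 
packet; saving_fraction is a hypothetical helper, not anything in Cassandra.

```python
# Sizes taken from the breakdown above; the grouping is an assumption.
FIXED_BYTES = 64 + 24 + 16              # framing + message header + key/mutation count
PER_CF_BYTES = 1 + 16 + 4               # boolean + cf id (uuid) + column count
PER_CELL_BYTES = 2 + 8 + 1 + 8 + 4 + 8  # name prefix, name, flags, timestamp, value prefix, value
SAVED_PER_CF = 16 - 4                   # assumption: 16-byte uuid replaced by a 4-byte int


def saving_fraction(num_cfs, cells_per_cf):
    """Fraction of the message saved by shrinking each CF id by 12 bytes."""
    total = FIXED_BYTES + num_cfs * (PER_CF_BYTES + cells_per_cf * PER_CELL_BYTES)
    return num_cfs * SAVED_PER_CF / total


# Single-cell updates spread over more and more CFs.
for num_cfs in (1, 10, 100, 1000):
    print(f"{num_cfs:5d} CFs, 1 cell each: {saving_fraction(num_cfs, 1):.1%}")

# Upper bound as the fixed overhead is amortized away.
limit = SAVED_PER_CF / (PER_CF_BYTES + PER_CELL_BYTES)
print(f"limit: {limit:.1%}")

# Many cells in one CF: the id is a one-time cost.
print(f"1 CF, 100 cells: {saving_fraction(1, 100):.2%}")
```

With one cell per CF the fraction climbs toward 12/52, which is the 23% 
figure; with many cells in a single CF it drops well below 1%.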

I am going to look at the read path next, but I expect to find something 
similar. A read is going to have key overhead, and possibly overhead for all 
the other query parameters, that should match the simple single-cell mutation 
case.

> Remove internal use of Strings for ks/cf names
> ----------------------------------------------
>
>                 Key: CASSANDRA-6060
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6060
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Ariel Weisberg
>              Labels: performance
>             Fix For: 3.0
>
>
> We toss a lot of Strings around internally, including across the network.  
> Once a request has been Prepared, we ought to be able to encode these as int 
> ids.
> Unfortunately, we moved from int to uuid in CASSANDRA-3794, which was a 
> reasonable move at the time, but a uuid is a lot bigger than an int.  Now 
> that we have CAS we can allow concurrent schema updates while still using 
> sequential int IDs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
