[ https://issues.apache.org/jira/browse/CASSANDRA-6060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241364#comment-14241364 ]
Ariel Weisberg commented on CASSANDRA-6060:
-------------------------------------------

I am still digging, but I am not sure there is much value here. For prepared statements between client and server there are no ks/cf names.

Here is the breakdown for a minimum-size mutation inside the cluster:

Size of Ethernet frame - 24 bytes
Size of IPv4 header (without any options) - 20 bytes
Size of TCP header (without any options) - 20 bytes
4-byte protocol magic
4-byte version
4-byte timestamp
4-byte verb
4-byte parameter count
4-byte payload length prefix
No keyspace name in current versions
2-byte key length
key, say 10 bytes
4-byte mutation count
1-byte boolean
16-byte cf id
4-byte count of columns
Per column:
    2-byte column name length prefix
    column name, say 8 bytes
    1-byte serialization flags
    8-byte timestamp
    4-byte length prefix
    column value, say 8 bytes

Total is 158 bytes. Saving 12 bytes on the CF UUID (replacing the 16-byte UUID with a 4-byte int id) would be 7.5%. For single-CF mutations this is not a win. Loading data points 16 bytes at a time isn't going to perform well anyway, so people might look into batching at that point. The UUID is not repeated for each cell, so it is a one-time cost for workloads that modify multiple cells per CF.

The one case where the 12 bytes become significant is single-cell updates to multiple CFs in one mutation. There the 12-byte overhead converges on 23%.

I am going to look at the read path next, but I expect to find something similar. A read is going to have key overhead, and possibly overhead for all the other query parameters, that should match the simple single-cell mutation case.
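The arithmetic in the comment above can be checked with a short sketch. The byte sizes are copied from the breakdown; the function and constant names are made up for illustration, and the model assumes that the boolean, cf id, and column count repeat once per CF inside a mutation while framing, message header, and key are paid once. Summing the itemized fields as listed gives 156 bytes for the one-CF, one-cell case, two bytes less than the 158 quoted, so the percentages here are approximate.

```python
# Byte sizes taken from the breakdown in the comment; names are illustrative only.
FRAMING = 24 + 20 + 20              # Ethernet frame, IPv4 header, TCP header
MESSAGE_HEADER = 6 * 4              # magic, version, timestamp, verb, param count, payload length
KEY = 2 + 10                        # key length prefix + example 10-byte key
MUTATION_COUNT = 4
PER_CF = 1 + 16 + 4                 # boolean, cf id (UUID), count of columns
PER_COLUMN = 2 + 8 + 1 + 8 + 4 + 8  # name prefix, name, flags, timestamp, value prefix, value

# Assumption: the proposal swaps the 16-byte UUID for a 4-byte int id.
SAVING_PER_CF = 16 - 4


def mutation_size(num_cfs: int, cells_per_cf: int) -> int:
    """Estimated on-wire bytes for one mutation under the assumed layout."""
    per_cf_total = PER_CF + cells_per_cf * PER_COLUMN
    return FRAMING + MESSAGE_HEADER + KEY + MUTATION_COUNT + num_cfs * per_cf_total


def saving_fraction(num_cfs: int, cells_per_cf: int) -> float:
    """Fraction of the message saved by shrinking every cf id by SAVING_PER_CF bytes."""
    return num_cfs * SAVING_PER_CF / mutation_size(num_cfs, cells_per_cf)


if __name__ == "__main__":
    for cfs, cells in [(1, 1), (1, 10), (10, 1), (1000, 1)]:
        size = mutation_size(cfs, cells)
        pct = 100 * saving_fraction(cfs, cells)
        print(f"{cfs} CF(s) x {cells} cell(s): {size} bytes, saving {pct:.1f}%")
```

With one cell per CF, each additional CF adds 52 bytes under this model, so the saving tends toward 12/52 (about 23%) as the number of CFs grows, which matches the figure in the comment. With many cells per CF the saving shrinks instead, because the cf id is paid only once per CF.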
> Remove internal use of Strings for ks/cf names
> ----------------------------------------------
>
>                 Key: CASSANDRA-6060
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6060
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Ariel Weisberg
>              Labels: performance
>             Fix For: 3.0
>
>
> We toss a lot of Strings around internally, including across the network. Once a request has been Prepared, we ought to be able to encode these as int ids.
> Unfortunately, we moved from int to uuid in CASSANDRA-3794, which was a reasonable move at the time, but a uuid is a lot bigger than an int. Now that we have CAS we can allow concurrent schema updates while still using sequential int IDs.