[ https://issues.apache.org/jira/browse/CASSANDRA-8959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14358267#comment-14358267 ]
Sylvain Lebresne commented on CASSANDRA-8959:
---------------------------------------------
For the record, this should also be extended to collections.
I'll note that there are two subparts to this: the internal encoding, and the one
we send to clients. It's technically possible to use different encodings for
the two and translate when receiving from or sending to clients, but what is
inefficient internally is also inefficient on the native protocol, so I'd suggest
we switch to the same more efficient encoding for both (for existing versions of
the native protocol, this does mean we'll have to translate to the old format,
which is ok).
> More efficient frozen UDT and tuple serialization format
> --------------------------------------------------------
>
> Key: CASSANDRA-8959
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8959
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Aleksey Yeschenko
> Labels: performance
> Fix For: 3.1
>
>
> The current serialization format for UDTs has a fixed overhead of 4 bytes per
> defined field (encoding the size of the field).
> It is inefficient for sparse UDTs - ones with many defined fields, but few of
> them present. We could keep a bitset to indicate the missing fields, if any.
> It's sub-optimal for encoding UDTs with all the values present as well. We
> could use varint encoding for the field sizes of blob/text fields and encode
> fixed-size types directly, without the 4-byte size prologue.
> That or something more brilliant. Any improvement right now is low-hanging fruit.
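
To make the size difference in the description concrete, here is a minimal sketch comparing a length-prefixed layout with a bitset-plus-varint layout. It is an illustration only, not the format Cassandra uses or adopted: the function names, the use of -1 as the length of a missing field, the bit order of the bitset, and the LEB128-style varint are all assumptions made for this example. The shortcut for fixed-size types mentioned in the description is left out to keep the sketch short.

```python
import struct


def encode_legacy(fields):
    """Length-prefixed layout: a 4-byte big-endian size per defined field.

    Assumption for this sketch: a missing field is written as size -1.
    """
    out = bytearray()
    for value in fields:
        if value is None:
            out += struct.pack(">i", -1)
        else:
            out += struct.pack(">i", len(value)) + value
    return bytes(out)


def encode_varint(n):
    """Unsigned LEB128-style varint: 7 payload bits per byte, high bit = 'more'."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_sparse(fields):
    """Hypothetical layout: a bitset marking missing fields, then
    varint size + bytes for each field that is present."""
    bitset = bytearray((len(fields) + 7) // 8)
    body = bytearray()
    for i, value in enumerate(fields):
        if value is None:
            bitset[i // 8] |= 1 << (i % 8)  # bit set means "field is missing"
        else:
            body += encode_varint(len(value)) + value
    return bytes(bitset) + bytes(body)


def decode_sparse(data, field_count):
    """Inverse of encode_sparse; the field count comes from the type definition."""
    nbytes = (field_count + 7) // 8
    bitset, pos = data[:nbytes], nbytes
    fields = []
    for i in range(field_count):
        if bitset[i // 8] & (1 << (i % 8)):
            fields.append(None)
            continue
        length, shift = 0, 0
        while True:  # read the varint size
            byte = data[pos]
            pos += 1
            length |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        fields.append(data[pos:pos + length])
        pos += length
    return fields


# A sparse UDT: 16 defined fields, only 2 of them present.
udt = [None] * 16
udt[0] = b"abc"
udt[9] = b"hello"
legacy_size = len(encode_legacy(udt))
sparse_size = len(encode_sparse(udt))
```

For this 16-field value with two fields present, the length-prefixed layout spends 64 bytes on sizes alone, while the sketch spends 2 bytes on the bitset and 1 byte per present field on sizes.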