[jira] [Commented] (CASSANDRA-7209) Consider changing UDT serialization format before 2.1 release.

Tyler Hobbs (JIRA) Fri, 16 May 2014 17:45:13 -0700

    [ https://issues.apache.org/jira/browse/CASSANDRA-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000412#comment-14000412 ]


Tyler Hobbs commented on CASSANDRA-7209:
----------------------------------------

bq. I understand that this might very likely cause inconvenience on C* or 
drivers side, but technically this information is redundant and is overhead.

So far we've always included enough information in the result set metadata to 
allow drivers to fully decode (and present) results without any other schema 
knowledge.  If we expect drivers to present UDTs as dicts or objects with 
field names, those names need to be in the result set metadata.

If you're worried about overhead, drivers can always use the skip_metadata flag 
when using prepared statements.
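
To illustrate why the field names matter, here is a minimal sketch of how a driver might turn a serialized UDT value into a dict. It assumes the layout proposed in the issue description (a signed 4-byte length followed by the bytes for each field, with a negative length meaning null). The function name decode_udt and the sample field names are hypothetical, not taken from any real driver.

```python
import struct

def decode_udt(buf, field_names):
    """Decode a UDT value into {field name: raw bytes or None}.

    Assumed layout (the one proposed in this issue): for each field,
    a big-endian signed int length followed by that many bytes;
    a negative length marks a null field.
    """
    result = {}
    offset = 0
    for name in field_names:
        (length,) = struct.unpack_from(">i", buf, offset)
        offset += 4
        if length < 0:
            result[name] = None
        else:
            result[name] = buf[offset:offset + length]
            offset += length
    return result

# Field names would come from the result set metadata (hypothetical names here).
row = decode_udt(b"\x00\x00\x00\x02ab\xff\xff\xff\xff", ["street", "zip"])
```

Without the names in the metadata, the driver could still split the value into fields, but it could only present them positionally.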

> Consider changing UDT serialization format before 2.1 release.
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-7209
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7209
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.1 rc1
>
>         Attachments: 0001-7209.txt, 
> 0002-Rename-column_names-types-to-field_names-types.txt
>
>
> The current serialization format of UDT is the one of CompositeType. This was 
> initially done on purpose, so that users that were using CompositeType for 
> values in their thrift schema could migrate smoothly to UDT (it was also 
> convenient code wise but that's a weak point).
> I'm having serious doubts about this being wise, however, for 2 reasons:
> * for each component, CompositeType stores an additional byte (the 
> end-of-component) for reasons that only pertain to querying. This byte is 
> basically wasted for UDT and makes no sense. I'll note that outside the 
> inefficiency, there is also the fact that it will likely be pretty 
> surprising/error-prone for driver authors.
> * it uses an unsigned short for the length of each component. While it's 
> certainly not advisable in the current implementation to use values too big 
> inside a UDT, having this limitation hard-coded in the serialization format 
> is wrong, and we've already been bitten by this with collections, which we 
> had to fix in protocol v3. It's probably worth not making that mistake 
> again. Furthermore, if we use an int for the size, we can use a negative size 
> to represent a null value (the main point being that it's consistent with how 
> we serialize values in the native protocol), which can be useful 
> (CASSANDRA-7206).
> Of course, if we change that serialization format, we'd better do it before 
> the 2.1 release. But I think the advantages outweigh the cons especially in 
> the long run so I think we should do it. I'll try to work out a patch quickly 
> so if you have a problem with the principle of this issue, it would be nice 
> to voice it quickly.
> I'll note that doing that change will mean existing CompositeType values 
> won't be able to be migrated transparently to UDT. I think this was anecdotal 
> in the first place at best; I don't think using CompositeType for values is 
> that popular in thrift, tbh. Besides, if we really really want to, it might 
> not be too hard to re-introduce that compatibility later by having some 
> protocol level trick. We can't change the serialization format without 
> breaking people, however.
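
The two layouts discussed in the description can be sketched side by side as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the end-of-component byte is written as 0, and values are treated as already-serialized bytes. It is not the code from the attached patches.

```python
import struct

def encode_composite_style(fields):
    """CompositeType-like layout, per field:
    [unsigned short length][bytes][end-of-component byte]."""
    out = bytearray()
    for value in fields:
        if len(value) > 0xFFFF:
            # The 2-byte length caps each component at 65535 bytes.
            raise ValueError("component exceeds unsigned short length limit")
        out += struct.pack(">H", len(value)) + value + b"\x00"
    return bytes(out)

def encode_int_length_style(fields):
    """Proposed layout, per field: [signed int length][bytes].
    A negative length (-1 here) represents a null field."""
    out = bytearray()
    for value in fields:
        if value is None:
            out += struct.pack(">i", -1)
        else:
            out += struct.pack(">i", len(value)) + value
    return bytes(out)

composite = encode_composite_style([b"ab"])
proposed = encode_int_length_style([b"ab", None])
```

The composite-style encoder has no way to express a null field and rejects values over 65535 bytes, which are the two limitations the description calls out.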



--
This message was sent by Atlassian JIRA
(v6.2#6252)
