[jira] [Updated] (CASSANDRA-7209) Consider changing UDT serialization format before 2.1 release.

Sylvain Lebresne (JIRA) Thu, 15 May 2014 09:41:38 -0700

     [ 
https://issues.apache.org/jira/browse/CASSANDRA-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Sylvain Lebresne updated CASSANDRA-7209:
----------------------------------------

    Attachment: 0002-Rename-column_names-types-to-field_names-types.txt
                0001-7209.txt

Patch attached for this. The "new" format simply uses 4 bytes for value sizes 
instead of 2 and drops the EOC byte. It's basically what makes sense for the 
native protocol given the other encodings. The patch does a tiny bit of renaming 
too (columnNames->fieldNames and types->fieldTypes) because I think it's cleaner 
that way. I include a 2nd patch that also renames column_names/types to 
field_names/types in the schema table while at it, for coherence with the code.
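
For driver authors, the layout described above can be sketched roughly as 
follows. This is only an illustration and not the code from the patch: the 
function names are made up, and it assumes big-endian sizes and -1 as the 
null marker, by analogy with how the native protocol encodes values.

```python
import struct


def serialize_udt(fields):
    """Encode UDT field values (bytes or None) with 4-byte signed sizes.

    Assumption: sizes are big-endian and a null field is written as -1,
    mirroring the native protocol's value encoding. No EOC byte is written.
    """
    out = bytearray()
    for value in fields:
        if value is None:
            out += struct.pack(">i", -1)  # negative size marks a null field
        else:
            out += struct.pack(">i", len(value))
            out += value
    return bytes(out)


def deserialize_udt(data):
    """Decode the layout produced by serialize_udt back into a list."""
    fields = []
    pos = 0
    while pos < len(data):
        (size,) = struct.unpack_from(">i", data, pos)
        pos += 4
        if size < 0:
            fields.append(None)  # null field, no payload follows
        else:
            fields.append(data[pos:pos + size])
            pos += size
    return fields
```

Note that an empty value (size 0) and a null value (negative size) stay 
distinguishable, which a 2-byte unsigned size cannot express.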

> Consider changing UDT serialization format before 2.1 release.
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-7209
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7209
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.1 rc1
>
>         Attachments: 0001-7209.txt, 
> 0002-Rename-column_names-types-to-field_names-types.txt
>
>
> The current serialization format of UDT is the one of CompositeType. This was 
> initially done on purpose, so that users that were using CompositeType for 
> values in their thrift schema could migrate smoothly to UDT (it was also 
> convenient code wise but that's a weak point).
> I'm having serious doubts about this being wise, however, for 2 reasons:
> * for each component, CompositeType stores an additional byte (the 
> end-of-component) for reasons that only pertain to querying. This byte is 
> basically wasted for UDT and makes no sense. I'll note that besides the 
> inefficiency, there is also the fact that it will likely be pretty 
> surprising/error-prone for driver authors.
> * it uses an unsigned short for the length of each component. While it's 
> certainly not advisable in the current implementation to use values too big 
> inside a UDT, having this limitation hard-coded in the serialization format 
> is wrong, and we've been bitten by this with collections already, which we've 
> had to fix in the protocol v3. It's probably worth not making that mistake 
> again. Furthermore, if we use an int for the size, we can use a negative size 
> to represent a null value (the main point being that it's consistent with how 
> we serialize values in the native protocol), which can be useful 
> (CASSANDRA-7206).
> Of course, if we change that serialization format, we'd better do it before 
> the 2.1 release. But I think the advantages outweigh the cons especially in 
> the long run so I think we should do it. I'll try to work out a patch quickly 
> so if you have a problem with the principle of this issue, it would be nice 
> to voice it quickly.
> I'll note that doing that change will mean existing CompositeType values 
> won't be able to be migrated transparently to UDT. I think this was anecdotal 
> at best in the first place; I don't think using CompositeType for values is 
> that popular in thrift tbh. Besides, if we really really want to, it might 
> not be too hard to re-introduce that compatibility later with some 
> protocol-level trick. We can't change the serialization format later without 
> breaking people, however.
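
To make the two objections in the description concrete, here is a small 
sketch contrasting the CompositeType-style component layout (2-byte unsigned 
length, value, 1-byte end-of-component) with an int-sized field. The helper 
names are hypothetical and the big-endian byte order is an assumption; this 
is not Cassandra's code.

```python
import struct

MAX_COMPONENT_SIZE = 0xFFFF  # largest length an unsigned short can hold


def composite_style_component(value, eoc=0):
    """Encode one component as <2-byte length><value><1-byte EOC>.

    The EOC byte only matters for querying; for a UDT value it is dead weight.
    """
    if len(value) > MAX_COMPONENT_SIZE:
        raise ValueError("component too large for a 2-byte length")
    return struct.pack(">H", len(value)) + value + struct.pack(">b", eoc)


def int_sized_field(value):
    """Encode one field as <4-byte signed length><value>, with no EOC byte."""
    return struct.pack(">i", len(value)) + value
```

With these helpers, a 70000-byte value cannot be encoded at all in the 
composite-style layout, while the int-sized layout handles it without a 
trailing byte.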



--
This message was sent by Atlassian JIRA
(v6.2#6252)
