[ 
https://issues.apache.org/jira/browse/CASSANDRA-7209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14003607#comment-14003607
 ] 

Sylvain Lebresne commented on CASSANDRA-7209:
---------------------------------------------

Committed with doc typos, thanks.

bq.  One nit, as mentioned before, is that we don't need to serialize the 
keyspace name

That's true. It's not entirely trivial code-wise to remove it because we need 
it in DataType#readValue to create a UserType (the AbstractType class) from 
what we got from the protocol definition and we don't have the context easily 
available. We could probably make such context available somehow but it's 
likely painful to refactor. So, since as said by Tyler this is not hugely 
relevant to performance since we only send that a preparation time (thanks to 
the skip_metadata), I'd vote for letting it be. Besides, one advantage of 
leaving it in is that if we ever change our mind about limiting user type to 
their keyspace of definition, we won't be stuck because of that. 

> Consider changing UDT serialization format before 2.1 release.
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-7209
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7209
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.1 rc1
>
>         Attachments: 0001-7209.txt, 
> 0002-Rename-column_names-types-to-field_names-types.txt, doc-typos.txt
>
>
> The current serialization format of UDT is the one of CompositeType. This was 
> initially done on purpose, so that users that were using CompositeType for 
> values in their thrift schema could migrate smoothly to UDT (it was also 
> convenient code wise but that's a weak point).
> I'm having serious doubt about this being wise however for 2 reasons:
> * for each component, CompositeType stores an addition byte (the 
> end-of-component) for reasons that only pertain to querying. This byte is 
> basically wasted for UDT and makes no sense. I'll note that outside the 
> inefficiency, there is also the fact that it will likely be pretty 
> surprising/error-prone for driver authors.
> * it uses an unsigned short for the length of each component. While it's 
> certainly not advisable in the current implementation to use values too big 
> inside an UDT, having this limitation hard-coded in the serialization format 
> is wrong and we've been bitten by this with collection already which we've 
> had to fix in the protocol v3. It's probably worth no doing that mistake 
> again. Furthermore, if we use an int for the size, we can use a negative size 
> to represent a null value (the main point being that it's consistent with how 
> we serialize values in the native protocol), which can be useful 
> (CASSANDRA-7206).
> Of course, if we change that serialization format, we'd better do it before 
> the 2.1 release. But I think the advantages outweigh the cons especially in 
> the long run so I think we should do it. I'll try to work out a patch quickly 
> so if you have a problem with the principle of this issue, it would be nice 
> to voice it quickly.
> I'll note that doing that change will mean existing CompositeType values 
> won't be able to be migrated transparently to UDT. I think this was anecdotal 
> in the first place at best, I don't think using CompositeType for values is 
> that popular in thrift tbh. Besides, if we really really want to, it might 
> not be too hard to re-introduce that compatibility later by having some 
> protocol level trick. We can't change the serialization format without 
> breaking people however.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to