[ 
https://issues.apache.org/jira/browse/CASSANDRA-5417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626893#comment-13626893
 ] 

Sylvain Lebresne commented on CASSANDRA-5417:
---------------------------------------------

bq. it is adding to the overall mess that we have from supporting all these 
different implementations

It was my hope that the code was at least encapsulating thing more cleanly. 

bq. The storage engine is too tied to the public APIs

I don't disagree, but at the same time we've made promises that we wouldn't 
break most of those public API.

bq. Things like CType.asAbstractType is obviously a problem and it makes me 
worry about the maintainability of this kind of change

Fair enough. But for what it's worth, things like asAbstractType don't seem 
like a big deal to me. There is a few shortcuts like that that the patch use to 
avoid rewriting all of Cassandra in one patch, but imo they are used in 
relative clear boundaries (i.e. between thrift/cql2 and the internal storage). 
At least, that was the intent :)

bq. If anything this patch doesn't go far enough since I don't see where/how 
this gets better

To be clear, I don't intend this patch to be an end. There is definitively more 
things that can be cleaned up and we can absolutely push things further. But at 
the same time, I'm not sure "rewriting all the things" in one go is going to be 
realistic.

I do think that the impedance mismatch between CQL3 and the storage engine is 
getting in our way, and I do think we should do something about it. This patch 
is a suggestion, which is trying to take an incrementalish approach. I'm open 
to other solution/modification though. But I'll also note that "cleaning up 
CQL3 code" is not the only reason for this ticket, encapsulating the composite 
nature of column names so we can optimize things later is another (and here 
again, I realize that the patch itself does not add the optimizations per se, 
but I was just not convinced that sealing myself in a cave for 2 months to 
implement all the optimization I can think of that make use of this ticket 
would be the best strategy).


                
> Push composites support in the storage engine
> ---------------------------------------------
>
>                 Key: CASSANDRA-5417
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5417
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Assignee: Sylvain Lebresne
>             Fix For: 2.0
>
>
> CompositeType happens to be very useful and is now widely used: CQL3 heavily 
> rely on it, and super columns are now using it too internally. Besides, 
> CompositeType has been advised as a replacement of super columns on the 
> thrift side for a while, so it's safe to assume that it's generally used 
> there too.
> CompositeType has initially been introduced as just another AbstractType.  
> Meaning that the storage engine has no nothing whatsoever of composites 
> being, well, composite. This has the following drawbacks:
> * Because internally a composite value is handled as just a ByteBuffer, we 
> end up doing a lot of extra work. Typically, each time we compare 2 composite 
> value, we end up "deserializing" the components (which, while it doesn't copy 
> data per-se because we just slice the global ByteBuffer, still waste some cpu 
> cycles and allocate a bunch of ByteBuffer objects). And since compare can be 
> called *a lot*, this is likely not negligible.
> * This make CQL3 code uglier than necessary. Basically, CQL3 makes extensive 
> use of composites, and since it gets backs ByteBuffer from the internal 
> columns, it always have to check if it's actually a compositeType or not, and 
> then split it and pick the different parts it needs. It's only an API 
> problem, but having things exposed as composites directly would definitively 
> make thinks cleaner. In particular, in most cases, CQL3 don't care whether it 
> has a composite with only one component or a non-really-composite value, but 
> we still always distinguishes both cases.  Lastly, if we do expose composites 
> more directly internally, it's not a lot more work to "internalize" better 
> the different parts of the cell name that CQL3 uses (what's the clustering 
> key, what's the actuall CQL3 column name, what's the collection element), 
> making things cleaner. Last but not least, there is currently a bunch of 
> places where methods take a ByteBuffer as argument and it's hard to know 
> whether it expects a cell name or a CQL3 column name. This is pretty error 
> prone.
> * It makes it hard (or impossible) to do a number of performance 
> improvements.  Consider CASSANDRA-4175, I'm not really sure how you can do it 
> properly (in memory) if cell names are just ByteBuffer (since CQL3 column 
> names are just one of the component in general). But we also miss 
> oportunities of sharing prefixes. If we were able to share prefixes of 
> composite names in memory we would 1) lower the memory footprint and 2) 
> potentially speed-up comparison (of the prefixes) by checking reference 
> equality first (also, doing prefix sharing on-disk, which is a separate 
> concern btw, might be easier to do if we do prefix sharing in memory).
> So I suggest pushing CompositeType support inside the storage engine. What I 
> mean by that concretely would be change the internal {{Column.name}} from 
> ByteBuffer to some CellName type. A CellName would API-wise just be a list of 
> ByteBuffer. But in practice, we'd have a specific CellName implementation for 
> not-really-composite names, and the truly composite implementation will allow 
> some prefix sharing. From an external API however, nothing would change, we 
> would pack the composite as usual before sending it back to the client, but 
> at least internally, comparison won't have to deserialize the components 
> every time, and CQL3 code will be cleaner.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to