[ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479294#comment-13479294
 ] 

Jonathan Ellis edited comment on CASSANDRA-4815 at 10/18/12 7:41 PM:
---------------------------------------------------------------------

Hi Ed,

I wrote a longish blog post over at 
http://www.datastax.com/dev/blog/cql3-for-cassandra-experts showing how use 
cases like this are handled in CQL3, with no rewriting of data.  Give that a 
read and let me know if you have further questions!

(All the examples in that post, except for the one using {{Set}}, are from 
Cassandra 1.1.6 and are forwards-compatible with 1.2.)
                
> Make CQL work naturally with wide rows
> --------------------------------------
>
>                 Key: CASSANDRA-4815
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
>             Project: Cassandra
>          Issue Type: Wish
>            Reporter: Edward Capriolo
>
> I find that CQL3 is quite obtuse and does not provide me a language useful
> for accessing my data. First, let's point out how we should design Cassandra
> data.
> 1) Denormalize
> 2) Eliminate seeks
> 3) Design for read
> 4) Optimize for blind writes
> So here is a schema that abides by these tried-and-tested rules, which large
> production users are employing today.
> Say we have a table of movie objects:
> Movie
> Name
> Description
> -< tags   (string)
> -< credits composite(role string, name string )
> -1 likesToday
> -1 blacklisted
> The above structure is a movie. Notice it holds a mix of static and dynamic
> columns, but the overall number of columns is not very large (even if it were
> larger, this would be OK as well). Notice also that this table is not just a
> single one-to-many relationship: it has one-to-one data and two sets of
> one-to-many data.
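To make the layout concrete, here is a minimal Python sketch (not Cassandra code) that models one wide row as a plain dict of column name to value and rebuilds the movie object from it. The helper `split_movie_row`, the `STATIC_COLUMNS` set, and the encoding conventions (a tag lives in the column-name suffix; a credit's value is "role:person") are assumptions drawn from the example inserts in this issue.

```python
from typing import Dict, List

# Assumption: these are the fixed-name ("static") columns of a movie row.
STATIC_COLUMNS = {"description", "likesToday", "blacklisted"}


def split_movie_row(row: Dict[str, str]) -> dict:
    """Rebuild a movie object from the columns of a single wide row."""
    tags: List[str] = []
    credits: Dict[str, str] = {}
    static: Dict[str, str] = {}
    # Iterate in comparator order (UTF8Type byte order matches Python str order).
    for name, value in sorted(row.items()):
        if name in STATIC_COLUMNS:
            static[name] = value
        elif name.startswith("tags-"):
            # Dynamic column: the tag itself is the column-name suffix.
            tags.append(name[len("tags-"):])
        elif name.startswith("credits-"):
            # Dynamic column: the value is encoded as "role:person".
            role, _, person = value.partition(":")
            credits[role] = person
    return {**static, "tags": tags, "credits": credits}


# Sample row built from the example inserts in this issue.
row = {
    "blacklisted": "1",
    "likesToday": "34",
    "credits-dir": "director:asf",
    "credits-jir": "jiraguy:bob",
    "tags-action": "",
    "tags-adventure": "",
    "tags-romance": "",
    "tags-programming": "",
}
movie = split_movie_row(row)
print(movie["tags"])     # ['action', 'adventure', 'programming', 'romance']
print(movie["credits"])  # {'director': 'asf', 'jiraguy': 'bob'}
```

Everything the application needs comes back from one row read; the two one-to-many groups are told apart only by their column-name prefix.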
> The schema today is declared something like this:
> create column family movies
>   with comparator = UTF8Type
>   and key_validation_class = UTF8Type
>   and default_validation_class = UTF8Type
>   and column_metadata =
>   [
>     {column_name: blacklisted, validation_class: Int32Type},
>     {column_name: likesToday, validation_class: LongType},
>     {column_name: description, validation_class: UTF8Type}
>   ];
> We should be able to insert data like this:
> set movies['Cassandra Database, not looking for a seQL']['blacklisted'] = 1;
> set movies['Cassandra Database, not looking for a seQL']['likesToday'] = 34;
> set movies['Cassandra Database, not looking for a seQL']['credits-dir'] = 'director:asf';
> set movies['Cassandra Database, not looking for a seQL']['credits-jir'] = 'jiraguy:bob';
> set movies['Cassandra Database, not looking for a seQL']['tags-action'] = '';
> set movies['Cassandra Database, not looking for a seQL']['tags-adventure'] = '';
> set movies['Cassandra Database, not looking for a seQL']['tags-romance'] = '';
> set movies['Cassandra Database, not looking for a seQL']['tags-programming'] = '';
> This is the correct way to do it: one seek finds all the information related
> to a movie. As long as this row does not get "large", there is no reason to
> optimize by breaking the data into other column families. (Notice you cannot
> transpose this, because a movie has two one-to-many relationships of
> potentially different types.)
> Let's look at the CQL3 way to do this design. First, contrary to the original
> design of Cassandra, CQL does not like wide rows. It also does not have a good
> way of dealing with dynamic columns together with static columns.
> You have two options:
> Option 1: lose all schema
> create table movies (name text, column1 blob, value blob,
>   primary key (name, column1)) with compact storage;
> This method is not so hot: we have now lost all our validators. And by the
> way, you have to physically shut down everything, rename files, and recreate
> your schema if you want to inform Cassandra that an existing table should be
> compact. At the very least, this could be just a metadata change. Also, you
> cannot add column schema either.
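Roughly speaking, a COMPACT STORAGE table with a clustering column presents each stored cell as one CQL3 row of (partition key, column name, value). The Python sketch below mimics that transposed view for a single wide row. `compact_view` is a hypothetical helper, and returning raw bytes stands in for declaring every column as blob, which is why the per-column validators disappear.

```python
from typing import Dict, List, Tuple

Cell = Tuple[bytes, bytes, bytes]


def compact_view(key: str, row: Dict[str, str]) -> List[Cell]:
    """Return one (key, column1, value) tuple per stored cell.

    Sorting str names matches UTF8Type comparator order, because UTF-8
    byte order equals code point order. Every field comes back as raw
    bytes, mirroring a table whose columns are all declared as blob.
    """
    k = key.encode("utf-8")
    return [
        (k, name.encode("utf-8"), value.encode("utf-8"))
        for name, value in sorted(row.items())
    ]


rows = compact_view(
    "Cassandra Database, not looking for a seQL",
    {"likesToday": "34", "blacklisted": "1", "tags-action": ""},
)
for _, column1, value in rows:
    print(column1, value)
```

Each cell becomes its own "row" in this view, and nothing in it records that likesToday was declared as a long or blacklisted as an int.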
> Option 2: normalize (even worse)
> create table movie (name text primary key, description text,
>   likestoday bigint, blacklisted int);
> create table moviecredits (name text, role text, personname text,
>   primary key (name, role));
> create table movietags (name text, tag text, primary key (name, tag));
> This is a terrible design. Of the four key characteristics of how Cassandra
> data should be designed, it fails three. It does not:
> 1) Denormalize
> 2) Eliminate seeks
> 3) Design for read
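The "eliminate seeks" point can be illustrated by counting partition lookups. This is a simplified Python model, not a measurement: each dict stands in for a table keyed by movie name, and one lookup is treated as one seek, ignoring SSTable count, caching, and the possibility of querying the three normalized tables in parallel. The class, function, and variable names are hypothetical.

```python
from typing import Dict


class Table:
    """Hypothetical table: partitions keyed by movie name, counting lookups."""

    def __init__(self, partitions: Dict[str, dict]):
        self.partitions = partitions
        self.lookups = 0

    def get(self, key: str) -> dict:
        self.lookups += 1
        return self.partitions.get(key, {})


TITLE = "Cassandra Database, not looking for a seQL"

# Denormalized: everything about the movie sits in one partition.
movies = Table({TITLE: {"likesToday": 34, "tags-action": "", "credits-dir": "director:asf"}})

# Normalized: the same data is split across three tables.
movie_table = Table({TITLE: {"likestoday": 34}})
credits_table = Table({TITLE: {"director": "asf"}})
tags_table = Table({TITLE: {"action": ""}})


def read_denormalized(title: str) -> dict:
    return movies.get(title)


def read_normalized(title: str) -> dict:
    # Three separate partition reads are needed to assemble one movie.
    return {
        **movie_table.get(title),
        "credits": credits_table.get(title),
        "tags": tags_table.get(title),
    }


read_denormalized(TITLE)
read_normalized(TITLE)
print(movies.lookups)  # 1
print(movie_table.lookups + credits_table.lookups + tags_table.lookups)  # 3
```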
> Why is Cassandra steering toward this course, by making a language that does 
> not understand wide rows?
> So what can be done? My suggestions:
> Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a
> "virtual view" that is compact storage, with no work to migrate data and
> recreate schemas. Every table should have a compact view for schemaless
> access, or a simple query hint like /*transposed*/ should make this change.
> Metadata should be definable by regex. For example, all columns named "tag*"
> are of type string.
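A minimal sketch of what regex-keyed metadata could look like, written in Python for illustration only. `COLUMN_RULES` and `validate` are hypothetical. The rule order (first full match wins) and the fallback to untyped bytes when nothing matches are design assumptions, not an existing Cassandra feature.

```python
import re
from typing import Callable, List, Tuple

Rule = Tuple[re.Pattern, Callable[[str], object]]

# Hypothetical metadata: (column-name pattern, validator) pairs.
# The first pattern that fully matches a column name wins.
COLUMN_RULES: List[Rule] = [
    (re.compile(r"tag.*"), str),  # every "tag*" column is a string
    (re.compile(r"credits-.*"), str),
    (re.compile(r"likesToday"), int),
    (re.compile(r"blacklisted"), int),
]


def validate(column: str, raw: str) -> object:
    """Convert raw input using the first matching rule.

    Raises ValueError if the matching validator rejects the input.
    """
    for pattern, validator in COLUMN_RULES:
        if pattern.fullmatch(column):
            return validator(raw)
    # No rule matched: keep the value as untyped bytes.
    return raw.encode("utf-8")


print(validate("likesToday", "34"))    # 34
print(validate("somethingElse", "x"))  # b'x'
```

Ordering matters in this design: a broad pattern placed early would shadow narrower ones listed after it.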
> CQL should have the column[slice_start] .. column[slice_end] operator from
> CQL2.
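For readers who have not used it, a column slice returns the columns of one row whose names fall between a start and a finish, in comparator order. Below is a rough Python sketch that assumes Thrift SliceRange-like semantics: both bounds inclusive, an empty bound meaning unbounded, and `count` capping the result. `column_slice` is a hypothetical helper, not a client API.

```python
from bisect import bisect_left, bisect_right
from typing import Dict, List, Tuple


def column_slice(
    row: Dict[str, str], start: str = "", finish: str = "", count: int = 100
) -> List[Tuple[str, str]]:
    """Return up to `count` (name, value) pairs with start <= name <= finish."""
    names = sorted(row)  # comparator (UTF8Type) order
    lo = bisect_left(names, start) if start else 0
    hi = bisect_right(names, finish) if finish else len(names)
    return [(name, row[name]) for name in names[lo:hi][:count]]


row = {
    "blacklisted": "1",
    "likesToday": "34",
    "credits-dir": "director:asf",
    "credits-jir": "jiraguy:bob",
    "tags-action": "",
    "tags-adventure": "",
    "tags-romance": "",
    "tags-programming": "",
}

# "\uffff" is a high sentinel so the slice covers every "tags-" column here.
tags = column_slice(row, "tags-", "tags-\uffff")
print([name for name, _ in tags])
# ['tags-action', 'tags-adventure', 'tags-programming', 'tags-romance']
```

The sentinel trick is a simplification that works for this sample data; names containing characters beyond U+FFFF would sort after it in UTF-8 byte order.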
> CQL should support current users: they should not have to switch between CQL
> versions, and possibly Thrift, to work with wide rows. The language should
> work for them even if it is not expressly designed for them. Some of these
> features are already part of CQL2, so they should be carried over.
> Also, what needs to not happen is someone making a hand-waving statement
> like "Once we have collection types we will not need wide rows". This request
> is to satisfy current users of Cassandra, not future or theoretical ones.
> Solutions should not involve physically migrating data in any way, and they
> should not involve telling someone to do something much differently from what
> they are already doing. The suggestions should revolve around making the
> query language work well with existing data.
