Sylvain Lebresne created CASSANDRA-6561:
-------------------------------------------

             Summary: Static columns in CQL3
                 Key: CASSANDRA-6561
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6561
             Project: Cassandra
          Issue Type: New Feature
            Reporter: Sylvain Lebresne


I'd like to suggest the following idea for adding "static" columns to CQL3.  
I'll note that the basic idea has been suggested by jhalliday on irc but the 
rest of the details are mine and I should be blamed for anything stupid in what 
follows.

Let me start with a rationale: there are 2 main families of CF that have been 
historically used in Thrift: static ones and dynamic ones. CQL3 handles both 
families through the presence or absence of clustering columns. There are, 
however, some cases where mixing both behaviors has its uses. I like to think of 
those use cases as falling into 3 broad categories:
# to denormalize small amounts of not-entirely-static data in otherwise static 
entities. Think of, say, "tags" for a product or "custom properties" in a user 
profile. This is why we've added CQL3 collections. Importantly, this is the 
*only* use case for which collections are meant (which doesn't diminish their 
usefulness imo, and I wouldn't disagree that we've maybe not communicated this 
too well).
# to optimize fetching both a static entity and related dynamic ones. Say you 
have blog posts, and each post has associated comments (chronologically 
ordered). *And* say that a very common query is "fetch a post and its last 50 
comments". In that case, it *might* be beneficial to store a blog post (static 
entity) in the same underlying CF as its comments for performance reasons, 
so that "fetch a post and its last 50 comments" is just one slice internally.
# to CAS rows of a dynamic partition based on some partition-level 
condition. This is the same use case that CASSANDRA-5633 exists for.

As said above, 1) is already covered by collections, but 2) and 3) are not (and
I strongly believe collections are not the right fit, API wise, for those).

Also, note that I don't want to overestimate the usefulness of 2). In most 
cases, using a separate table for the blog posts and the comments is The Right 
Solution, and trying to do 2) is premature optimisation. Yet, when used 
properly, that kind of optimisation can make a difference, so I think having a 
relatively native solution for it in CQL3 could make sense.

Regarding 3), though CASSANDRA-5633 would provide one solution for it, I have 
the feeling that static columns actually are a more natural approach (in terms 
of API). That's arguably more of a personal opinion/feeling though.

So long story short, CQL3 lacks a way to mix "static" and "dynamic" rows in 
the same partition of the same CQL3 table, and I think such a tool could have 
its uses.

The proposal is thus to allow "static" columns. Static columns would only make 
sense in tables with clustering columns (the "dynamic" ones). A static column 
value would be static to the partition (all rows of the partition would share 
the value for such a column). The syntax would just be:
{noformat}
CREATE TABLE t (
  k text,
  s text static,
  i int,
  v text,
  PRIMARY KEY (k, i)
)
{noformat}
then you'd get:
{noformat}
INSERT INTO t(k, s, i, v) VALUES ('k0', 'I''m shared',       0, 'bar');
INSERT INTO t(k, s, i, v) VALUES ('k0', 'I''m still shared', 1, 'foo');
SELECT * FROM t;
 k |                  s | i |    v
------------------------------------
k0 | "I'm still shared" | 0 | "bar"
k0 | "I'm still shared" | 1 | "foo"
{noformat}
There would be a few semantic details to decide on regarding deletions, TTL, 
etc., but let's see whether we agree it's a good idea before ironing those out.
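
To make the intended read semantics concrete, here is a minimal Python sketch. It is only a toy in-memory model, not how Cassandra implements anything: the `Partition` and `Table` classes and their methods are hypothetical names invented for illustration, and "last write wins" for the static value is assumed from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Toy model of one partition: static values are stored once per partition."""
    key: str
    static: dict = field(default_factory=dict)  # static column name -> value
    rows: dict = field(default_factory=dict)    # clustering value i -> {column: value}


class Table:
    """Hypothetical in-memory stand-in for table t (k, s static, i, v)."""
    STATIC_COLUMNS = {"s"}

    def __init__(self):
        self.partitions = {}

    def insert(self, k, i, **columns):
        part = self.partitions.setdefault(k, Partition(k))
        row = part.rows.setdefault(i, {})
        for name, value in columns.items():
            if name in self.STATIC_COLUMNS:
                # Assumption: the latest write replaces the value for the whole partition.
                part.static[name] = value
            else:
                row[name] = value

    def select_all(self):
        result = []
        for k in sorted(self.partitions):
            part = self.partitions[k]
            for i in sorted(part.rows):
                # Every row of the partition is returned with the shared static values.
                result.append({"k": k, "i": i, **part.static, **part.rows[i]})
        return result


t = Table()
t.insert("k0", 0, s="I'm shared", v="bar")
t.insert("k0", 1, s="I'm still shared", v="foo")
rows = t.select_all()
```

Both returned rows carry the value written last for `s`, which is the behaviour the proposed `SELECT` output illustrates.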

One last point is the implementation. Though I do think this idea has merit, 
it's definitely not useful enough to justify rewriting the storage engine for 
it. But I think we can support this relatively easily (emphasis on "relatively" 
:)), which is probably the main reason why I like the approach.

Namely, internally, we can store static columns as cells whose clustering 
column values are empty. So in terms of cells, the partition of my example 
would look like:
{noformat}
"k0" : [
  (:"s" -> "I'm still shared"), // the static column
  (0:"" -> ""),                 // row marker
  (0:"v" -> "bar"),
  (1:"" -> ""),                 // row marker
  (1:"v" -> "foo")
]
{noformat}
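
The flattening into cells can be sketched in Python as follows. This is an illustrative model only: `to_cells` and the `STATIC` sentinel are hypothetical names, and an empty tuple stands in for "empty clustering column values" purely so that static cells sort first.

```python
# Hypothetical sentinel for "no clustering values"; an empty tuple sorts
# before any non-empty clustering prefix in Python's tuple ordering.
STATIC = ()


def to_cells(static, rows):
    """Flatten static values and rows into sorted (cell name, value) pairs.

    A cell name is modelled as (clustering prefix, column name).
    """
    cells = []
    for name, value in static.items():
        cells.append(((STATIC, name), value))      # static column cell
    for i, columns in rows.items():
        cells.append((((i,), ""), ""))             # row marker
        for name, value in columns.items():
            cells.append((((i,), name), value))    # regular cell
    return sorted(cells, key=lambda cell: cell[0])


cells = to_cells(
    static={"s": "I'm still shared"},
    rows={0: {"v": "bar"}, 1: {"v": "foo"}},
)
```

The resulting order (static cell, then each row marker followed by that row's cells) matches the layout listed above.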
Of course, using empty values for the clustering columns doesn't quite work 
because it could conflict with the user using empty clustering columns. But in 
the CompositeType encoding we have the end-of-component byte that we could 
reuse by using a specific value (say 0xFF; currently we never set that byte to 
anything other than -1, 0 and 1) to indicate it's a static column.
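
As a rough illustration of the marker idea, the Python sketch below encodes cell names as a sequence of components, each written as a 2-byte length, the value bytes, and one end-of-component byte (my understanding of the CompositeType layout, simplified). The function names are hypothetical, and `STATIC_EOC` is a placeholder chosen for the sketch, not a value Cassandra defines. Since 0xFF read as a signed byte is -1, the sketch picks a different value so that it stays distinct from -1, 0 and 1.

```python
import struct

NORMAL_EOC = 0x00
STATIC_EOC = 0x7F  # hypothetical placeholder marker, not an actual Cassandra constant


def encode_component(value: bytes, eoc: int = NORMAL_EOC) -> bytes:
    """Encode one component: 2-byte big-endian length, value, end-of-component byte."""
    return struct.pack(">H", len(value)) + value + bytes([eoc])


def encode_cell_name(clustering: bytes, column: bytes, static: bool = False) -> bytes:
    """Encode a (clustering value, column name) cell name.

    A static cell uses an empty clustering component flagged with STATIC_EOC.
    """
    if static:
        clustering_part = encode_component(b"", STATIC_EOC)
    else:
        clustering_part = encode_component(clustering)
    return clustering_part + encode_component(column)


def is_static(cell_name: bytes) -> bool:
    """Check the end-of-component byte of the first component."""
    (length,) = struct.unpack_from(">H", cell_name, 0)
    return cell_name[2 + length] == STATIC_EOC


static_name = encode_cell_name(b"", b"s", static=True)
user_empty = encode_cell_name(b"", b"v")  # user really did use an empty clustering value
```

With the marker in place, `static_name` and `user_empty` differ even though both have an empty clustering component, which is the conflict the marker is meant to avoid.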

With that, we'd need to update the CQL3 statements to support the new syntax 
and rules, but that's probably not horribly hard.

So anyway, this may or may not be a good idea, but I think it has enough meat 
to warrant some consideration.



