[jira] [Commented] (CASSANDRA-6588) Add a 'NO EMPTY RESULTS' filter to SELECT

Sylvain Lebresne (JIRA) Fri, 17 Jan 2014 10:08:41 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875033#comment-13875033
 ]


Sylvain Lebresne commented on CASSANDRA-6588:
---------------------------------------------

I'm a rather strong -1 on removing per-column TTL. I've personally used them (I 
do mean in cases where per-row would be too coarse grained), I've witnessed at 
least a few other uses over the years and per-column TTLs have been here for a 
long enough that I have no doubt the cases I've personally witness are not the 
only exceptions. I'm not saying everyone that uses TTL needs per-column ones, 
but there *is* people using per-column TTL, which means this would be a 
breaking change that would break some people. The bar to do such breaking 
changes should be high: it should be very clear that the benefits of doing such 
change strongly overweight the pain inflicted to users we break.  This issue is 
nowhere near close to meeting that bar imo.

Even if we were to ignore the breaking aspects, since there is cases where you 
do want to put TTL only on some of the columns of a row, if we remove 
per-column TTL, it means users will be forced to split such rows into multiple 
table to be able to apply different TTL on some columns and not others. And 
avoiding the need to split table in two due to limitations of CQL rows is 
exactly the reason why this ticket exists. Removing per-column TTL to solve 
this wouldn't be progress.

Last but not necessary least, I'm far from convinced that per-row TTL would 
actually solve this. I suggested on CASSANDRA-5762 that this *might* help but 
truth is, the vague intuition I had in mind back then doesn't work at all. I 
personally don't see how per-row TTL solves this problem concretely.

Back on the proposition of this ticket, I continue to think that it's a very 
simple solution, and while I can understand not being "thrilled" on a personal 
level, I have a hard time imagining this confusing people immensely, so I don't 
a big concrete downside to it (and for what is worth, at least one user on 
CASSANDRA-6586, someone that does care about the problem we're trying to solve 
here, judged this solution as "promising", which at least suggest that it might 
not be so bad).

> Add a 'NO EMPTY RESULTS' filter to SELECT
> -----------------------------------------
>
>                 Key: CASSANDRA-6588
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6588
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 2.1
>
>
> It is the semantic of CQL that a (CQL) row exists as long as it has one 
> non-null column (including the PK columns, which, given that no PK columns 
> can be null, means that it's enough to have the PK set for a row to exist). 
> This does means that the result to
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v1 int, v2 int);
> INSERT INTO test(k, v1) VALUES (0, 4);
> SELECT v2 FROM test;
> {noformat}
> must be (and is)
> {noformat}
>  v2
> ------
>  null
> {noformat}
> That fact does mean however that when we only select a few columns of a row, 
> we still need to find out rows that exist but have no values for the selected 
> columns. Long story short, given how the storage engine works, this means we 
> need to query full (CQL) rows even when only some of the columns are selected 
> because that's the only way to distinguish between "the row exists but have 
> no value for the selected columns" and "the row doesn't exist". I'll note in 
> particular that, due to CASSANDRA-5762, we can't unfortunately rely on the 
> row marker to optimize that out.
> Now, when you selects only a subsets of the columns of a row, there is many 
> cases where you don't care about rows that exists but have no value for the 
> columns you requested and are happy to filter those out. So, for those cases, 
> we could provided a new SELECT filter. Outside the potential convenience (not 
> having to filter empty results client side), one interesting part is that 
> when this filter is provided, we could optimize a bit by only querying the 
> columns selected, since we wouldn't need to return rows that exists but have 
> no values for the selected columns.
> For the exact syntax, there is probably a bunch of options. For instance:
> * {{SELECT NON EMPTY(v2, v3) FROM test}}: the vague rational for putting it 
> in the SELECT part is that such filter is kind of in the spirit to DISTINCT.  
> Possibly a bit ugly outside of that.
> * {{SELECT v2, v3 FROM test NO EMPTY RESULTS}} or {{SELECT v2, v3 FROM test 
> NO EMPTY ROWS}} or {{SELECT v2, v3 FROM test NO EMPTY}}: the last one is 
> shorter but maybe a bit less explicit. As for {{RESULTS}} versus {{ROWS}}, 
> the only small object to {{NO EMPTY ROWS}} could be that it might suggest it 
> is filtering non existing rows (I mean, the fact we never ever return non 
> existing rows should hint that it's not what it does but well...) while we're 
> just filtering empty "resultSet rows".
> Of course, if there is a pre-existing SQL syntax for that, it's even better, 
> though a very quick search didn't turn anything. Other suggestions welcome 
> too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

[jira] [Commented] (CASSANDRA-6588) Add a 'NO EMPTY RESULTS' filter to SELECT

Reply via email to