[jira] [Commented] (CASSANDRA-6588) Add a 'NO EMPTY RESULTS' filter to SELECT

Sylvain Lebresne (JIRA) Fri, 17 Jan 2014 14:11:47 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13875310#comment-13875310
 ]


Sylvain Lebresne commented on CASSANDRA-6588:
---------------------------------------------

bq. But this was in the distant non-CQL past

I do understand the CQL layout and I did mean per-cql-row, *not* per-partition, 
in my comment above.

bq. I think everyone agrees that per-partition is too coarse, but per-cql-row 
really feels like the right level of granularity to me. 

I acknowledge that it feels like the right level of granularity to you, but I 
respectfully disagree and have seen and used concrete examples that contradict 
that feeling. 

bq. for rare cases where you have data on the same primary key that should 
expire at different times, you still have the same make-two-tables workaround

My point is that make-two-tables is a workaround. By definition, it works 
around a limitation of C*. I don't think it's a good thing in this ticket's use 
case, which is the reason for having this ticket, and I don't think making it 
necessary for TTL, when it isn't today, would be progress. I do think having to 
work around such relatively simple things is unnecessary and contributes to 
making C* harder to work with, even if it only affects a small number of use 
cases. As such, I think we should favor limiting the need for such workarounds, 
not adding new cases where you need them. *And*, in the case of TTL, the 
"make-two-tables" workaround would be extremely painful for users who today do 
have data on the same primary key that should expire at different times, 
because it would require a painful data migration.
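
As an illustration of data on the same primary key expiring at different 
times, consider the following minimal sketch. The table and column names 
(user_session, display_name, auth_token) and the one-hour TTL are hypothetical 
and chosen only for this example; it assumes nothing beyond the standard 
USING TTL clause and TTL() function of CQL.

```cql
-- Hypothetical table: one CQL row per user, with columns of different lifetimes.
CREATE TABLE user_session (
    user_id int PRIMARY KEY,
    display_name text,
    auth_token text
);

-- display_name is written without a TTL and never expires.
INSERT INTO user_session (user_id, display_name) VALUES (0, 'alice');

-- auth_token is written with a per-column TTL of one hour (3600 seconds).
UPDATE user_session USING TTL 3600 SET auth_token = 'abc123' WHERE user_id = 0;

-- TTL() reports the remaining lifetime per column; it is null for display_name.
SELECT TTL(auth_token), TTL(display_name) FROM user_session WHERE user_id = 0;
```

With only per-cql-row TTL, auth_token would have to move to a second table 
keyed by the same user_id, and existing rows would have to be migrated.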

And again, I'm not really sure why we're having this debate in the first place, 
because I personally have no evidence at all that removing per-column TTL would 
help in any way to solve the problem this ticket aims to solve.

> Add a 'NO EMPTY RESULTS' filter to SELECT
> -----------------------------------------
>
>                 Key: CASSANDRA-6588
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6588
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 2.1
>
>
> It is part of the semantics of CQL that a (CQL) row exists as long as it has 
> one non-null column (including the PK columns, which, given that no PK column 
> can be null, means that it's enough to have the PK set for a row to exist). 
> This means that the result of
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v1 int, v2 int);
> INSERT INTO test(k, v1) VALUES (0, 4);
> SELECT v2 FROM test;
> {noformat}
> must be (and is)
> {noformat}
>  v2
> ------
>  null
> {noformat}
> That fact does mean, however, that when we select only a few columns of a 
> row, we still need to find rows that exist but have no values for the selected 
> columns. Long story short, given how the storage engine works, this means we 
> need to query full (CQL) rows even when only some of the columns are selected, 
> because that's the only way to distinguish between "the row exists but has 
> no value for the selected columns" and "the row doesn't exist". I'll note in 
> particular that, due to CASSANDRA-5762, we unfortunately can't rely on the 
> row marker to optimize that out.
> Now, when you select only a subset of the columns of a row, there are many 
> cases where you don't care about rows that exist but have no value for the 
> columns you requested and are happy to filter those out. So, for those cases, 
> we could provide a new SELECT filter. Beyond the potential convenience (not 
> having to filter empty results client side), one interesting part is that 
> when this filter is provided, we could optimize a bit by querying only the 
> columns selected, since we wouldn't need to return rows that exist but have 
> no values for the selected columns.
> For the exact syntax, there are probably a bunch of options. For instance:
> * {{SELECT NON EMPTY(v2, v3) FROM test}}: the vague rationale for putting it 
> in the SELECT part is that such a filter is kind of in the spirit of DISTINCT.  
> Possibly a bit ugly outside of that.
> * {{SELECT v2, v3 FROM test NO EMPTY RESULTS}} or {{SELECT v2, v3 FROM test 
> NO EMPTY ROWS}} or {{SELECT v2, v3 FROM test NO EMPTY}}: the last one is 
> shorter but maybe a bit less explicit. As for {{RESULTS}} versus {{ROWS}}, 
> the only small objection to {{NO EMPTY ROWS}} could be that it might suggest 
> it is filtering non-existing rows (I mean, the fact that we never ever return 
> non-existing rows should hint that it's not what it does, but well...) while 
> we're just filtering empty "resultSet rows".
> Of course, if there is a pre-existing SQL syntax for that, it's even better, 
> though a very quick search didn't turn up anything. Other suggestions welcome 
> too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
