Sylvain Lebresne created CASSANDRA-6588:
-------------------------------------------

             Summary: Add a 'NO EMPTY RESULTS' filter to SELECT
                 Key: CASSANDRA-6588
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6588
             Project: Cassandra
          Issue Type: Improvement
            Reporter: Sylvain Lebresne
            Priority: Minor
             Fix For: 2.1


It is the semantic of CQL that a (CQL) row exists as long as it has one 
non-null column (including the PK columns, which, given that no PK columns can 
be null, means that it's enough to have the PK set for a row to exist). This 
does means that the result to
{noformat}
CREATE TABLE test (k int PRIMARY KEY, v1 int, v2 int);
INSERT INTO test(k, v1) VALUES (0, 4);
SELECT v2 FROM test;
{noformat}
must be (and is)
{noformat}
 v2
------
 null
{noformat}

That fact does mean however that when we only select a few columns of a row, we 
still need to find out rows that exist but have no values for the selected 
columns. Long story short, given how the storage engine works, this means we 
need to query full (CQL) rows even when only some of the columns are selected 
because that's the only way to distinguish between "the row exists but have no 
value for the selected columns" and "the row doesn't exist". I'll note in 
particular that, due to CASSANDRA-5762, we can't unfortunately rely on the row 
marker to optimize that out.

Now, when you selects only a subsets of the columns of a row, there is many 
cases where you don't care about rows that exists but have no value for the 
columns you requested and are happy to filter those out. So, for those cases, 
we could provided a new SELECT filter. Outside the potential convenience (not 
having to filter empty results client side), one interesting part is that when 
this filter is provided, we could optimize a bit by only querying the columns 
selected, since we wouldn't need to return rows that exists but have no values 
for the selected columns.

For the exact syntax, there is probably a bunch of options. For instance:
* {{SELECT NON EMPTY(v2, v3) FROM test}}: the vague rational for putting it in 
the SELECT part is that such filter is kind of in the spirit to DISTINCT.  
Possibly a bit ugly outside of that.
* {{SELECT v2, v3 FROM test NO EMPTY RESULTS}} or {{SELECT v2, v3 FROM test NO 
EMPTY ROWS}} or {{SELECT v2, v3 FROM test NO EMPTY}}: the last one is shorter 
but maybe a bit less explicit. As for {{RESULTS}} versus {{ROWS}}, the only 
small object to {{NO EMPTY ROWS}} could be that it might suggest it is 
filtering non existing rows (I mean, the fact we never ever return non existing 
rows should hint that it's not what it does but well...) while we're just 
filtering empty "resultSet rows".

Of course, if there is a pre-existing SQL syntax for that, it's even better, 
though a very quick search didn't turn anything. Other suggestions welcome too.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

Reply via email to