[
https://issues.apache.org/jira/browse/CASSANDRA-6588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13919009#comment-13919009
]
Mikhail Stepura commented on CASSANDRA-6588:
--------------------------------------------
I've implemented {{NO EMPTY ROWS}} by pruning the {{ResultSet}} in
{{SelectStatement.process}}. Not sure if it's the best way. Thoughts?
https://github.com/Mishail/cassandra/compare/apache:cassandra-2.1...CASSANDRA-6588-no-empty
> Add a 'NO EMPTY RESULTS' filter to SELECT
> -----------------------------------------
>
> Key: CASSANDRA-6588
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6588
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Sylvain Lebresne
> Priority: Minor
> Fix For: 2.1 beta2
>
>
> It is the semantic of CQL that a (CQL) row exists as long as it has one
> non-null column (including the PK columns, which, given that no PK columns
> can be null, means that it's enough to have the PK set for a row to exist).
> This does means that the result to
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v1 int, v2 int);
> INSERT INTO test(k, v1) VALUES (0, 4);
> SELECT v2 FROM test;
> {noformat}
> must be (and is)
> {noformat}
> v2
> ------
> null
> {noformat}
> That fact does mean however that when we only select a few columns of a row,
> we still need to find out rows that exist but have no values for the selected
> columns. Long story short, given how the storage engine works, this means we
> need to query full (CQL) rows even when only some of the columns are selected
> because that's the only way to distinguish between "the row exists but have
> no value for the selected columns" and "the row doesn't exist". I'll note in
> particular that, due to CASSANDRA-5762, we can't unfortunately rely on the
> row marker to optimize that out.
> Now, when you selects only a subsets of the columns of a row, there is many
> cases where you don't care about rows that exists but have no value for the
> columns you requested and are happy to filter those out. So, for those cases,
> we could provided a new SELECT filter. Outside the potential convenience (not
> having to filter empty results client side), one interesting part is that
> when this filter is provided, we could optimize a bit by only querying the
> columns selected, since we wouldn't need to return rows that exists but have
> no values for the selected columns.
> For the exact syntax, there is probably a bunch of options. For instance:
> * {{SELECT NON EMPTY(v2, v3) FROM test}}: the vague rational for putting it
> in the SELECT part is that such filter is kind of in the spirit to DISTINCT.
> Possibly a bit ugly outside of that.
> * {{SELECT v2, v3 FROM test NO EMPTY RESULTS}} or {{SELECT v2, v3 FROM test
> NO EMPTY ROWS}} or {{SELECT v2, v3 FROM test NO EMPTY}}: the last one is
> shorter but maybe a bit less explicit. As for {{RESULTS}} versus {{ROWS}},
> the only small object to {{NO EMPTY ROWS}} could be that it might suggest it
> is filtering non existing rows (I mean, the fact we never ever return non
> existing rows should hint that it's not what it does but well...) while we're
> just filtering empty "resultSet rows".
> Of course, if there is a pre-existing SQL syntax for that, it's even better,
> though a very quick search didn't turn anything. Other suggestions welcome
> too.
--
This message was sent by Atlassian JIRA
(v6.2#6252)