[ 
https://issues.apache.org/jira/browse/CASSANDRA-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728376#comment-14728376
 ] 

Tyler Hobbs commented on CASSANDRA-9664:
----------------------------------------

I have a [branch|https://github.com/thobbs/cassandra/tree/CASSANDRA-9664] with 
an implementation that's ready for review.

As noted in my previous comment, this only allows restricting columns that are 
in the base table's primary key.  When handling individual mutations this 
filtering can be performed prior to the read-before-write, so write-path work 
for unselected rows is minimal.  When performing the initial build of the MV, 
we don't yet take full advantage of the select statement's restrictions.  I 
would like to improve single-partition builds and builds that can be assisted 
by secondary indexes, but if it's okay with the reviewer, that feels best left 
to another ticket.
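
To illustrate why restricting only primary key columns keeps the write path cheap, here is a minimal Python sketch (not the code from the branch). The restriction format {{(column, operator, operand)}} and the names {{selected_by_view}}, {{view_update}} and {{read_existing_row}} are hypothetical, chosen only for this example. The point is that the check needs nothing beyond the mutation's own primary key, so it can run before any read is issued.

```python
import operator

# Hypothetical restriction format: (column, operator symbol, operand).
CHECKS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "IN": lambda value, allowed: value in allowed,
}


def selected_by_view(primary_key, restrictions):
    """Return True if the primary key values satisfy every restriction."""
    return all(
        CHECKS[op](primary_key[column], operand)
        for column, op, operand in restrictions
    )


def view_update(primary_key, restrictions, read_existing_row):
    """Only pay for the read-before-write when the view selects the row."""
    if not selected_by_view(primary_key, restrictions):
        return None  # unselected row: no read is issued
    return read_existing_row(primary_key)
```

With restrictions {{c = 0}} and {{a IN (1, 2)}}, a mutation for key (a=1, b=7, c=5) returns early without calling {{read_existing_row}}, while (a=1, b=7, c=0) goes on to the read.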

One of the trickiest parts of this ticket was representing the WHERE clause 
restrictions in the MV's schema.  This needs to support multi-column relations, 
single-column relations, and any operator (including IN, which expects multiple 
values).  The schema I settled on was this:

{noformat}
where_clause frozen<list<tuple<list<text>, int, list<text>>>>
{noformat}

Roughly speaking, this is a list of <id, operator, value> tuples, but with 
lists for ids and values to support multi-column relations.  I know the nesting 
is a little crazy there, but that allows us to represent everything that we 
need.  I also considered storing a single string of the WHERE clause, but this 
presents difficulties when loading the MV from the schema.  In particular, we 
don't have a good way to use the parser only for the {{whereClause}} rule.  If 
somebody has a better idea, I'm open to suggestions.
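
As a rough illustration of how relations could map onto that nested type, here is a small Python sketch. The integer operator codes and the helper names {{relation}} and {{render}} are assumptions made for this example; the actual encoding used in the branch is not described in this comment. The sketch also assumes that a multi-column IN would store each tuple literal as one text value.

```python
from typing import List, Tuple

# Hypothetical operator codes; the comment does not say which ints are used.
OPERATOR_CODES = {"=": 0, "<": 1, "<=": 2, ">=": 3, ">": 4, "IN": 5}
OPERATOR_SYMBOLS = {code: symbol for symbol, code in OPERATOR_CODES.items()}

# Mirrors tuple<list<text>, int, list<text>>: (ids, operator, values).
Relation = Tuple[List[str], int, List[str]]


def relation(ids, op, values) -> Relation:
    """Build one (ids, operator code, values) tuple."""
    return (list(ids), OPERATOR_CODES[op], list(values))


def render(where_clause: List[Relation]) -> str:
    """Turn the stored list of relations back into WHERE clause text."""
    parts = []
    for ids, code, values in where_clause:
        symbol = OPERATOR_SYMBOLS[code]
        multi_column = len(ids) > 1
        lhs = "(" + ", ".join(ids) + ")" if multi_column else ids[0]
        if symbol == "IN" or multi_column:
            # IN and multi-column relations take a parenthesized value list.
            rhs = "(" + ", ".join(values) + ")"
        else:
            rhs = values[0]
        parts.append(f"{lhs} {symbol} {rhs}")
    return " AND ".join(parts)
```

For example, {{render([relation(["a"], "IN", ["1", "2"]), relation(["b", "c"], ">", ["0", "5"])])}} produces {{a IN (1, 2) AND (b, c) > (0, 5)}}, which shows why both the ids and the values need to be lists.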

Last, this implementation is restricted to nearly what normal SELECT statements 
allow.  In some cases those restrictions don't make much sense for MVs, where 
we don't need to execute an efficient query.  For the most part I haven't 
changed anything here.  The one modification I did make is to allow filtering 
on clustering columns when the SELECT is being built for use by an MV.  As an 
example, if the base primary key is (a, b, c), the MV can do "WHERE c = 0" 
without restricting column b.  Normally this is only allowed if column c is 
indexed, but for MV purposes, filtering on c is efficient.
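
The relaxed rule can be sketched in Python as follows. This is a deliberately simplified model, not Cassandra's actual validation: it assumes a is the partition key with b and c as clustering columns, it ignores operator types, secondary indexes and ALLOW FILTERING, and the function name and {{for_view}} flag are hypothetical.

```python
def clustering_restrictions_allowed(clustering_columns, restricted, for_view=False):
    """Check whether a set of restricted clustering columns is acceptable.

    Simplified rule for a normal SELECT: a clustering column may only be
    restricted if every clustering column before it is restricted as well.
    When the SELECT is built for a materialized view, gaps are permitted.
    """
    if for_view:
        return True
    seen_unrestricted = False
    for column in clustering_columns:
        if column in restricted:
            if seen_unrestricted:
                return False  # e.g. c restricted while b is not
        else:
            seen_unrestricted = True
    return True
```

Under this model, {{clustering_restrictions_allowed(["b", "c"], {"c"})}} is rejected for a normal SELECT but accepted when {{for_view=True}}.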

Pending CI tests:
* [3.0 testall|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-9664-testall/]
* [3.0 dtest|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-9664-dtest/]

After the first round of review, I'll also run CI tests on trunk.

> Allow MV's select statements to be more complex
> -----------------------------------------------
>
>                 Key: CASSANDRA-9664
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9664
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Carl Yeksigian
>             Fix For: 3.x
>
>
> [Materialized Views|https://issues.apache.org/jira/browse/CASSANDRA-6477] add 
> support for a syntax which includes a {{SELECT}} statement, but only allows 
> selection of direct columns, and does not allow any filtering to take place.
> We should add support to the MV {{SELECT}} statement to bring better parity 
> with the normal CQL {{SELECT}} statement, specifically simple functions in 
> the selected columns, as well as specifying a {{WHERE}} clause.


