[https://issues.apache.org/jira/browse/CASSANDRA-9664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14728376#comment-14728376]
Tyler Hobbs commented on CASSANDRA-9664:
----------------------------------------
I have a [branch|https://github.com/thobbs/cassandra/tree/CASSANDRA-9664] with
an implementation that's ready for review.
As noted in my previous comment, this only allows restricting columns that are
in the base table's primary key. When handling individual mutations, this
filtering can be performed before the read-before-write, so the write path does
minimal work for rows the view doesn't select. For the initial build of the MV,
we don't yet take full advantage of the select statement's restrictions. I
would like to improve single-partition builds and builds that can be assisted
by secondary indexes, but if the reviewer agrees, I'd prefer to leave that to
another ticket.
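A minimal sketch of the mutation-path ordering described above. The types and methods here ({{PrimaryKey}}, {{maybeUpdateView}}, {{readExistingRow}}) are hypothetical stand-ins, not Cassandra's actual classes. Because the WHERE clause only restricts primary key columns, it can be evaluated from the mutation's key alone, so a row the view doesn't select returns before any read-before-write happens.

```java
import java.util.Map;
import java.util.function.Predicate;

// Sketch only: these types are hypothetical stand-ins, not Cassandra's classes.
final class ViewUpdateSketch {
    // A base-table primary key, as column name -> value.
    record PrimaryKey(Map<String, Integer> values) {}

    // Number of read-before-write operations performed so far.
    static int readsPerformed = 0;

    // Stand-in for reading the existing base row before the write.
    static void readExistingRow(PrimaryKey key) {
        readsPerformed++;
    }

    // Returns true if the view was (notionally) updated for this mutation.
    static boolean maybeUpdateView(PrimaryKey key, Predicate<PrimaryKey> whereClause) {
        // The WHERE clause only touches primary key columns, so it can be
        // evaluated from the mutation alone, before any read.
        if (!whereClause.test(key)) {
            return false;
        }
        readExistingRow(key);
        // ... build and apply the view mutation here (omitted) ...
        return true;
    }

    public static void main(String[] args) {
        // Roughly "WHERE a = 1" on a base table with primary key (a, b, c).
        Predicate<PrimaryKey> where = k -> Integer.valueOf(1).equals(k.values().get("a"));
        PrimaryKey selected = new PrimaryKey(Map.of("a", 1, "b", 2, "c", 3));
        PrimaryKey skipped = new PrimaryKey(Map.of("a", 7, "b", 2, "c", 3));
        System.out.println(maybeUpdateView(selected, where)); // true
        System.out.println(maybeUpdateView(skipped, where));  // false
        System.out.println(readsPerformed);                   // 1
    }
}
```

The point of the ordering is that the cost of an unselected row is a single predicate check on data already in hand, rather than a local read.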
One of the trickiest parts of this ticket was representing the WHERE clause
restrictions in the MV's schema. This needs to support multi-column relations,
single-column relations, and any operator (including IN, which expects multiple
values). The schema I settled on was this:
{noformat}
where_clause frozen<list<tuple<list<text>, int, list<text>>>>
{noformat}
Roughly speaking, this is a list of <id, operator, value> tuples, but with
lists for ids and values to support multi-column relations. I know the nesting
is a little crazy, but it allows us to represent everything we need. I also
considered storing the WHERE clause as a single string, but that makes loading
the MV from the schema difficult. In particular, we don't have a good way to
invoke the parser for only the {{whereClause}} rule. If somebody has a better
idea, I'm open to suggestions.
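To make the nesting concrete, here is a sketch of how a clause such as {{WHERE a IN (1, 2) AND (b, c) >= (0, 5)}} might map onto {{list<tuple<list<text>, int, list<text>>>}}. The {{Op}} enum and its ordinal codes are hypothetical (not Cassandra's real operator encoding), and this sketch assumes each value is stored as CQL term text. Rendering back to text is only for readability here; it is not how the MV is loaded from the schema.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the where_clause layout: frozen<list<tuple<list<text>, int, list<text>>>>.
// The operator codes below are hypothetical, not Cassandra's real encoding.
final class WhereClauseSketch {
    enum Op {
        EQ("="), LT("<"), LTE("<="), GT(">"), GTE(">="), IN("IN");

        final String cql;

        Op(String cql) { this.cql = cql; }
    }

    // One tuple<list<text>, int, list<text>>: column ids, operator code, value terms.
    record Relation(List<String> ids, int operator, List<String> values) {
        String toCql() {
            Op op = Op.values()[operator];
            String lhs = ids.size() == 1 ? ids.get(0) : "(" + String.join(", ", ids) + ")";
            // Multi-column relations and IN take a parenthesized list of terms.
            boolean parenthesize = ids.size() > 1 || op == Op.IN;
            String rhs = parenthesize ? "(" + String.join(", ", values) + ")" : values.get(0);
            return lhs + " " + op.cql + " " + rhs;
        }
    }

    // Renders the whole list back into the body of a WHERE clause.
    static String toCql(List<Relation> whereClause) {
        return whereClause.stream().map(Relation::toCql).collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        // WHERE a IN (1, 2) AND (b, c) >= (0, 5), for base primary key (a, b, c).
        List<Relation> where = List.of(
            new Relation(List.of("a"), Op.IN.ordinal(), List.of("1", "2")),
            new Relation(List.of("b", "c"), Op.GTE.ordinal(), List.of("0", "5")));
        System.out.println(toCql(where)); // a IN (1, 2) AND (b, c) >= (0, 5)
    }
}
```

A single-column equality such as {{c = 0}} is just the degenerate case: one id, one value.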
Last, this implementation is restricted to nearly the same relations that
normal SELECT statements allow. Some of those restrictions don't make much
sense for MVs, since we don't need to execute an efficient query. For the most
part I haven't changed anything here. The one modification I did make is to
allow filtering on clustering columns when the SELECT is being built for use by
an MV. For example, if the base primary key is (a, b, c), the MV can use
"WHERE c = 0" without restricting column b. Normally this is only allowed if
column c is indexed, but for MV purposes the filter is cheap, since the MV only
has to check each base row against it.
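A small sketch of why "WHERE c = 0" is cheap in the MV context, assuming base primary key (a, b, c) with a as the partition key and b, c as clustering columns. {{Row}} and {{matchesWhere}} are hypothetical names for illustration. A normal SELECT would need to locate matching rows efficiently, but a view only has to test each base row it is handed, so leaving b unrestricted costs nothing extra.

```java
import java.util.List;

// Sketch only: base primary key (a, b, c); a is the partition key and
// b, c are clustering columns. Row is a hypothetical stand-in type.
final class ClusteringFilterSketch {
    record Row(int a, int b, int c) {}

    // "WHERE c = 0": a per-row check that never looks at b.
    static boolean matchesWhere(Row row) {
        return row.c() == 0;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 1, 0), new Row(1, 2, 5), new Row(2, 9, 0));
        List<Row> selected = rows.stream().filter(ClusteringFilterSketch::matchesWhere).toList();
        System.out.println(selected.size()); // 2
    }
}
```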
Pending CI tests:
* [3.0 testall|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-9664-testall/]
* [3.0 dtest|http://cassci.datastax.com/view/Dev/view/thobbs/job/thobbs-CASSANDRA-9664-dtest/]
After the first round of review, I'll also run CI tests on trunk.
> Allow MV's select statements to be more complex
> -----------------------------------------------
>
> Key: CASSANDRA-9664
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9664
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Carl Yeksigian
> Fix For: 3.x
>
>
> [Materialized Views|https://issues.apache.org/jira/browse/CASSANDRA-6477] add
> support for a syntax which includes a {{SELECT}} statement, but only allows
> selection of direct columns, and does not allow any filtering to take place.
> We should add support to the MV {{SELECT}} statement to bring better parity
> with the normal CQL {{SELECT}} statement, specifically simple functions in
> the selected columns, as well as specifying a {{WHERE}} clause.