[
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548435#comment-14548435
]
Ryan Svihla commented on CASSANDRA-9415:
----------------------------------------
This would be a big win for a lot of analytics tools and would bring us ever
closer to RDBMS for ease of use. I can see this greatly smoothing the learning
curve for new users as well.
> Implicit use of Materialized Views on SELECT
> --------------------------------------------
>
> Key: CASSANDRA-9415
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Brian Hess
>
> CASSANDRA-6477 introduces Materialized Views. This greatly simplifies the
> write path for the best-practice of "query tables". But it does not simplify
> the read path as much as our users want/need.
> We suggest to folks to create multiple copies of their base table optimized
> for certain queries - hence "query table". For example, we may have a USER
> table with two type of queries: lookup by userid and lookup by email address.
> We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL. Both
> would have the exact same schema, with the same PRIMARY KEY columns, but
> different PARTITION KEY - the first would be USERID and the second would be
> EMAIL.
> One complicating thing with this approach is that the application now needs
> to know that when it INSERT/UPDATE/DELETEs from the base table it needs to
> INSERT/UPDATE/DELETE from all of the query tables as well. CASSANDRA-6477
> covers this nicely.
> However, the other side of the coin is that the application needs to know
> which query table to leverage based on the selection criteria. Using the
> example above, if the query has a predicate such as "WHERE userid = 'bhess'",
> then USERS_BY_USERID is the better table to use. Similarly, when the
> predicate is "WHERE email = '[email protected]'", USERS_BY_EMAIL is
> appropriate.
> On INSERT/UPDATE/DELETE, Materialized Views essentially give a single "name"
> to the collection of tables. You do operations just on the base table. It
> is very attractive for the SELECT side as well. It would be very good to
> allow an application to simply do "SELECT * FROM users WHERE userid =
> 'bhess'" and have that query implicitly leverage the USERS_BY_USERID
> materialized view.
> For additional use cases, especially analytics use cases like in Spark, this
> allows the Spark code to simply push down the query without having to know
> about all of the MVs that have been set up. The system will route the query
> appropriately. And if additional MVs are necessary to make a query run
> better/faster, then those MVs can be set up and Spark will implicitly
> leverage them.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)