[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT

Ryan Svihla (JIRA) Mon, 18 May 2015 11:11:24 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548435#comment-14548435
 ]


Ryan Svihla commented on CASSANDRA-9415:
----------------------------------------

This would be a big win for a lot of analytics tools and would bring us ever 
closer to RDBMS for ease of use. I can see this greatly smoothing the learning 
curve for new users as well.

> Implicit use of Materialized Views on SELECT
> --------------------------------------------
>
>                 Key: CASSANDRA-9415
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter:  Brian Hess
>
> CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the 
> write path for the best-practice of "query tables".  But it does not simplify 
> the read path as much as our users want/need.
> We suggest to folks to create multiple copies of their base table optimized 
> for certain queries - hence "query table".  For example, we may have a USER 
> table with two type of queries: lookup by userid and lookup by email address. 
>  We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL.  Both 
> would have the exact same schema, with the same PRIMARY KEY columns, but 
> different PARTITION KEY - the first would be USERID and the second would be 
> EMAIL.
> One complicating thing with this approach is that the application now needs 
> to know that when it INSERT/UPDATE/DELETEs from the base table it needs to 
> INSERT/UPDATE/DELETE from all of the query tables as well.  CASSANDRA-6477 
> covers this nicely.
> However, the other side of the coin is that the application needs to know 
> which query table to leverage based on the selection criteria.  Using the 
> example above, if the query has a predicate such as "WHERE userid = 'bhess'", 
> then USERS_BY_USERID is the better table to use.  Similarly, when the 
> predicate is "WHERE email = '[email protected]'", USERS_BY_EMAIL is 
> appropriate.
> On INSERT/UPDATE/DELETE, Materialized Views essentially give a single "name" 
> to the collection of tables.  You do operations just on the base table.  It 
> is very attractive for the SELECT side as well.  It would be very good to 
> allow an application to simply do "SELECT * FROM users WHERE userid = 
> 'bhess'" and have that query implicitly leverage the USERS_BY_USERID 
> materialized view.
> For additional use cases, especially analytics use cases like in Spark, this 
> allows the Spark code to simply push down the query without having to know 
> about all of the MVs that have been set up.  The system will route the query 
> appropriately.  And if additional MVs are necessary to make a query run 
> better/faster, then those MVs can be set up and Spark will implicitly 
> leverage them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-9415) Implicit use of Materialized Views on SELECT

Reply via email to