[ 
https://issues.apache.org/jira/browse/CASSANDRA-9415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14548608#comment-14548608
 ] 

 Brian Hess commented on CASSANDRA-9415:
----------------------------------------

[~jbellis] Oracle, DB2, and SQL Server do this (at least - maybe others).

In Oracle it is Materialized Views (see 
http://docs.oracle.com/cd/B28359_01/server.111/b28313/basicmv.htm):
bq. The end user queries the tables and views at the detail data level. The 
query rewrite mechanism in the Oracle server automatically rewrites the SQL 
query to use the summary tables. This mechanism reduces response time for 
returning results from the query. Materialized views within the data warehouse 
are transparent to the end user or to the database application."

In DB2 it is Materialized Query Tables (see 
http://www.ibm.com/developerworks/data/library/techarticle/dm-0509melnyk/)
bq. Materialized query tables can significantly improve the performance of 
queries, especially complex queries. If the optimizer determines that a query 
or part of a query could be resolved using an MQT, the query might be rewritten 
to take advantage of the MQT.

In SQL Server it is Indexed Views (see 
https://msdn.microsoft.com/en-us/library/dd171921(SQL.100).aspx):
bq. The indexed view can be used in a query execution in two ways. The query 
can reference the indexed view directly, or, more importantly, the query 
optimizer can select the view if it determines that the view can be substituted 
for some or all of the query in the lowest-cost query plan. In the second case, 
the indexed view is used instead of the underlying tables and their ordinary 
indexes. The view does not need to be referenced in the query for the query 
optimizer to use it during query execution. This allows existing applications 
to benefit from the newly created indexed views without changing those 
applications.


> Implicit use of Materialized Views on SELECT
> --------------------------------------------
>
>                 Key: CASSANDRA-9415
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9415
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter:  Brian Hess
>              Labels: ponies
>
> CASSANDRA-6477 introduces Materialized Views.  This greatly simplifies the 
> write path for the best-practice of "query tables".  But it does not simplify 
> the read path as much as our users want/need.
> We suggest to folks to create multiple copies of their base table optimized 
> for certain queries - hence "query table".  For example, we may have a USER 
> table with two type of queries: lookup by userid and lookup by email address. 
>  We would recommend creating 2 tables USER_BY_USERID and USER_BY_EMAIL.  Both 
> would have the exact same schema, with the same PRIMARY KEY columns, but 
> different PARTITION KEY - the first would be USERID and the second would be 
> EMAIL.
> One complicating thing with this approach is that the application now needs 
> to know that when it INSERT/UPDATE/DELETEs from the base table it needs to 
> INSERT/UPDATE/DELETE from all of the query tables as well.  CASSANDRA-6477 
> covers this nicely.
> However, the other side of the coin is that the application needs to know 
> which query table to leverage based on the selection criteria.  Using the 
> example above, if the query has a predicate such as "WHERE userid = 'bhess'", 
> then USERS_BY_USERID is the better table to use.  Similarly, when the 
> predicate is "WHERE email = '[email protected]'", USERS_BY_EMAIL is 
> appropriate.
> On INSERT/UPDATE/DELETE, Materialized Views essentially give a single "name" 
> to the collection of tables.  You do operations just on the base table.  It 
> is very attractive for the SELECT side as well.  It would be very good to 
> allow an application to simply do "SELECT * FROM users WHERE userid = 
> 'bhess'" and have that query implicitly leverage the USERS_BY_USERID 
> materialized view.
> For additional use cases, especially analytics use cases like in Spark, this 
> allows the Spark code to simply push down the query without having to know 
> about all of the MVs that have been set up.  The system will route the query 
> appropriately.  And if additional MVs are necessary to make a query run 
> better/faster, then those MVs can be set up and Spark will implicitly 
> leverage them.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to