Indhumathi27 commented on a change in pull request #3720: [CARBONDATA-3775] 
Update materialized view document
URL: https://github.com/apache/carbondata/pull/3720#discussion_r410675875
 
 

 ##########
 File path: docs/mv-guide.md
 ##########
 @@ -0,0 +1,342 @@
+<!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements.  See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to you under the Apache License, Version 2.0
+    (the "License"); you may not use this file except in compliance with
+    the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+    Unless required by applicable law or agreed to in writing, software
+    distributed under the License is distributed on an "AS IS" BASIS,
+    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+    See the License for the specific language governing permissions and
+    limitations under the License.
+-->
+
+# CarbonData Materialized View
+
+* [Quick Example](#quick-example)
+* [Introduction](#introduction)
+* [Loading Data](#loading-data)
+* [Querying Data](#querying-data)
+* [Compaction](#compacting)
+* [Data Management](#data-management)
+* [Time Series Support](#time-series-support)
+* [Time Series RollUp Support](#time-series-rollup-support)
+
+## Quick example
+
+ Start spark-sql in terminal and run the following queries,
+
+   ```
+     CREATE TABLE maintable(a int, b string, c int) stored as carbondata;
+     INSERT INTO maintable SELECT 1, 'ab', 2;
+     CREATE MATERIALIZED VIEW view1 AS SELECT a, sum(b) FROM maintable GROUP 
BY a;
+     SELECT a, sum(b) FROM maintable GROUP BY a;
+     // NOTE: run explain query and check if query hits the Index table from 
the plan
+     EXPLAIN SELECT a, sum(b) FROM maintable GROUP BY a;
+   ```
+
+## Introduction
+
+ Materialized views are created as sub-queries. User can create limitless 
materialized view to 
+ improve query performance provided the storage requirements and loading time 
is acceptable.
+ 
+ Materialized view can refreshed on commit or on manual. Once materialized 
views are created, 
+ CarbonData's CarbonAnalyzer helps to select the most efficient materialized 
view based on 
+ the user query and rewrite the SQL to select the data from materialized view 
instead of 
+ related tables. Since the data size of materialized view is smaller and data 
is pre-processed, 
+ user queries are much faster.
+ 
+ For instance, related table called **sales** which is defined as.
+ 
+   ```
+     CREATE TABLE sales (
+       order_time timestamp,
+       user_id string,
+       sex string,
+       country string,
+       quantity int,
+       price bigint)
+     STORED AS carbondata
+   ```
+
+ User can create materialized view using the CREATE MATERIALIZED VIEW 
statement.
+ 
+   ```
+     CREATE MATERIALIZED VIEW agg_sales
+     PROPERTIES('TABLE_BLOCKSIZE'='256 MB','LOCAL_DICTIONARY_ENABLE'='false')
+     AS
+       SELECT country, sex, sum(quantity), avg(price)
+       FROM sales
+       GROUP BY country, sex
+   ```
+
+ **NOTE**:
+   * Group by and Order by columns has to be provided in projection list while 
creating materialized view.
+   * If only single related table is involved in materialized view creation, 
then TableProperties of 
+     related table (if not present in a aggregate function like sum(col)) 
listed below will be 
+     inherited to materialized view.
+       1. SORT_COLUMNS
+       2. SORT_SCOPE
+       3. TABLE_BLOCKSIZE
+       4. FLAT_FOLDER
+       5. LONG_STRING_COLUMNS
+       6. LOCAL_DICTIONARY_ENABLE
+       7. LOCAL_DICTIONARY_THRESHOLD
+       8. LOCAL_DICTIONARY_EXCLUDE
+       9. INVERTED_INDEX
+       10. NO_INVERTED_INDEX
+       11. COLUMN_COMPRESSOR
+   * Creating materialized view with select query containing only project of 
all columns of related 
+     table is unsupported.
+     **Example:**
+       If table 'x' contains columns 'a,b,c', then creating MV Index with 
below queries is not supported.
+         1. ```SELECT a,b,c FROM x```
+         2. ```SELECT * FROM x```
+   * TableProperties can be provided in Properties excluding 
LOCAL_DICTIONARY_INCLUDE,
+     LOCAL_DICTIONARY_EXCLUDE, INVERTED_INDEX, NO_INVERTED_INDEX, 
SORT_COLUMNS, LONG_STRING_COLUMNS, 
+     RANGE_COLUMN & COLUMN_META_CACHE.
+   * TableProperty given in Properties will be considered for materialized 
view creation, even though 
+     if same property is inherited from related table, which allows user to 
provide different table 
+     properties for materialized view.
+   * Materialized view creation with limit or union all CTAS queries is 
unsupported.
+   * Materialized view does not support streaming.
+
+#### How materialized views are selected
+
+ When a user query is submitted, during query planning phase, CarbonData will 
collect modular plan
+ candidates and process the the ModularPlan based on registered summary data 
sets. Then,
+ materialized view for this query will be selected among the candidates.
+
+ For the related table **sales** and materialized view **agg_sales** created 
above, following queries
+   ```
+     SELECT country, sex, sum(quantity), avg(price) FROM sales GROUP BY 
country, sex
+     SELECT sex, sum(quantity) FROM sales GROUP BY sex
+     SELECT avg(price), country FROM sales GROUP BY country
+   ```
+
+ will be transformed by CarbonData's query planner to query against 
materialized view**agg_sales** 
 
 Review comment:
   ```suggestion
    will be transformed by CarbonData's query planner to query against 
materialized view **agg_sales** 
   ```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to