This is an automated email from the ASF dual-hosted git repository.
akashrn5 pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new 3ea6b18 [CARBONDATA-3791] Correct spelling, link and ddl in SI and MV
Documentation
3ea6b18 is described below
commit 3ea6b181b41b0f9a6de348574d166df8ff7019f6
Author: Indhumathi27 <[email protected]>
AuthorDate: Sun May 3 17:17:06 2020 +0530
[CARBONDATA-3791] Correct spelling, link and ddl in SI and MV Documentation
Why is this PR needed?
Correct spelling, link and ddl in SI and MV Documentation
What changes were proposed in this PR?
Fixed spelling, link and ddl in SI and MV Documentation
This closes #3735
---
docs/configuration-parameters.md | 2 +-
docs/index/bloomfilter-index-guide.md | 15 +++--
docs/index/index-management.md | 37 +++++------
docs/index/lucene-index-guide.md | 30 ++++-----
docs/index/secondary-index-guide.md | 76 +++++++++++-----------
docs/mv-guide.md | 58 ++++++++---------
.../CarbonDataFileMergeTestCaseOnSI.scala | 2 +-
7 files changed, 109 insertions(+), 111 deletions(-)
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index 486b133..4627cac 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -116,7 +116,7 @@ This section provides the details of all the configurations
required for the Car
| carbon.compaction.prefetch.enable | false | Compaction operation is similar
to Query + data load where in data from qualifying segments are queried and
data loading performed to generate a new single segment. This configuration
determines whether to query ahead data from segments and feed it for data
loading. **NOTE:** This configuration is disabled by default as it needs extra
resources for querying extra data. Based on the memory availability on the
cluster, user can enable it to imp [...]
| carbon.merge.index.in.segment | true | Each CarbonData file has a companion
CarbonIndex file which maintains the metadata about the data. These CarbonIndex
files are read and loaded into the driver and are used subsequently for pruning of
data during queries. These CarbonIndex files are very small in size(few KB) and
are many. Reading many small files from HDFS is not efficient and leads to slow
IO performance. Hence these CarbonIndex files belonging to a segment can be
combined into a sin [...]
| carbon.enable.range.compaction | true | To configure Ranges-based Compaction
to be used or not for RANGE_COLUMN. If true, the data will remain organized in
ranges even after compaction. |
-| carbon.si.segment.merge | false | Making this true degrade the LOAD
performance. When the number of small files increase for SI segments(it can
happen as number of columns will be less and we store position id and reference
columns), user an either set to true which will merge the data files for
upcoming loads or run SI rebuild command which does this job for all segments.
(REBUILD INDEX <index_table>) |
+| carbon.si.segment.merge | false | Making this true degrades the LOAD
performance. When the number of small files increases for SI segments (it can
happen as the number of columns will be less and we store position id and
reference columns), the user can either set this to true, which will merge the
data files for upcoming loads, or run the SI refresh command, which does this
job for all segments. (REFRESH INDEX <index_table>) |
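For illustration, a minimal sketch of the refresh path this property describes,
reusing the `sales_index` name from the secondary-index guide (the index name
is an assumption here):
```
-- merge the small data files in every segment of the SI table
REFRESH INDEX sales_index
-- or merge only within a specified segment
REFRESH INDEX sales_index WHERE SEGMENT.ID IN(1)
```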
## Query Configuration
diff --git a/docs/index/bloomfilter-index-guide.md
b/docs/index/bloomfilter-index-guide.md
index 85f284a..03085f1 100644
--- a/docs/index/bloomfilter-index-guide.md
+++ b/docs/index/bloomfilter-index-guide.md
@@ -36,14 +36,15 @@ Creating BloomFilter Index
Dropping Specified Index
```
DROP INDEX [IF EXISTS] index_name
- ON TABLE main_table
+ ON [TABLE] main_table
```
Showing all Indexes on this table
```
SHOW INDEXES
- ON TABLE main_table
+ ON [TABLE] main_table
```
+> NOTE: Keywords given inside `[]` are optional.
Disable Index
> The index by default is enabled. To support tuning on query, we can disable
> a specific index during query to observe whether we can gain performance
> enhancement from it. This is effective only for the current session.
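A minimal sketch, assuming the session-level visibility property takes the form
`carbon.index.visible.<dbName>.<tableName>.<indexName>` (the names below are
hypothetical):
```
-- assumption: visibility property pattern carbon.index.visible.<db>.<table>.<index>
SET carbon.index.visible.default.main_table.bloom_index = false
```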
@@ -59,7 +60,7 @@ Disable Index
## BloomFilter Index Introduction
A Bloom filter is a space-efficient probabilistic data structure that is used
to test whether an element is a member of a set.
Carbondata introduced BloomFilter as an index to enhance the performance of
querying with precise value.
-It is well suitable for queries that do precise match on high cardinality
columns(such as Name/ID).
+It is well suited for queries that do precise matching on high cardinality
columns (such as Name/ID).
Internally, CarbonData maintains a BloomFilter per blocklet for each index
column to indicate whether a value of the column is in this blocklet.
Just like the other indexes, BloomFilter index is managed along with main
tables by CarbonData.
User can create BloomFilter index on specified columns with specified
BloomFilter configurations such as size and probability.
@@ -79,7 +80,7 @@ For instance, main table called **index_test** which is
defined as:
In the above example, `id` and `name` are high cardinality columns
and we always query on `id` and `name` with precise value.
-since `id` is in the sort_columns and it is orderd,
+since `id` is in the sort_columns and it is ordered,
query on it will be fast because CarbonData can skip all the irrelative
blocklets.
But queries on `name` may be bad since the blocklet minmax may not help,
because in each blocklet the range of the value of `name` may be the same --
all from A* to z*.
@@ -96,7 +97,7 @@ User can create BloomFilter Index using the Create Index DDL:
PROPERTIES ('BLOOM_SIZE'='640000', 'BLOOM_FPP'='0.00001',
'BLOOM_COMPRESS'='true')
```
-Here, (name,id) are INDEX_COLUMNS. Carbondata will generate BloomFilter index
on these columns. Queries on these columns are usually like 'COL = VAL'.
+Here, (name,id) are INDEX_COLUMNS. Carbondata will generate BloomFilter index
on these columns. Queries on these columns are usually like `'COL = VAL'`.
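As a hedged illustration (the literal values are hypothetical), queries of this
shape are the ones the BloomFilter can prune:
```
-- precise match on the bloom index columns
SELECT * FROM index_test WHERE id = 1
SELECT * FROM index_test WHERE name = 'n10'
```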
**Properties for BloomFilter Index**
@@ -131,7 +132,7 @@ You can refer to the corresponding section in [CarbonData
Lucene Index](https://
+ We can create multiple BloomFilter Indexes on one table,
but we do recommend creating one BloomFilter Index that contains multiple index
columns,
because the data loading and query performance will be better.
-+ `BLOOM_FPP` is only the expected number from user, the actually FPP may be
worse.
++ `BLOOM_FPP` is only the expected number from user, the actual FPP may be
worse.
If the BloomFilter Index does not work well,
you can try to increase `BLOOM_SIZE` and decrease `BLOOM_FPP` at the same
time.
Notice that bigger `BLOOM_SIZE` will increase the size of index file
@@ -145,5 +146,5 @@ You can refer to the corresponding section in [CarbonData
Lucene Index](https://
+ In some scenarios, the BloomFilter Index may not enhance the query
performance significantly
but if it can reduce the number of spark tasks,
there is still a chance that the BloomFilter Index can enhance the performance
for concurrent queries.
-+ Note that BloomFilter Index will decrease the data loading performance and
may cause slightly storage expansion (for index file).
++ Note that BloomFilter Index will decrease the data loading performance and
may cause slight storage expansion (for index file).
diff --git a/docs/index/index-management.md b/docs/index/index-management.md
index 6b4b6ec..7bd9c75 100644
--- a/docs/index/index-management.md
+++ b/docs/index/index-management.md
@@ -51,54 +51,51 @@ Currently, there are 3 Index implementations in CarbonData.
There are two kinds of management semantics for Index.
-1. Automatic Refresh: Create index without `WITH DEFERRED REBUILD` in the
statement, which is by default.
-2. Manual Refresh: Create index with `WITH DEFERRED REBUILD` in the statement
+1. Automatic Refresh
+2. Manual Refresh
### Automatic Refresh
-When user creates a index on the main table without using `WITH DEFERRED
REFRESH` syntax, the index will be managed by system automatically.
-For every data load to the main table, system will immediately trigger a load
to the index automatically. These two data loading (to main table and index) is
executed in a transactional manner, meaning that it will be either both success
or neither success.
+When a user creates an index on the main table without using `WITH DEFERRED
REFRESH` syntax, the index will be managed by the system automatically.
+For every data load to the main table, the system will immediately trigger a
load to the index automatically. These two data loads (to the main table and
the index) are executed in a transactional manner, meaning that either both
succeed or neither does.
-The data loading to index is incremental based on Segment concept, avoiding a
expensive total rebuild.
+The data loading to the index is incremental, based on the Segment concept,
avoiding an expensive full refresh.
-If user perform following command on the main table, system will return
failure. (reject the operation)
+If a user performs any of the following commands on the main table, the system
will return failure (reject the operation):
1. Data management command: `UPDATE/DELETE/DELETE SEGMENT`.
2. Schema management command: `ALTER TABLE DROP COLUMN`, `ALTER TABLE CHANGE
DATATYPE`,
`ALTER TABLE RENAME`. Note that adding a new column is supported, and for
dropping columns and
change datatype command, CarbonData will check whether it will impact the
index table, if
- not, the operation is allowed, otherwise operation will be rejected by
throwing exception.
+ not, the operation is allowed; otherwise the operation will be rejected by
throwing an exception.
3. Partition management command: `ALTER TABLE ADD/DROP PARTITION`.
-If user do want to perform above operations on the main table, user can first
drop the index, perform the operation, and re-create the index again.
+If a user does want to perform the above operations on the main table, the
user can first drop the index, perform the operation, and then re-create the
index.
-If user drop the main table, the index will be dropped immediately too.
+If a user drops the main table, the index will be dropped immediately too.
-We do recommend you to use this management for index.
+We recommend using this management approach for indexes.
### Manual Refresh
-When user creates a index specifying manual refresh semantic, the index is
created with status *disabled* and query will NOT use this index until user can
issue REFRESH INDEX command to build the index. For every REFRESH INDEX
command, system will trigger a full rebuild of the index. After rebuild is
done, system will change index status to *enabled*, so that it can be used in
query rewrite.
+When a user creates an index on the main table using `WITH DEFERRED REFRESH`
syntax, the index will be created with status *disabled* and queries will NOT
use this index until the user issues the `REFRESH INDEX` command to build the
index. For every `REFRESH INDEX` command, the system will trigger a full
refresh of the index. Once the refresh operation is finished, the system will
change the index status to *enabled*, so that it can be used in query rewrite.
For every new data loading, data update, delete, the related index will be
made *disabled*,
which means that the following queries will not benefit from the index before
it becomes *enabled* again.
-If the main table is dropped by user, the related index will be dropped
immediately.
+If the main table is dropped by the user, the related index will be dropped
immediately.
**Note**:
-+ If you are creating a index on external table, you need to do manual
management of the index.
-+ For index such as BloomFilter index, there is no need to do manual refresh.
- By default it is automatic refresh,
- which means its data will get refreshed immediately after the index is
created or the main table is loaded.
- Manual refresh on this index will has no impact.
++ If you are creating an index on an external table, you need to do manual
management of the index.
++ Currently, all types of indexes supported by carbon will be automatically
refreshed by default, which means their data will get refreshed immediately
after the index is created or the main table is loaded. Manual refresh on these
indexes is not supported.
## Index Related Commands
### Explain
-How can user know whether index is used in the query?
+How can users know whether an index is used in the query?
-User can set enable.query.statistics = true and use EXPLAIN command to know,
it will print out something like
+Users can set `enable.query.statistics = true` and use the `EXPLAIN` command
to find out; it will print out something like
```text
== CarbonData Profiler ==
@@ -113,7 +110,7 @@ Table Scan on default.main
### Show Index
-There is a SHOW INDEXES command, when this is issued, system will read all
index from the carbon table and print all information on screen. The current
information includes:
+There is a SHOW INDEXES command; when it is issued, the system will read all
indexes from the carbon table and print all their information on screen. The
current information includes:
- Name
- Provider like lucene
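For instance, a minimal sketch reusing the `main_table` name from the guides
above:
```
SHOW INDEXES ON TABLE main_table
```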
diff --git a/docs/index/lucene-index-guide.md b/docs/index/lucene-index-guide.md
index c811ec3..87f840a 100644
--- a/docs/index/lucene-index-guide.md
+++ b/docs/index/lucene-index-guide.md
@@ -36,14 +36,15 @@ index_columns is the list of string columns on which lucene
creates indexes.
Index can be dropped using the following DDL:
```
DROP INDEX [IF EXISTS] index_name
- ON TABLE main_table
+ ON [TABLE] main_table
```
To show all Indexes created, use:
```
SHOW INDEXES
- ON TABLE main_table
+ ON [TABLE] main_table
```
-It will show all Indexes created on main table.
+It will show all Indexes created on the main table.
+> NOTE: Keywords given inside `[]` are optional.
## Lucene Index Introduction
@@ -83,28 +84,28 @@ It will show all Indexes created on main table.
When loading data to the main table, lucene index files will be generated for
all the index_columns (String Columns) given in the CREATE statement, which
contain information about the data location of the index_columns. These index
files will be written inside a folder named with the index name
-inside each segment folders.
+inside each segment folder.
-A system level configuration carbon.lucene.compression.mode can be added for
best compression of
+A system level configuration `carbon.lucene.compression.mode` can be added for
best compression of
lucene index files. The default value is `speed`, which favors index writing
speed. If the value is `compression`, the index files will be compressed to a
smaller size.
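A hedged sketch of the corresponding carbon.properties entry (placing it in
carbon.properties is an assumption based on "system level configuration"
above):
```
carbon.lucene.compression.mode=compression
```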
## Querying data
As a technique for query acceleration, Lucene indexes cannot be queried
directly.
-Queries are to be made on main table. when a query with TEXT_MATCH('name:c10')
or
+Queries are to be made on the main table. When a query with
TEXT_MATCH('name:c10') or
TEXT_MATCH_WITH_LIMIT('name:n10',10) [the second parameter represents the
number of results to be returned; if the user does not specify this value, all
results will be returned without any limit] is
-fired, two jobs are fired. The first job writes the temporary files in folder
created at table level
-which contains lucene's seach results and these files will be read in second
job to give faster
+fired, two jobs will be launched. The first job writes the temporary files in
a folder created at the table level,
+which contains lucene's search results, and these files will be read in the
second job to give faster
results. These temporary files will be cleared once the query finishes.
-User can verify whether a query can leverage Lucene index or not by executing
`EXPLAIN`
+Users can verify whether a query can leverage the Lucene index or not by
executing the `EXPLAIN`
command, which will show the transformed logical plan, and thus the user can
check whether the TEXT_MATCH()
filter is applied on the query or not.
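For example, a minimal sketch reusing the `index_test` table from the examples
below:
```
EXPLAIN SELECT * FROM index_test WHERE TEXT_MATCH('name:n10')
```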
**Note:**
- 1. The filter columns in TEXT_MATCH or TEXT_MATCH_WITH_LIMIT must be always
in lower case and
-filter condition like 'AND','OR' must be in upper case.
+ 1. The filter columns in TEXT_MATCH or TEXT_MATCH_WITH_LIMIT must always be
in lowercase and
+filter conditions like 'AND','OR' must be in uppercase.
Ex:
```
@@ -124,7 +125,7 @@ filter condition like 'AND','OR' must be in upper case.
```
-Below like queries can be converted to text_match queries as following:
+Below `like` queries can be converted to text_match queries as follows:
```
select * from index_test where name='n10'
@@ -151,9 +152,8 @@ select * from index_test where TEXT_MATCH('name:*10
-name:*n*')
**Note:** For lucene queries and syntax, refer to
[lucene-syntax](http://www.lucenetutorial.com/lucene-query-syntax.html)
## Data Management with lucene index
-Once there is lucene index is created on the main table, following command on
the main
-table
-is not supported:
+Once a lucene index is created on the main table, the following commands on
the main
+table are not supported:
1. Data management command: `UPDATE/DELETE`.
2. Schema management command: `ALTER TABLE DROP COLUMN`, `ALTER TABLE CHANGE
DATATYPE`,
`ALTER TABLE RENAME`.
diff --git a/docs/index/secondary-index-guide.md
b/docs/index/secondary-index-guide.md
index e588ed9..1d86b82 100644
--- a/docs/index/secondary-index-guide.md
+++ b/docs/index/secondary-index-guide.md
@@ -30,34 +30,36 @@ Start spark-sql in terminal and run the following queries,
```
CREATE TABLE maintable(a int, b string, c string) stored as carbondata;
insert into maintable select 1, 'ab', 'cd';
-CREATE index inex1 on table maintable(c) AS 'carbondata';
+CREATE index index1 on table maintable(c) AS 'carbondata';
SELECT a from maintable where c = 'cd';
// NOTE: run explain query and check if query hits the SI table from the plan
EXPLAIN SELECT a from maintable where c = 'cd';
```
## Secondary Index Introduction
- Sencondary index tables are created as a indexes and managed as child tables
internally by
- Carbondata. Users can create secondary index based on the column position in
main table(Recommended
+ Secondary index tables are created as indexes and managed as child tables
internally by
+ Carbondata. Users can create a secondary index based on the column position
in the main table (recommended
for right columns) and the queries should have a filter on that column to
improve the filter query
performance.
- SI tables will always be loaded non-lazy way. Once SI table is created,
Carbondata's
+ Data refresh to the secondary index is always automatic. Once SI table is
created, Carbondata's
CarbonOptimizer with the help of `CarbonSITransformationRule`, transforms
the query plan to hit the
SI table based on the filter condition or set of filter conditions present
in the query.
- So first level of pruning will be done on SI table as it stores blocklets
and main table/parent
+ So the first level of pruning will be done on the SI table as it stores
blocklets and main table/parent
table pruning will be based on the SI output, which helps in giving faster
query results with
better pruning.
- Secondary Index table can be create with below syntax
+ Secondary Index table can be created with the below syntax
```
CREATE INDEX [IF NOT EXISTS] index_name
ON TABLE maintable(index_column)
AS
'carbondata'
- [TBLPROPERTIES('table_blocksize'='1')]
+ [PROPERTIES('table_blocksize'='1')]
```
+> NOTE: Keywords given inside `[]` are optional.
+
For instance, consider a main table called **sales** which is defined as
```
@@ -78,16 +80,16 @@ EXPLAIN SELECT a from maintable where c = 'cd';
ON TABLE sales(user_id)
AS
'carbondata'
- TBLPROPERTIES('table_blocksize'='1')
+ PROPERTIES('table_blocksize'='1')
```
#### How SI tables are selected
-When a user executes a filter query, during query planning phase, CarbonData
with help of
+When a user executes a filter query, during the query planning phase,
CarbonData with the help of
`CarbonSITransformationRule`, checks if there are any index tables present on
the filter column of
-query. If there are any, then filter query plan will be transformed such a way
that, execution will
-first hit the corresponding SI table and give input to main table for further
pruning.
+query. If there are any, then the filter query plan will be transformed in
such a way that execution will
+first hit the corresponding SI table and give input to the main table for
further pruning.
For the main table **sales** and SI table **index_sales** created above,
the following queries
@@ -105,27 +107,27 @@ will be transformed by CarbonData's
`CarbonSITransformationRule` to query agains
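As a hedged illustration (the filter value is hypothetical), a query of the
following shape qualifies for this rewrite, and the plan can be checked with
`EXPLAIN`:
```
EXPLAIN SELECT country, sex FROM sales WHERE user_id = 'xxx'
```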
### Loading data to Secondary Index table(s).
-*case1:* When SI table is created and the main table does not have any data.
In this case every
-consecutive load will load to SI table once main table data load is finished.
+*case1:* When the SI table is created and the main table does not have any
data. In this case, every
+consecutive load to the main table will load data to the SI table once the
main table data load is finished.
-*case2:* When SI table is created and main table already contains some data,
then SI creation will
-also load to SI table with same number of segments as main table. There after,
consecutive load to
-main table will load to SI table also.
+*case2:* When the SI table is created and the main table already contains some
data, then SI creation will
+also load data to the SI table with the same number of segments as the main
table. Thereafter, consecutive load to
+the main table will also load data to the SI table.
**NOTE**:
- * In case of data load failure to SI table, then we make the SI table disable
by setting a hive serde
+ * In case of data load failure to the SI table, then we make the SI table
disabled by setting a hive serde
property. The subsequent main table load will load the old failed loads along
with the current load and
makes the SI table enabled and available for query.
## Querying data
-Direct query can be made on SI tables to see the data present in position
reference columns.
-When a filter query is fired, if the filter column is a secondary index
column, then plan is
-transformed accordingly to hit SI table first to make better pruning with main
table and in turn
+Direct query can be made on SI tables to check the data present in position
reference columns.
+When a filter query is fired, and if the filter column is a secondary index
column, then the plan is
+transformed accordingly to hit the SI table first to make better pruning with
the main table, which in turn
helps for faster query results.
-User can verify whether a query can leverage SI table or not by executing
`EXPLAIN`
-command, which will show the transformed logical plan, and thus user can check
whether SI table
-table is selected.
+Users can verify whether a query can leverage the SI table or not by executing
the `EXPLAIN`
+command, which will show the transformed logical plan, and thus users can
check whether the SI table
+is selected.
## Compacting SI table
@@ -133,33 +135,33 @@ table is selected.
### Compacting the SI table through Main Table compaction
Running the Compaction command (`ALTER TABLE COMPACT`) [COMPACTION TYPE ->
MINOR/MAJOR] on the main table will
automatically delete all the old segments of SI and create a new segment with
the same name as the main
-table compacted segmet and loads data to it.
+table compacted segment and loads data to it.
-### Compacting SI table's individual segment(s) through REBUILD command
-Where there are so many small files present in the SI table, then we can use
REBUILD command to
+### Compacting SI table's individual segment(s) through REFRESH INDEX command
+When there are many small files present in the SI table, we can use
the REFRESH INDEX command to
compact the files within an SI segment to avoid many small files.
```
- REBUILD INDEX sales_index
+ REFRESH INDEX sales_index
```
-This command merges data files in each segment of SI table.
+This command merges data files in each segment of the SI table.
```
- REBUILD INDEX sales_index WHERE SEGMENT.ID IN(1)
+ REFRESH INDEX sales_index WHERE SEGMENT.ID IN(1)
```
-This command merges data files within specified segment of SI table.
+This command merges data files within a specified segment of the SI table.
## How to skip Secondary Index?
-When Secondary indexes are created on a table(s), always data fetching happens
from secondary
+When secondary indexes are created on a table, data fetching happens from
the secondary
indexes created on the main tables for better performance. But sometimes, data
fetching from the
-secondary index might degrade query performance in case where the data is
sparse and most of the
+secondary index might degrade query performance in cases where the data is
sparse and most of the
blocklets need to be scanned. So to avoid such secondary indexes, we use NI as
a function on filters
-with in WHERE clause.
+within WHERE clause.
```
SELECT country, sex from sales where NI(user_id = 'xxx')
```
-The above query ignores column user_id from secondary index and fetch data
from main table.
+The above query ignores column `user_id` from the secondary index and fetches
data from the main table.
## DDLs on Secondary Index
@@ -168,7 +170,7 @@ This command is used to get information about all the
secondary indexes on a tab
Syntax
```
- SHOW INDEXES on [db_name.]table_name
+ SHOW INDEXES ON [TABLE] [db_name.]table_name
```
### Drop index Command
@@ -176,7 +178,7 @@ This command is used to drop an existing secondary index on
a table
Syntax
```
- DROP INDEX [IF EXISTS] index_name on [db_name.]table_name
+ DROP INDEX [IF EXISTS] index_name ON [TABLE] [db_name.]table_name
```
### Register index Command
@@ -185,5 +187,5 @@ where we have old stores.
Syntax
```
- REGISTER INDEX TABLE index_name ON [db_name.]table_name
+ REGISTER INDEX TABLE index_name ON [TABLE] [db_name.]table_name
```
\ No newline at end of file
diff --git a/docs/mv-guide.md b/docs/mv-guide.md
index 9902e1c..24e38b1 100644
--- a/docs/mv-guide.md
+++ b/docs/mv-guide.md
@@ -35,17 +35,17 @@
INSERT INTO maintable SELECT 1, 'ab', 2;
CREATE MATERIALIZED VIEW view1 AS SELECT a, sum(b) FROM maintable GROUP
BY a;
SELECT a, sum(b) FROM maintable GROUP BY a;
- // NOTE: run explain query and check if query hits the Index table from
the plan
+ // NOTE: run explain query and check if query hits the mv table from the
plan
EXPLAIN SELECT a, sum(b) FROM maintable GROUP BY a;
```
-## Introductions
+## Introduction
- Materialized views are created as tables from queries. User can create
limitless materialized view
+ Materialized views are created as tables from queries. Users can create
limitless materialized views
to improve query performance provided the storage requirements and loading
time are acceptable.
Materialized view can be refreshed on commit or on manual. Once materialized
views are created,
- CarbonData's MVRewriteRule helps to select the most efficient materialized
view based on
+ CarbonData's `MVRewriteRule` helps to select the most efficient materialized
view based on
the user query and rewrite the SQL to select the data from materialized view
instead of
fact tables. Since the data size of materialized view is smaller and data is
pre-processed,
user queries are much faster.
@@ -63,7 +63,7 @@
STORED AS carbondata
```
- User can create materialized view using the CREATE MATERIALIZED VIEW
statement.
+ Users can create a materialized view using the CREATE MATERIALIZED VIEW
statement.
```
CREATE MATERIALIZED VIEW agg_sales
@@ -75,7 +75,7 @@
```
**NOTE**:
- * Group by and Order by columns has to be provided in projection list while
creating materialized view.
+ * Group by and Order by columns have to be provided in the projection list
while creating a materialized view.
* If only a single fact table is involved in materialized view creation, then
the TableProperties of
the fact table (if not present in an aggregate function like sum(col)) listed
below will be
inherited by the materialized view.
@@ -93,7 +93,7 @@
* Creating a materialized view with a select query containing only a
projection of all columns of the fact
table is unsupported.
**Example:**
- If table 'x' contains columns 'a,b,c', then creating MV Index with
below queries is not supported.
+ If table 'x' contains columns 'a,b,c', then creating MV with below
queries is not supported.
1. ```SELECT a,b,c FROM x```
2. ```SELECT * FROM x```
* TableProperties can be provided in Properties excluding
LOCAL_DICTIONARY_INCLUDE,
@@ -107,9 +107,9 @@
#### How materialized views are selected
- When a user query is submitted, during query planning phase, CarbonData will
collect modular plan
- candidates and process the the ModularPlan based on registered summary data
sets. Then,
- materialized view for this query will be selected among the candidates.
+ When a user query is submitted, during the query planning phase, CarbonData
will collect modular plan
+ candidates and process the ModularPlan based on registered summary data sets.
Then,
+ a materialized view for this query will be selected among the candidates.
For the fact table **sales** and materialized view **agg_sales** created
above, the following queries
```
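-- hedged example of such a query (the projection and grouping are assumed
-- from the agg_sales definition earlier in this guide):
SELECT country, sex, sum(quantity), avg(price) FROM sales GROUP BY country, sex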
@@ -140,7 +140,7 @@
view will be triggered by the CREATE MATERIALIZED VIEW statement when user
creates the materialized
view.
- For incremental loads to fact table, data to materialized view will be loaded
once the
+ For incremental loads to the fact table, data to materialized view will be
loaded once the
corresponding fact table load is completed.
### Loading data on manual
@@ -148,7 +148,7 @@
In case of WITH DEFERRED REFRESH, data load to materialized view will be
triggered by the refresh
command. Materialized view will be in DISABLED state in below scenarios.
- * when materialized view is created.
+ * when a materialized view is created.
* when data of fact table and materialized view are not in sync.
User should fire REFRESH MATERIALIZED VIEW command to sync all segments of
fact table with
@@ -163,27 +163,27 @@
During load to the fact table, if any one of the loads to a materialized view
fails, then that
corresponding materialized view will be DISABLED and load to other
materialized views mapped
- to fact table will continue.
+ to the fact table will continue.
Users can fire the REFRESH MATERIALIZED VIEW command to sync, or else the
subsequent table load
will load the old failed loads along with the current load and enable the
disabled materialized view.
**NOTE**:
* In case of InsertOverwrite/Update operation on fact table, all segments
of materialized view
- will be MARKED_FOR_DELETE and reload to Index table will happen by
REFRESH MATERIALIZED VIEW,
+ will be MARKED_FOR_DELETE and reload to mv table will happen by REFRESH
MATERIALIZED VIEW,
in case of materialized view which refresh on manual and once the
InsertOverwrite/Update
operation on fact table is finished, in case of materialized view which
refresh on commit.
* In case of full scan query, Data Size and Index Size of fact table and
materialized view
- will not the same, as fact table and materialized view has different
column names.
+ will not be the same, as fact table and materialized view have different
column names.
## Querying data
- Queries are to be made on fact table. While doing query planning, internally
CarbonData will check
+ Queries are to be made on the fact table. While doing query planning,
internally CarbonData will check
for the materialized views which are associated with the fact table, and do
query plan
transformation accordingly.
- User can verify whether a query can leverage materialized view or not by
executing `EXPLAIN` command,
- which will show the transformed logical plan, and thus user can check whether
materialized view
+ Users can verify whether a query can leverage materialized view or not by
executing the `EXPLAIN` command,
+ which will show the transformed logical plan, and thus the user can check
whether a materialized view
is selected.
## Compacting
@@ -207,7 +207,7 @@
materialized view; if not, the operation is allowed, otherwise the operation
will be rejected by
throwing an exception.
3. Partition management command: `ALTER TABLE ADD/DROP PARTITION`. Note
that dropping a partition
- will be allowed only if partition is participating in all indexes
associated with fact table.
+ will be allowed only if the partition column of fact table is
participating in all of the table's materialized views.
Drop Partition is not allowed, if any materialized view is associated
with more than one
fact table. Drop Partition directly on materialized view is not allowed.
4. Complex Datatypes for materialized view are not supported.
@@ -215,7 +215,7 @@
However, there is still a way to support these operations on the fact table;
in the current CarbonData
release, the user can do as follows:
- 1. Remove the materialized by `DROP MATERIALIZED VIEW` command.
+ 1. Remove the materialized view by `DROP MATERIALIZED VIEW` command.
2. Carry out the data management operation on fact table.
3. Create the materialized view again by `CREATE MATERIALIZED VIEW` command.
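A minimal sketch of this three-step flow (the view definition is assumed from
the agg_sales example above):
```
DROP MATERIALIZED VIEW agg_sales
-- carry out the data management operation on the fact table here
CREATE MATERIALIZED VIEW agg_sales AS
  SELECT country, sex, sum(quantity), avg(price) FROM sales GROUP BY country, sex
```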
@@ -273,14 +273,14 @@
GROUP BY timeseries(order_time, 'minute')
```
And execute the below query to check time series data. In this example, a
materialized view of
- aggregated table on price column will be created, which will be aggregated on
every one minute.
+ the aggregated table on the price column will be created, which will be
aggregated every one minute.
```
SELECT timeseries(order_time,'minute'), avg(price)
FROM sales
GROUP BY timeseries(order_time,'minute')
```
- Find below the result of above query aggregated over minute.
+ Find below the result of the above query aggregated over a minute.
```
+---------------------------------------+----------------+
@@ -300,19 +300,17 @@
granularity provided during creation and stored on each segment.
**NOTE**:
- 1. Single select statement cannot contain time series udf(s) neither with
different granularity
- nor with different timestamp/date columns.
- 2. Retention policies for time series is not supported yet.
+ 1. Retention policies for time series are not supported yet.
## Time Series RollUp Support
- Time series queries can be rolled up from existing materialized view.
+ Time series queries can be rolled up from an existing materialized view.
### Query RollUp
Consider an example where the query is on hour level granularity, but the
materialized view
with hour level granularity is not present but materialized view with minute
level granularity is
- present, then we can get the data from minute level and the aggregate the
hour level data and
+ present, then we can get the data from the minute level, aggregate it to the
hour level, and
give output. This is called query rollup.
Consider if the user creates the below time series materialized view,
@@ -334,10 +332,10 @@
```
Then, the above query can be rolled up from materialized view 'agg_sales', by
adding hour
- level time series aggregation on minute level aggregation. Users can fire
explain command
- to check if query is rolled up from existing materialized view.
+ level time series aggregation on minute level aggregation. Users can fire the
`EXPLAIN` command
+ to check if a query is rolled up from an existing materialized view.
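For instance, a hedged sketch (the hour-level query shape follows the
minute-level example above):
```
EXPLAIN SELECT timeseries(order_time,'hour'), avg(price)
FROM sales
GROUP BY timeseries(order_time,'hour')
```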
**NOTE**:
- 1. Queries cannot be rolled up, if filter contains time series function.
+ 1. Queries cannot be rolled up if the filter contains a time series
function.
2. Roll up is not yet supported for queries having join clause or order by
functions.
\ No newline at end of file
diff --git
a/index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/mergedata/CarbonDataFileMergeTestCaseOnSI.scala
b/index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/mergedata/CarbonDataFileMergeTestCaseOnSI.scala
index 9eced78..00c7d4a 100644
---
a/index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/mergedata/CarbonDataFileMergeTestCaseOnSI.scala
+++
b/index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/mergedata/CarbonDataFileMergeTestCaseOnSI.scala
@@ -142,7 +142,7 @@ class CarbonDataFileMergeTestCaseOnSI
checkAnswer(sql("""Select count(*) from nonindexmerge where
name='n164419'"""), rows)
}
- test("Verify command of REBUILD INDEX command with invalid segments") {
+ test("Verify command of REFRESH INDEX command with invalid segments") {
CarbonProperties.getInstance()
.addProperty(CarbonCommonConstants.CARBON_SI_SEGMENT_MERGE, "false")
sql("DROP TABLE IF EXISTS nonindexmerge")