This is an automated email from the ASF dual-hosted git repository.
kunalkapoor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new 483e7da [CARBONDATA-3772] Update index documents
483e7da is described below
commit 483e7da5c5394251c9a04cefe9924393af72f39f
Author: Gampa Shreelekhya <[email protected]>
AuthorDate: Tue Apr 14 18:40:03 2020 +0530
[CARBONDATA-3772] Update index documents
Why is this PR needed?
update index documentation to comply with recent changes
What changes were proposed in this PR?
Does this PR introduce any user interface change?
No
Yes. (please explain the change and update document)
Is any new testcase added?
No
Yes
This closes #3708
---
README.md | 10 +--
docs/faq.md | 37 -----------
docs/index-developer-guide.md | 17 ++---
docs/index/bloomfilter-index-guide.md | 107 +++++++++++++++---------------
docs/index/index-management.md | 119 +++++++++++++++-------------------
docs/index/lucene-index-guide.md | 91 +++++++++++++-------------
docs/language-manual.md | 3 +-
7 files changed, 166 insertions(+), 218 deletions(-)
diff --git a/README.md b/README.md
index c7f935d..b1a712c 100644
--- a/README.md
+++ b/README.md
@@ -53,11 +53,11 @@ CarbonData is built using Apache Maven, to [build
CarbonData](https://github.com
* [CarbonData Data Manipulation
Language](https://github.com/apache/carbondata/blob/master/docs/dml-of-carbondata.md)
* [CarbonData Streaming
Ingestion](https://github.com/apache/carbondata/blob/master/docs/streaming-guide.md)
* [Configuring
CarbonData](https://github.com/apache/carbondata/blob/master/docs/configuration-parameters.md)
- * [DataMap Developer
Guide](https://github.com/apache/carbondata/blob/master/docs/datamap-developer-guide.md)
+ * [Index Developer
Guide](https://github.com/apache/carbondata/blob/master/docs/index-developer-guide.md)
* [Data
Types](https://github.com/apache/carbondata/blob/master/docs/supported-data-types-in-carbondata.md)
-* [CarbonData DataMap
Management](https://github.com/apache/carbondata/blob/master/docs/datamap/datamap-management.md)
- * [CarbonData BloomFilter
DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/bloomfilter-datamap-guide.md)
- * [CarbonData Lucene
DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/lucene-datamap-guide.md)
+* [CarbonData Index
Management](https://github.com/apache/carbondata/blob/master/docs/index/index-management.md)
+ * [CarbonData BloomFilter
Index](https://github.com/apache/carbondata/blob/master/docs/index/bloomfilter-index-guide.md)
+ * [CarbonData Lucene
Index](https://github.com/apache/carbondata/blob/master/docs/index/lucene-index-guide.md)
* [CarbonData MV
DataMap](https://github.com/apache/carbondata/blob/master/docs/datamap/mv-datamap-guide.md)
* [Carbondata Secondary
Index](https://github.com/apache/carbondata/blob/master/docs/index/secondary-index-guide.md)
* [SDK
Guide](https://github.com/apache/carbondata/blob/master/docs/sdk-guide.md)
@@ -70,7 +70,7 @@ CarbonData is built using Apache Maven, to [build
CarbonData](https://github.com
## Integration
* [Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md)
-*
[Presto](https://github.com/apache/carbondata/blob/master/docs/presto-guide.md)
+*
[Presto](https://github.com/apache/carbondata/blob/master/docs/prestodb-guide.md)
*
[Alluxio](https://github.com/apache/carbondata/blob/master/docs/alluxio-guide.md)
## Other Technical Material
diff --git a/docs/faq.md b/docs/faq.md
index 45607f4..f3f0a6d 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -25,7 +25,6 @@
* [What is Carbon Lock Type?](#what-is-carbon-lock-type)
* [How to resolve Abstract Method
Error?](#how-to-resolve-abstract-method-error)
* [How Carbon will behave when execute insert operation in abnormal
scenarios?](#how-carbon-will-behave-when-execute-insert-operation-in-abnormal-scenarios)
-* [Why aggregate query is not fetching data from aggregate
table?](#why-aggregate-query-is-not-fetching-data-from-aggregate-table)
* [Why all executors are showing success in Spark UI even after Dataload
command failed at Driver
side?](#why-all-executors-are-showing-success-in-spark-ui-even-after-dataload-command-failed-at-driver-side)
* [Why different time zone result for select query output when query SDK
writer
output?](#why-different-time-zone-result-for-select-query-output-when-query-sdk-writer-output)
* [How to check LRU cache memory
footprint?](#how-to-check-lru-cache-memory-footprint)
@@ -162,42 +161,6 @@ INSERT INTO TABLE carbon_table SELECT id, city FROM
source_table;
When the column type in the carbon table is different from the column specified in
the select statement, the insert operation will still succeed, but you may get NULL
in the result, because NULL is the substitute value when type conversion fails.
-## Why aggregate query is not fetching data from aggregate table?
-Following are the aggregate queries that won't fetch data from aggregate table:
-
-- **Scenario 1** :
-When SubQuery predicate is present in the query.
-
-Example:
-
-```
-create table gdp21(cntry smallint, gdp double, y_year date) stored as
carbondata;
-create datamap ag1 on table gdp21 using 'preaggregate' as select cntry,
sum(gdp) from gdp21 group by cntry;
-select ctry from pop1 where ctry in (select cntry from gdp21 group by cntry);
-```
-
-- **Scenario 2** :
-When aggregate function along with 'in' filter.
-
-Example:
-
-```
-create table gdp21(cntry smallint, gdp double, y_year date) stored as
carbondata;
-create datamap ag1 on table gdp21 using 'preaggregate' as select cntry,
sum(gdp) from gdp21 group by cntry;
-select cntry, sum(gdp) from gdp21 where cntry in (select ctry from pop1) group
by cntry;
-```
-
-- **Scenario 3** :
-When aggregate function having 'join' with equal filter.
-
-Example:
-
-```
-create table gdp21(cntry smallint, gdp double, y_year date) stored as
carbondata;
-create datamap ag1 on table gdp21 using 'preaggregate' as select cntry,
sum(gdp) from gdp21 group by cntry;
-select cntry,sum(gdp) from gdp21,pop1 where cntry=ctry group by cntry;
-```
-
## Why all executors are showing success in Spark UI even after Dataload
command failed at Driver side?
The Spark executor shows a task as failed after the maximum number of retry
attempts. But a load whose data has bad records, with BAD_RECORDS_ACTION
(carbon.bad.records.action) set to "FAIL", is attempted only once; it sends a
failure signal to the driver instead of throwing an exception to retry, as there
is no point in retrying once a bad record is found and BAD_RECORDS_ACTION is set
to fail. Hence the Spark executor displays this one attempt as successful but
the command has actually failed to [...]
diff --git a/docs/index-developer-guide.md b/docs/index-developer-guide.md
index 106bf39..198c3e9 100644
--- a/docs/index-developer-guide.md
+++ b/docs/index-developer-guide.md
@@ -18,17 +18,18 @@
# Index Developer Guide
### Introduction
-DataMap is a data structure that can be used to accelerate certain query of
the table. Different DataMap can be implemented by developers.
-Currently, there are two types of DataMap supported:
-1. IndexDataMap: DataMap that leverages index to accelerate filter query.
Lucene DataMap and BloomFiler DataMap belong to this type of DataMaps.
-2. MVDataMap: DataMap that leverages Materialized View to accelerate olap
style query, like SPJG query (select, predicate, join, groupby). Preaggregate,
timeseries and mv DataMap belong to this type of DataMaps.
+Index is a data structure that can be used to accelerate certain queries on the
table. Different Indexes can be implemented by developers.
+Currently, Carbondata supports three types of Indexes:
+1. BloomFilter Index: A space-efficient probabilistic data structure that is
used to test whether an element is a member of a set.
+2. Lucene Index: A high performance, full-featured text search engine.
+3. Secondary Index: Secondary index tables that hold blocklets are created as
indexes and managed as child tables internally by Carbondata.
### Index Provider
-When user issues `CREATE INDEX index_name ON TABLE main AS 'provider'`, the
corresponding DataMapProvider implementation will be created and initialized.
+When user issues `CREATE INDEX index_name ON TABLE main AS 'provider'`, the
corresponding IndexProvider implementation will be created and initialized.
Currently, the provider string can be:
-1. class name IndexDataMapFactory implementation: Developer can implement new
type of IndexDataMap by extending IndexDataMapFactory
+1. class name of an IndexFactory implementation: developers can implement a new
type of Index by extending IndexFactory
-When user issues `DROP INDEX index_name ON TABLE main`, the corresponding
DataMapProvider interface will be called.
+When user issues `DROP INDEX index_name ON TABLE main`, the corresponding
IndexFactory class will be called.
-Click for more details about [DataMap
Management](./index/index-management.md#index-management) and supported
[DSL](./index/index-management.md#overview).
+Click for more details about [Index
Management](./index/index-management.md#index-management) and supported
[DSL](./index/index-management.md#overview).
diff --git a/docs/index/bloomfilter-index-guide.md
b/docs/index/bloomfilter-index-guide.md
index 264cf0b..85f284a 100644
--- a/docs/index/bloomfilter-index-guide.md
+++ b/docs/index/bloomfilter-index-guide.md
@@ -15,59 +15,59 @@
limitations under the License.
-->
-# CarbonData BloomFilter DataMap
+# CarbonData BloomFilter Index
-* [DataMap Management](#datamap-management)
-* [BloomFilter Datamap Introduction](#bloomfilter-datamap-introduction)
+* [Index Management](#index-management)
+* [BloomFilter Index Introduction](#bloomfilter-index-introduction)
* [Loading Data](#loading-data)
* [Querying Data](#querying-data)
-* [Data Management](#data-management-with-bloomfilter-datamap)
+* [Data Management](#data-management-with-bloomfilter-index)
* [Useful Tips](#useful-tips)
-#### DataMap Management
-Creating BloomFilter DataMap
+#### Index Management
+Creating BloomFilter Index
```
- CREATE DATAMAP [IF NOT EXISTS] datamap_name
- ON TABLE main_table
- USING 'bloomfilter'
- DMPROPERTIES ('index_columns'='city, name', 'BLOOM_SIZE'='640000',
'BLOOM_FPP'='0.00001')
+ CREATE INDEX [IF NOT EXISTS] index_name
+ ON TABLE main_table (city,name)
+ AS 'bloomfilter'
+ PROPERTIES ('BLOOM_SIZE'='640000', 'BLOOM_FPP'='0.00001')
```
-Dropping Specified DataMap
+Dropping Specified Index
```
- DROP DATAMAP [IF EXISTS] datamap_name
+ DROP INDEX [IF EXISTS] index_name
ON TABLE main_table
```
-Showing all DataMaps on this table
+Showing all Indexes on this table
```
- SHOW DATAMAP
+ SHOW INDEXES
ON TABLE main_table
```
-Disable DataMap
-> The datamap by default is enabled. To support tuning on query, we can
disable a specific datamap during query to observe whether we can gain
performance enhancement from it. This is effective only for current session.
+Disable Index
+> The index is enabled by default. To support query tuning, we can disable
a specific index during query to observe whether we can gain a performance
enhancement from it. This is effective only for the current session.
```
// disable the index
- SET carbon.index.visible.dbName.tableName.dataMapName = false
+ SET carbon.index.visible.dbName.tableName.indexName = false
// enable the index
- SET carbon.index.visible.dbName.tableName.dataMapName = true
+ SET carbon.index.visible.dbName.tableName.indexName = true
```
-## BloomFilter DataMap Introduction
+## BloomFilter Index Introduction
A Bloom filter is a space-efficient probabilistic data structure that is used
to test whether an element is a member of a set.
-Carbondata introduced BloomFilter as an index datamap to enhance the
performance of querying with precise value.
+Carbondata introduced BloomFilter as an index to enhance the performance of
queries with precise values.
It is well suited for queries that do a precise match on high cardinality
columns (such as Name/ID).
Internally, CarbonData maintains a BloomFilter per blocklet for each index
column to indicate that whether a value of the column is in this blocklet.
-Just like the other datamaps, BloomFilter datamap is managed along with main
tables by CarbonData.
-User can create BloomFilter datamap on specified columns with specified
BloomFilter configurations such as size and probability.
+Just like the other indexes, BloomFilter index is managed along with main
tables by CarbonData.
+User can create BloomFilter index on specified columns with specified
BloomFilter configurations such as size and probability.
-For instance, main table called **datamap_test** which is defined as:
+For instance, main table called **index_test** which is defined as:
```
- CREATE TABLE datamap_test (
+ CREATE TABLE index_test (
id string,
name string,
age int,
@@ -83,24 +83,25 @@ since `id` is in the sort_columns and it is ordered,
query on it will be fast because CarbonData can skip all the irrelevant
blocklets.
But queries on `name` may be bad since the blocklet minmax may not help,
because in each blocklet the range of the value of `name` may be the same --
all from A* to z*.
-In this case, user can create a BloomFilter DataMap on column `name`.
-Moreover, user can also create a BloomFilter DataMap on the sort_columns.
+In this case, user can create a BloomFilter Index on column `name`.
+Moreover, user can also create a BloomFilter Index on the sort_columns.
This is useful if the user has too many segments and the range of the values of
the sort_columns is almost the same.
-User can create BloomFilter DataMap using the Create DataMap DDL:
+User can create BloomFilter Index using the Create Index DDL:
```
- CREATE DATAMAP dm
- ON TABLE datamap_test
- USING 'bloomfilter'
- DMPROPERTIES ('INDEX_COLUMNS' = 'name,id', 'BLOOM_SIZE'='640000',
'BLOOM_FPP'='0.00001', 'BLOOM_COMPRESS'='true')
+ CREATE INDEX dm
+ ON TABLE index_test (name,id)
+ AS 'bloomfilter'
+ PROPERTIES ('BLOOM_SIZE'='640000', 'BLOOM_FPP'='0.00001',
'BLOOM_COMPRESS'='true')
```
-**Properties for BloomFilter DataMap**
+Here, (name,id) are INDEX_COLUMNS. Carbondata will generate BloomFilter index
on these columns. Queries on these columns are usually like 'COL = VAL'.
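As an illustrative sketch (using the **index_test** table from above; the literal values are hypothetical), the kind of point queries this index targets look like:

```sql
-- Hypothetical point lookups on the BloomFilter-indexed columns of index_test;
-- simple equality and IN filters are the patterns the BloomFilter can prune on.
SELECT id, name, age FROM index_test WHERE name = 'Alice';
SELECT id, name FROM index_test WHERE id IN ('id001', 'id002');
```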
+
+**Properties for BloomFilter Index**
| Property | Is Required | Default Value | Description |
|-------------|----------|--------|---------|
-| INDEX_COLUMNS | YES | | Carbondata will generate BloomFilter index on these
columns. Queries on these columns are usually like 'COL = VAL'. |
| BLOOM_SIZE | NO | 640000 | This value is internally used by BloomFilter as
the number of expected insertions, it will affect the size of BloomFilter
index. Since each blocklet has a BloomFilter here, so the default value is the
approximate distinct index values in a blocklet assuming that each blocklet
contains 20 pages and each page contains 32000 records. The value should be an
integer. |
| BLOOM_FPP | NO | 0.00001 | This value is internally used by BloomFilter as
the False-Positive Probability, it will affect the size of bloomfilter index as
well as the number of hash functions for the BloomFilter. The value should be
in the range (0, 1). In one test scenario, a 96GB TPCH customer table with
bloom_size=320000 and bloom_fpp=0.00001 will result in 18 false positive
samples. |
| BLOOM_COMPRESS | NO | true | Whether to compress the BloomFilter index
files. |
@@ -108,41 +109,41 @@ User can create BloomFilter DataMap using the Create
DataMap DDL:
## Loading Data
When loading data to main table, BloomFilter files will be generated for all
the
-index_columns given in DMProperties which contains the blockletId and a
BloomFilter for each index column.
-These index files will be written inside a folder named with DataMap name
+index_columns provided in the CREATE statement; the files contain the blockletId
and a BloomFilter for each index column.
+These index files will be written inside a folder named with the Index name
inside each segment folder.
## Querying Data
-User can verify whether a query can leverage BloomFilter DataMap by executing
`EXPLAIN` command,
-which will show the transformed logical plan, and thus user can check whether
the BloomFilter DataMap can skip blocklets during the scan.
-If the DataMap does not prune blocklets well, you can try to increase the
value of property `BLOOM_SIZE` and decrease the value of property `BLOOM_FPP`.
+User can verify whether a query can leverage BloomFilter Index by executing
`EXPLAIN` command,
+which will show the transformed logical plan, and thus user can check whether
the BloomFilter Index can skip blocklets during the scan.
+If the Index does not prune blocklets well, you can try to increase the value
of property `BLOOM_SIZE` and decrease the value of property `BLOOM_FPP`.
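A minimal sketch of this check (the query value is hypothetical; `enable.query.statistics` is described in the Index Related Commands section):

```sql
-- Show the pruning statistics for a precise-match query on an indexed column.
SET enable.query.statistics = true;
EXPLAIN SELECT * FROM index_test WHERE name = 'Alice';
```

The profiler section of the output reports how many blocks and blocklets were skipped.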
-## Data Management With BloomFilter DataMap
-Data management with BloomFilter DataMap has no difference with that on Lucene
DataMap.
-You can refer to the corresponding section in `CarbonData Lucene DataMap`.
+## Data Management With BloomFilter Index
+Data management with BloomFilter Index has no difference with that on Lucene
Index.
+You can refer to the corresponding section in [CarbonData Lucene
Index](https://github.com/apache/carbondata/blob/master/docs/index/lucene-index-guide.md)
## Useful Tips
-+ BloomFilter DataMap is suggested to be created on the high cardinality
columns.
++ BloomFilter Index is suggested to be created on the high cardinality columns.
Query conditions on these columns are always simple `equal` or `in`,
such as 'col1=XX', 'col1 in (XX, YY)'.
-+ We can create multiple BloomFilter DataMaps on one table,
- but we do recommend you to create one BloomFilter DataMap that contains
multiple index columns,
++ We can create multiple BloomFilter Indexes on one table,
+ but we recommend creating one BloomFilter Index that contains
multiple index columns,
because the data loading and query performance will be better.
+ `BLOOM_FPP` is only the expected number from the user; the actual FPP may be
worse.
- If the BloomFilter DataMap does not work well,
+ If the BloomFilter Index does not work well,
you can try to increase `BLOOM_SIZE` and decrease `BLOOM_FPP` at the same
time.
Notice that bigger `BLOOM_SIZE` will increase the size of index file
and smaller `BLOOM_FPP` will increase runtime calculation while performing
query.
-+ '0' skipped blocklets of BloomFilter DataMap in explain output indicates that
- BloomFilter DataMap does not prune better than Main DataMap.
- (For example since the data is not ordered, a specific value may be contained
in many blocklets. In this case, bloom may not work better than Main DataMap.)
++ '0' skipped blocklets of BloomFilter Index in explain output indicates that
+ BloomFilter Index does not prune better than Main Index.
+ (For example since the data is not ordered, a specific value may be contained
in many blocklets. In this case, bloom may not work better than Main Index.)
If this occurs very often, it means that current BloomFilter is useless. You
can disable or drop it.
- Sometimes we cannot see any pruning result about BloomFilter DataMap in the
explain output,
- this indicates that the previous DataMap has pruned all the blocklets and
there is no need to continue pruning.
-+ In some scenarios, the BloomFilter DataMap may not enhance the query
performance significantly
+ Sometimes we cannot see any pruning result about BloomFilter Index in the
explain output,
+ this indicates that the previous Index has pruned all the blocklets and there
is no need to continue pruning.
++ In some scenarios, the BloomFilter Index may not enhance the query
performance significantly
but if it can reduce the number of spark tasks,
- there is still a chance that BloomFilter DataMap can enhance the performance
for concurrent query.
-+ Note that BloomFilter DataMap will decrease the data loading performance and
may cause slightly storage expansion (for DataMap index file).
+ there is still a chance that BloomFilter Index can enhance the performance
for concurrent query.
++ Note that BloomFilter Index will decrease the data loading performance and
may cause slight storage expansion (for the index file).
diff --git a/docs/index/index-management.md b/docs/index/index-management.md
index 01f3604..6b4b6ec 100644
--- a/docs/index/index-management.md
+++ b/docs/index/index-management.md
@@ -18,124 +18,107 @@
# CarbonData Index Management
- [Overview](#overview)
-- [DataMap Management](#datamap-management)
+- [Index Management](#index-management)
- [Automatic Refresh](#automatic-refresh)
- [Manual Refresh](#manual-refresh)
-- [DataMap Catalog](#datamap-catalog)
-- [DataMap Related Commands](#datamap-related-commands)
+- [Index Related Commands](#index-related-commands)
- [Explain](#explain)
- - [Show DataMap](#show-datamap)
+ - [Show Index](#show-index)
## Overview
-DataMap can be created using following DDL
+Index can be created using following DDL
```
-CREATE DATAMAP [IF NOT EXISTS] datamap_name
-[ON TABLE main_table]
-USING "datamap_provider"
-[WITH DEFERRED REBUILD]
-DMPROPERTIES ('key'='value', ...)
-AS
- SELECT statement
+CREATE INDEX [IF NOT EXISTS] index_name
+ON TABLE [db_name.]table_name (column_name, ...)
+AS carbondata/bloomfilter/lucene
+[WITH DEFERRED REFRESH]
+[PROPERTIES ('key'='value')]
```
-Currently, there are 5 DataMap implementations in CarbonData.
+Currently, there are 3 Index implementations in CarbonData.
-| DataMap Provider | Description | DMPROPERTIES
| Management |
-| ---------------- | ---------------------------------------- |
---------------------------------------- | ---------------- |
-| mv | multi-table pre-aggregate table | No DMPROPERTY
is required | Manual/Automatic |
-| lucene | lucene indexing for text column | index_columns
to specifying the index columns | Automatic |
-| bloomfilter | bloom filter for high cardinality column, geospatial
column | index_columns to specifying the index columns | Automatic |
+| Index Provider | Description
| Management |
+| ---------------- |
--------------------------------------------------------------------------------
| --------- |
+| secondary-index | secondary-index tables to hold blocklets as indexes and
managed as child tables | Automatic |
+| lucene | lucene indexing for text column
| Automatic |
+| bloomfilter | bloom filter for high cardinality column, geospatial
column | Automatic |
-## DataMap Management
+## Index Management
-There are two kinds of management semantic for DataMap.
+There are two kinds of management semantic for Index.
-1. Automatic Refresh: Create datamap without `WITH DEFERRED REBUILD` in the
statement, which is by default.
-2. Manual Refresh: Create datamap with `WITH DEFERRED REBUILD` in the statement
+1. Automatic Refresh: Create index without `WITH DEFERRED REFRESH` in the
statement, which is the default.
+2. Manual Refresh: Create index with `WITH DEFERRED REFRESH` in the statement
### Automatic Refresh
-When user creates a datamap on the main table without using `WITH DEFERRED
REBUILD` syntax, the datamap will be managed by system automatically.
-For every data load to the main table, system will immediately trigger a load
to the datamap automatically. These two data loading (to main table and
datamap) is executed in a transactional manner, meaning that it will be either
both success or neither success.
+When user creates an index on the main table without using `WITH DEFERRED
REFRESH` syntax, the index will be managed by the system automatically.
+For every data load to the main table, the system will immediately trigger a load
to the index automatically. These two data loads (to main table and index) are
executed in a transactional manner, meaning that either both succeed or neither
does.
-The data loading to datamap is incremental based on Segment concept, avoiding
a expensive total rebuild.
+The data loading to the index is incremental based on the Segment concept,
avoiding an expensive total rebuild.
If the user performs any of the following commands on the main table, the system
will return failure (reject the operation):
1. Data management command: `UPDATE/DELETE/DELETE SEGMENT`.
2. Schema management command: `ALTER TABLE DROP COLUMN`, `ALTER TABLE CHANGE
DATATYPE`,
`ALTER TABLE RENAME`. Note that adding a new column is supported, and for
dropping columns and
- change datatype command, CarbonData will check whether it will impact the
pre-aggregate table, if
+ change datatype command, CarbonData will check whether it will impact the
index table, if
not, the operation is allowed, otherwise operation will be rejected by
throwing exception.
3. Partition management command: `ALTER TABLE ADD/DROP PARTITION`.
-If user do want to perform above operations on the main table, user can first
drop the datamap, perform the operation, and re-create the datamap again.
+If user does want to perform the above operations on the main table, the user
can first drop the index, perform the operation, and then re-create the index.
-If user drop the main table, the datamap will be dropped immediately too.
+If user drops the main table, the index will be dropped immediately too.
-We do recommend you to use this management for index datamap.
+We recommend using this management mode for indexes.
### Manual Refresh
-When user creates a datamap specifying manual refresh semantic, the datamap is
created with status *disabled* and query will NOT use this datamap until user
can issue REBUILD DATAMAP command to build the datamap. For every REBUILD
DATAMAP command, system will trigger a full rebuild of the datamap. After
rebuild is done, system will change datamap status to *enabled*, so that it can
be used in query rewrite.
+When user creates an index specifying manual refresh semantics, the index is
created with status *disabled* and queries will NOT use this index until the user
issues the REFRESH INDEX command to build the index. For every REFRESH INDEX
command, the system will trigger a full rebuild of the index. After the rebuild is
done, the system will change the index status to *enabled*, so that it can be used
in query rewrite.
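The manual refresh workflow can be sketched as follows (index, table, and column names are hypothetical, and the exact REFRESH INDEX syntax shown here is an assumption to be checked against the DDL reference):

```sql
-- Create the index in deferred mode; it starts with status *disabled*.
CREATE INDEX idx_city
ON TABLE main_table (city)
AS 'bloomfilter'
WITH DEFERRED REFRESH;

-- Trigger the full rebuild; afterwards the index status becomes *enabled*.
REFRESH INDEX idx_city ON TABLE main_table;
```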
-For every new data loading, data update, delete, the related datamap will be
made *disabled*,
-which means that the following queries will not benefit from the datamap
before it becomes *enabled* again.
+For every new data load, data update, or delete, the related index will be
made *disabled*,
+which means that the following queries will not benefit from the index before
it becomes *enabled* again.
-If the main table is dropped by user, the related datamap will be dropped
immediately.
+If the main table is dropped by user, the related index will be dropped
immediately.
**Note**:
-+ If you are creating a datamap on external table, you need to do manual
management of the datamap.
-+ For index datamap such as BloomFilter datamap, there is no need to do manual
refresh.
++ If you are creating an index on an external table, you need to do manual
management of the index.
++ For indexes such as the BloomFilter index, there is no need to do manual refresh.
By default it is automatic refresh,
- which means its data will get refreshed immediately after the datamap is
created or the main table is loaded.
- Manual refresh on this datamap will has no impact.
+ which means its data will get refreshed immediately after the index is
created or the main table is loaded.
+ Manual refresh on this index will have no impact.
-
-
-## DataMap Catalog
-
-Currently, when user creates a datamap, system will store the datamap metadata
in a configurable *system* folder in HDFS or S3.
-
-In this *system* folder, it contains:
-
-- DataMapSchema file. It is a json file containing schema for one datamap. Ses
DataMapSchema class. If user creates 100 datamaps (on different tables), there
will be 100 files in *system* folder.
-- DataMapStatus file. Only one file, it is in json format, and each entry in
the file represents for one datamap. Ses DataMapStatusDetail class
-
-There is a DataMapCatalog interface to retrieve schema of all datamap, it can
be used in optimizer to get the metadata of datamap.
-
-
-
-## DataMap Related Commands
+## Index Related Commands
### Explain
-How can user know whether datamap is used in the query?
+How can a user know whether an index is used in a query?
User can set enable.query.statistics = true and use the EXPLAIN command to find
out; it will print out something like
```text
== CarbonData Profiler ==
-Hit mv DataMap: datamap1
-Scan Table: default.datamap1_table
+Table Scan on default.main
++- total: 1 blocks, 1 blocklets
+- filter:
-+- pruning by CG DataMap
-+- all blocklets: 1
- skipped blocklets: 0
++- pruned by CG Index
+ - name: index1
+ - provider: lucene
+ - skipped: 0 blocks, 0 blocklets
```
-### Show DataMap
+### Show Index
-There is a SHOW DATAMAPS command, when this is issued, system will read all
datamap from *system* folder and print all information on screen. The current
information includes:
+There is a SHOW INDEXES command; when this is issued, the system will read all
indexes from the carbon table and print their information on screen. The current
information includes:
-- DataMapName
-- DataMapProviderName like mv
-- Associated Table
-- DataMap Properties
-- DataMap status (ENABLED/DISABLED)
-- Sync Status - which displays Last segment Id of main table synced with
datamap table and its load
- end time (Applicable only for mv datamap)
+- Name
+- Provider like lucene
+- Indexed Columns
+- Properties
+- Status (ENABLED/DISABLED)
+- Sync Info - which displays Last segment Id of main table synced with index
table and its load
+ end time
diff --git a/docs/index/lucene-index-guide.md b/docs/index/lucene-index-guide.md
index d12aa47..c811ec3 100644
--- a/docs/index/lucene-index-guide.md
+++ b/docs/index/lucene-index-guide.md
@@ -15,46 +15,47 @@
limitations under the License.
-->
-# CarbonData Lucene DataMap (Alpha Feature)
+# CarbonData Lucene Index (Alpha Feature)
-* [DataMap Management](#datamap-management)
-* [Lucene Datamap](#lucene-datamap-introduction)
+* [Index Management](#index-management)
+* [Lucene Index](#lucene-index-introduction)
* [Loading Data](#loading-data)
* [Querying Data](#querying-data)
-* [Data Management](#data-management-with-lucene-datamap)
+* [Data Management](#data-management-with-lucene-index)
-#### DataMap Management
-Lucene DataMap can be created using following DDL
+#### Index Management
+Lucene Index can be created using following DDL
```
- CREATE DATAMAP [IF NOT EXISTS] datamap_name
- ON TABLE main_table
- USING 'lucene'
- DMPROPERTIES ('index_columns'='city, name', ...)
+ CREATE INDEX [IF NOT EXISTS] index_name
+ ON TABLE main_table (index_columns)
+ AS 'lucene'
+ [PROPERTIES ('key'='value')]
```
+index_columns is the list of string columns on which lucene creates indexes.
-DataMap can be dropped using following DDL:
+Index can be dropped using following DDL:
```
- DROP DATAMAP [IF EXISTS] datamap_name
+ DROP INDEX [IF EXISTS] index_name
ON TABLE main_table
```
-To show all DataMaps created, use:
+To show all Indexes created, use:
```
- SHOW DATAMAP
+ SHOW INDEXES
ON TABLE main_table
```
-It will show all DataMaps created on main table.
+It will show all Indexes created on main table.
-## Lucene DataMap Introduction
+## Lucene Index Introduction
Lucene is a high performance, full featured text search engine. Lucene is
integrated to carbon as
- an index datamap and managed along with main tables by CarbonData. User can
create lucene datamap
+ an index and managed along with main tables by CarbonData. User can create a
lucene index
to improve query performance on string columns which have long content. So,
user can
search for a tokenized word or a pattern of it using a lucene query on the text content.
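As a hedged sketch of such a search (assuming the TEXT_MATCH filter UDF with lucene query syntax; the column value is hypothetical):

```sql
-- Search the lucene-indexed string column `name` for tokens matching a pattern.
SELECT * FROM index_test WHERE TEXT_MATCH('name:n10*');
```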
- For instance, main table called **datamap_test** which is defined as:
+ For instance, main table called **index_test** which is defined as:
```
- CREATE TABLE datamap_test (
+ CREATE TABLE index_test (
name string,
age int,
city string,
@@ -62,28 +63,26 @@ It will show all DataMaps created on main table.
STORED AS carbondata
```
- User can create Lucene datamap using the Create DataMap DDL:
+ User can create a Lucene index using the CREATE INDEX DDL:
```
- CREATE DATAMAP dm
- ON TABLE datamap_test
- USING 'lucene'
- DMPROPERTIES ('INDEX_COLUMNS' = 'name, country',)
+ CREATE INDEX dm
+ ON TABLE index_test (name,country)
+ AS 'lucene'
```
-**DMProperties**
-1. INDEX_COLUMNS: The list of string columns on which lucene creates indexes.
-2. FLUSH_CACHE: size of the cache to maintain in Lucene writer, if specified then it tries to
+**Properties**
+1. FLUSH_CACHE: size of the cache to maintain in the Lucene writer; if specified, it tries to
 aggregate the unique data till the cache limit and flush to Lucene. It is best suitable for low cardinality dimensions.
-3. SPLIT_BLOCKLET: when made as true then store the data in blocklet wise in lucene , it means new
+2. SPLIT_BLOCKLET: when set to true, the data is stored blocklet-wise in Lucene; it means a new
 folder will be created for each blocklet, thus, it eliminates storing blockletid in lucene and also it makes lucene small chunks of data.
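+
+As a sketch, both properties can be supplied at index creation time. The property names come from the list above; the index name and values below are illustrative placeholders, not recommended defaults:
+```
+ CREATE INDEX dm_cached
+ ON TABLE index_test (name)
+ AS 'lucene'
+ PROPERTIES ('flush_cache'='8192', 'split_blocklet'='true')
+```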
## Loading data
When loading data to main table, lucene index files will be generated for all the
-index_columns(String Columns) given in DMProperties which contains information
about the data
location of index_columns. These index files will be written inside a folder
named with datamap name
+index_columns (string columns) given in the CREATE INDEX statement; these files contain information about the data
+location of index_columns. These index files will be written inside a folder named with the index name
inside each segment folders.
A system level configuration carbon.lucene.compression.mode can be added for best compression of
@@ -99,7 +98,7 @@ fired, two jobs are fired. The first job writes the temporary files in folder cr
which contains lucene's search results and these files will be read in second job to give faster
results. These temporary files will be cleared once the query finishes.
-User can verify whether a query can leverage Lucene datamap or not by
executing `EXPLAIN`
+User can verify whether a query can leverage the Lucene index by executing `EXPLAIN`
command, which will show the transformed logical plan, and thus user can check whether TEXT_MATCH()
filter is applied on query or not.
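+
+For instance, a plan can be inspected as follows (illustrative query; the exact plan text depends on the Spark version used):
+```
+ EXPLAIN SELECT * FROM index_test WHERE TEXT_MATCH('name:n10')
+```
+If the index can be leveraged, the TEXT_MATCH() filter appears in the transformed logical plan.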
@@ -109,50 +108,50 @@ filter condition like 'AND','OR' must be in upper case.
Ex:
```
- select * from datamap_test where TEXT_MATCH('name:*10 AND name:*n*')
+ select * from index_test where TEXT_MATCH('name:*10 AND name:*n*')
```
2. Query supports only one TEXT_MATCH udf for filter condition and not multiple udfs.
The following query is supported:
```
- select * from datamap_test where TEXT_MATCH('name:*10 AND name:*n*')
+ select * from index_test where TEXT_MATCH('name:*10 AND name:*n*')
```
The following query is not supported:
```
- select * from datamap_test where TEXT_MATCH('name:*10) AND TEXT_MATCH(name:*n*')
+ select * from index_test where TEXT_MATCH('name:*10) AND TEXT_MATCH(name:*n*')
```
The like queries below can be converted to TEXT_MATCH queries as follows:
```
-select * from datamap_test where name='n10'
+select * from index_test where name='n10'
-select * from datamap_test where name like 'n1%'
+select * from index_test where name like 'n1%'
-select * from datamap_test where name like '%10'
+select * from index_test where name like '%10'
-select * from datamap_test where name like '%n%'
+select * from index_test where name like '%n%'
-select * from datamap_test where name like '%10' and name not like '%n%'
+select * from index_test where name like '%10' and name not like '%n%'
```
Lucene TEXT_MATCH Queries:
```
-select * from datamap_test where TEXT_MATCH('name:n10')
+select * from index_test where TEXT_MATCH('name:n10')
-select * from datamap_test where TEXT_MATCH('name:n1*')
+select * from index_test where TEXT_MATCH('name:n1*')
-select * from datamap_test where TEXT_MATCH('name:*10')
+select * from index_test where TEXT_MATCH('name:*10')
-select * from datamap_test where TEXT_MATCH('name:*n*')
+select * from index_test where TEXT_MATCH('name:*n*')
-select * from datamap_test where TEXT_MATCH('name:*10 -name:*n*')
+select * from index_test where TEXT_MATCH('name:*10 -name:*n*')
```
**Note:** For lucene queries and syntax, refer to
[lucene-syntax](http://www.lucenetutorial.com/lucene-query-syntax.html)
-## Data Management with lucene datamap
-Once there is lucene datamap is created on the main table, following command
on the main
+## Data Management with lucene index
+Once a Lucene index is created on the main table, the following command on the main
table
is not supported:
1. Data management command: `UPDATE/DELETE`.
diff --git a/docs/language-manual.md b/docs/language-manual.md
index d8f30b0..9a4a79b 100644
--- a/docs/language-manual.md
+++ b/docs/language-manual.md
@@ -27,7 +27,8 @@ CarbonData has its own parser, in addition to Spark's SQL Parser, to parse and p
- [Index](./index/index-management.md)
- [Bloom](./index/bloomfilter-index-guide.md)
- [Lucene](./index/lucene-index-guide.md)
- - Materialized Views (MV)
+ - [Secondary-index](./index/secondary-index-guide.md)
+ - [Materialized Views (MV)](./index/mv-guide.md)
- [Streaming](./streaming-guide.md)
- Data Manipulation Statements
- [DML:](./dml-of-carbondata.md) [Load](./dml-of-carbondata.md#load-data),
[Insert](./dml-of-carbondata.md#insert-data-into-carbondata-table),
[Update](./dml-of-carbondata.md#update), [Delete](./dml-of-carbondata.md#delete)