http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/datamap-developer-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap-developer-guide.md b/docs/datamap-developer-guide.md
index 6bac9b5..60f93df 100644
--- a/docs/datamap-developer-guide.md
+++ b/docs/datamap-developer-guide.md
@@ -6,7 +6,7 @@ Currently, there are two types of DataMap supported:
 1. IndexDataMap: DataMap that leverages index to accelerate filter query
 2. MVDataMap: DataMap that leverages Materialized View to accelerate OLAP 
style query, like SPJG query (select, predicate, join, groupby)
 
-### DataMap provider
+### DataMap Provider
 When user issues `CREATE DATAMAP dm ON TABLE main USING 'provider'`, the 
corresponding DataMapProvider implementation will be created and initialized. 
 Currently, the provider string can be:
 1. preaggregate: A type of MVDataMap that does pre-aggregation of a single table
@@ -15,5 +15,5 @@ Currently, the provider string can be:
 
 When user issues `DROP DATAMAP dm ON TABLE main`, the corresponding 
DataMapProvider interface will be called.
 
-Details about [DataMap 
Management](./datamap/datamap-management.md#datamap-management) and supported 
[DSL](./datamap/datamap-management.md#overview) are documented 
[here](./datamap/datamap-management.md).
+For more details, see [DataMap 
Management](./datamap/datamap-management.md#datamap-management) and the 
supported [DSL](./datamap/datamap-management.md#overview).
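
As a sketch of the lifecycle described above, creating a datamap instantiates the corresponding DataMapProvider, and dropping it calls the same interface. The table, datamap, and aggregate columns below are illustrative:

```
CREATE DATAMAP dm ON TABLE main USING 'preaggregate' AS
  SELECT country, sum(quantity) FROM main GROUP BY country

DROP DATAMAP dm ON TABLE main
```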
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/datamap/bloomfilter-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/bloomfilter-datamap-guide.md 
b/docs/datamap/bloomfilter-datamap-guide.md
index b2e7d60..fb244fe 100644
--- a/docs/datamap/bloomfilter-datamap-guide.md
+++ b/docs/datamap/bloomfilter-datamap-guide.md
@@ -15,7 +15,7 @@
     limitations under the License.
 -->
 
-# CarbonData BloomFilter DataMap (Alpha Feature)
+# CarbonData BloomFilter DataMap
 
 * [DataMap Management](#datamap-management)
 * [BloomFilter Datamap Introduction](#bloomfilter-datamap-introduction)
@@ -46,7 +46,7 @@ Showing all DataMaps on this table
   ```
 
 Disable Datamap
-> The datamap by default is enabled. To support tuning on query, we can 
disable a specific datamap during query to observe whether we can gain 
performance enhancement from it. This will only take effect current session.
+> The datamap is enabled by default. To support query tuning, we can 
disable a specific datamap during query to observe whether we gain 
performance enhancement from it. This is effective only for the current session.
 
   ```
   // disable the datamap
@@ -82,7 +82,7 @@ and we always query on `id` and `name` with precise value.
 since `id` is in the sort_columns and it is ordered,
 query on it will be fast because CarbonData can skip all the irrelevant 
blocklets.
 But queries on `name` may be slow since the blocklet minmax may not help,
-because in each blocklet the range of the value of `name` may be the same -- 
all from A*~z*.
+because in each blocklet the range of the value of `name` may be the same -- 
all from A* to z*.
 In this case, user can create a BloomFilter datamap on column `name`.
 Moreover, user can also create a BloomFilter datamap on the sort_columns.
 This is useful if the user has too many segments and the range of the value of 
sort_columns is almost the same.
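
For the scenario above, a BloomFilter datamap on `name` could be created as sketched below (the datamap name and the exact DMPROPERTIES are illustrative; see the create-datamap command in this guide for the supported options):

```
CREATE DATAMAP dm_name ON TABLE main USING 'bloomfilter'
DMPROPERTIES ('INDEX_COLUMNS'='name')
```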

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/datamap/datamap-management.md
----------------------------------------------------------------------
diff --git a/docs/datamap/datamap-management.md 
b/docs/datamap/datamap-management.md
index bf52c05..ad8718a 100644
--- a/docs/datamap/datamap-management.md
+++ b/docs/datamap/datamap-management.md
@@ -66,9 +66,9 @@ If user create MV datamap without specifying `WITH DEFERRED 
REBUILD`, carbondata
 ### Automatic Refresh
 
 When user creates a datamap on the main table without using `WITH DEFERRED 
REBUILD` syntax, the datamap will be managed by system automatically.
-For every data load to the main table, system will immediately triger a load 
to the datamap automatically. These two data loading (to main table and 
datamap) is executed in a transactional manner, meaning that it will be either 
both success or neither success. 
+For every data load to the main table, the system will immediately trigger a 
load to the datamap automatically. These two data loads (to the main table and 
to the datamap) are executed in a transactional manner, meaning that either 
both succeed or neither does. 
 
-The data loading to datamap is incremental based on Segment concept, avoiding 
a expesive total rebuild.
+The data loading to the datamap is incremental, based on the Segment concept, 
avoiding an expensive total rebuild.
 
 If the user performs the following command on the main table, system will 
return failure. (reject the operation)
 
@@ -87,7 +87,7 @@ We do recommend you to use this management for index datamap.
 
 ### Manual Refresh
 
-When user creates a datamap specifying maunal refresh semantic, the datamap is 
created with status *disabled* and query will NOT use this datamap until user 
can issue REBUILD DATAMAP command to build the datamap. For every REBUILD 
DATAMAP command, system will trigger a full rebuild of the datamap. After 
rebuild is done, system will change datamap status to *enabled*, so that it can 
be used in query rewrite.
+When user creates a datamap specifying manual refresh semantics, the datamap is 
created with status *disabled* and queries will NOT use this datamap until the 
user issues a REBUILD DATAMAP command to build the datamap. For every REBUILD 
DATAMAP command, system will trigger a full rebuild of the datamap. After 
rebuild is done, system will change datamap status to *enabled*, so that it can 
be used in query rewrite.
 
 For every new data load, update, or delete, the related datamap will be 
made *disabled*,
 which means that the following queries will not benefit from the datamap 
before it becomes *enabled* again.
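
The manual refresh flow above can be sketched as follows (the datamap definition and names are illustrative):

```
CREATE DATAMAP dm ON TABLE main
USING 'preaggregate'
WITH DEFERRED REBUILD
AS SELECT country, sum(quantity) FROM main GROUP BY country

-- the datamap starts as *disabled*; a full rebuild changes it to *enabled*
REBUILD DATAMAP dm
```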

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/datamap/lucene-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/lucene-datamap-guide.md 
b/docs/datamap/lucene-datamap-guide.md
index 86b00e2..aa9c8d4 100644
--- a/docs/datamap/lucene-datamap-guide.md
+++ b/docs/datamap/lucene-datamap-guide.md
@@ -47,7 +47,7 @@ It will show all DataMaps created on main table.
 
 ## Lucene DataMap Introduction
   Lucene is a high performance, full featured text search engine. Lucene is 
integrated to carbon as
-  an index datamap and managed along with main tables by CarbonData.User can 
create lucene datamap 
+  an index datamap and managed along with main tables by CarbonData. User can 
create lucene datamap 
   to improve query performance on string columns which have longer content. 
So, user can 
   search for a tokenized word or a pattern of it using a lucene query on the 
text content.
   
@@ -95,7 +95,7 @@ As a technique for query acceleration, Lucene indexes cannot 
be queried directly
 Queries are to be made on main table. When a query with TEXT_MATCH('name:c10') 
or 
 TEXT_MATCH_WITH_LIMIT('name:n10',10)[the second parameter represents the 
number of results to be 
 returned; if user does not specify this value, all results will be returned 
without any limit] is 
-fired, two jobs are fired.The first job writes the temporary files in folder 
created at table level 
+fired, two jobs are fired. The first job writes the temporary files in a 
folder created at table level 
 which contains lucene's search results and these files will be read in the 
second job to give faster 
 results. These temporary files will be cleared once the query finishes.
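
The query path above can be sketched with the filter expressions named in this section (the table name and search terms are illustrative):

```
SELECT * FROM main WHERE TEXT_MATCH('name:c10')

-- the second parameter limits the number of results returned
SELECT * FROM main WHERE TEXT_MATCH_WITH_LIMIT('name:n10', 10)
```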
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/datamap/preaggregate-datamap-guide.md
----------------------------------------------------------------------
diff --git a/docs/datamap/preaggregate-datamap-guide.md 
b/docs/datamap/preaggregate-datamap-guide.md
index 3a3efc2..eff601d 100644
--- a/docs/datamap/preaggregate-datamap-guide.md
+++ b/docs/datamap/preaggregate-datamap-guide.md
@@ -251,7 +251,7 @@ pre-aggregate tables. To further improve the query 
performance, compaction on pr
 can be triggered to merge the segments and files in the pre-aggregate tables. 
 
 ## Data Management with pre-aggregate tables
-In current implementation, data consistence need to be maintained for both 
main table and pre-aggregate
+In current implementation, data consistency needs to be maintained for both 
main table and pre-aggregate
 tables. Once a pre-aggregate table is created on the main table, the 
following commands on the main table
 are not supported:

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/ddl-of-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/ddl-of-carbondata.md b/docs/ddl-of-carbondata.md
index 7cda9cd..22d754a 100644
--- a/docs/ddl-of-carbondata.md
+++ b/docs/ddl-of-carbondata.md
@@ -73,6 +73,7 @@ CarbonData DDL statements are documented here,which includes:
   [TBLPROPERTIES (property_name=property_value, ...)]
   [LOCATION 'path']
   ```
+  
   **NOTE:** CarbonData also supports "STORED AS carbondata" and "USING 
carbondata". Find example code at 
[CarbonSessionExample](https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala)
 in the CarbonData repo.
 ### Usage Guidelines
 
@@ -93,10 +94,10 @@ CarbonData DDL statements are documented here,which 
includes:
 | [streaming](#streaming)                                      | Whether the 
table is a streaming table                       |
 | [LOCAL_DICTIONARY_ENABLE](#local-dictionary-configuration)   | Enable local 
dictionary generation                           |
 | [LOCAL_DICTIONARY_THRESHOLD](#local-dictionary-configuration) | Cardinality 
upto which the local dictionary can be generated |
-| [LOCAL_DICTIONARY_INCLUDE](#local-dictionary-configuration)  | Columns for 
which local dictionary needs to be generated.Useful when local dictionary need 
not be generated for all string/varchar/char columns |
-| [LOCAL_DICTIONARY_EXCLUDE](#local-dictionary-configuration)  | Columns for 
which local dictionary generation should be skipped.Useful when local 
dictionary need not be generated for few string/varchar/char columns |
+| [LOCAL_DICTIONARY_INCLUDE](#local-dictionary-configuration)  | Columns for 
which local dictionary needs to be generated. Useful when local dictionary need 
not be generated for all string/varchar/char columns |
+| [LOCAL_DICTIONARY_EXCLUDE](#local-dictionary-configuration)  | Columns for 
which local dictionary generation should be skipped. Useful when local 
dictionary need not be generated for few string/varchar/char columns |
 | [COLUMN_META_CACHE](#caching-minmax-value-for-required-columns) | Columns 
whose metadata can be cached in Driver for efficient pruning and improved query 
performance |
-| [CACHE_LEVEL](#caching-at-block-or-blocklet-level)           | Column 
metadata caching level.Whether to cache column metadata of block or blocklet |
+| [CACHE_LEVEL](#caching-at-block-or-blocklet-level)           | Column 
metadata caching level. Whether to cache column metadata of block or blocklet |
 | [flat_folder](#support-flat-folder-same-as-hiveparquet)      | Whether to 
write all the carbondata files in a single folder. Not writing segments folder 
during incremental load |
 | [LONG_STRING_COLUMNS](#string-longer-than-32000-characters)  | Columns which 
are greater than 32K characters                |
 | [BUCKETNUMBER](#bucketing)                                   | Number of 
buckets to be created                              |
@@ -111,9 +112,10 @@ CarbonData DDL statements are documented here,which 
includes:
 
      ```
      TBLPROPERTIES ('DICTIONARY_INCLUDE'='column1, column2')
-       ```
-        NOTE: Dictionary Include/Exclude for complex child columns is not 
supported.
-       
+     ```
+     
+     **NOTE**: Dictionary Include/Exclude for complex child columns is not 
supported.
+     
    - ##### Inverted Index Configuration
 
     By default inverted index is enabled; it might help to improve 
compression ratio and query speed, especially for low cardinality columns which 
are in reward position.
@@ -128,7 +130,7 @@ CarbonData DDL statements are documented here,which 
includes:
      This property is for users to specify which columns belong to the 
MDK(Multi-Dimensions-Key) index.
     * If users don't specify "SORT_COLUMNS" property, by default the MDK 
index is built using all dimension columns except complex data type columns. 
      * If this property is specified but with empty argument, then the table 
will be loaded without sort.
-        * This supports only string, date, timestamp, short, int, long, byte 
and boolean data types.
+     * This supports only string, date, timestamp, short, int, long, byte and 
boolean data types.
     Suggested use cases: Only build MDK index for required columns, it might 
help to improve the data loading performance.
 
      ```
@@ -136,7 +138,8 @@ CarbonData DDL statements are documented here,which 
includes:
      OR
      TBLPROPERTIES ('SORT_COLUMNS'='')
      ```
-     NOTE: Sort_Columns for Complex datatype columns is not supported.
+     
+     **NOTE**: Sort_Columns for Complex datatype columns is not supported.
 
    - ##### Sort Scope Configuration
    
@@ -147,10 +150,10 @@ CarbonData DDL statements are documented here,which 
includes:
      * BATCH_SORT: It increases the load performance but decreases the query 
performance if identified blocks > parallelism.
     * GLOBAL_SORT: It increases the query performance, especially for highly 
concurrent point queries.
       If you care strictly about loading resource isolation, note that the 
system uses the Spark GroupBy to sort data, so the resources can be controlled 
by Spark. 
-       
-       ### Example:
 
-   ```
+    ### Example:
+
+    ```
     CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
       productNumber INT,
       productName STRING,
@@ -163,7 +166,7 @@ CarbonData DDL statements are documented here,which 
includes:
     STORED AS carbondata
     TBLPROPERTIES ('SORT_COLUMNS'='productName,storeCity',
                    'SORT_SCOPE'='NO_SORT')
-   ```
+    ```
 
    **NOTE:** CarbonData also supports "using carbondata". Find example code at 
[SparkSessionExample](https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/SparkSessionExample.scala)
 in the CarbonData repo.
 
@@ -174,6 +177,7 @@ CarbonData DDL statements are documented here,which 
includes:
      ```
      TBLPROPERTIES ('TABLE_BLOCKSIZE'='512')
      ```
+ 
      **NOTE:** 512 or 512M both are accepted.
 
    - ##### Table Compaction Configuration
@@ -197,7 +201,7 @@ CarbonData DDL statements are documented here,which 
includes:
      
    - ##### Streaming
 
-     CarbonData supports streaming ingestion for real-time data. You can 
create the ‘streaming’ table using the following table properties.
+     CarbonData supports streaming ingestion for real-time data. You can 
create the 'streaming' table using the following table properties.
 
      ```
      TBLPROPERTIES ('streaming'='true')
@@ -247,8 +251,8 @@ CarbonData DDL statements are documented here,which 
includes:
 | ---------- | ------------- | ----------- |
 | LOCAL_DICTIONARY_ENABLE | false | Whether to enable local dictionary 
generation. **NOTE:** If this property is defined, it will override the value 
configured at system level by '***carbon.local.dictionary.enable***'. Local 
dictionary will be generated for all string/varchar/char columns unless 
LOCAL_DICTIONARY_INCLUDE, LOCAL_DICTIONARY_EXCLUDE is configured. |
 | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality of a column 
upto which carbondata can try to generate local dictionary (maximum - 100000). 
**NOTE:** When LOCAL_DICTIONARY_THRESHOLD is defined for Complex columns, the 
count of distinct records of all child columns are summed up. |
-| LOCAL_DICTIONARY_INCLUDE | string/varchar/char columns| Columns for which 
Local Dictionary has to be generated.**NOTE:** Those string/varchar/char 
columns which are added into DICTIONARY_INCLUDE option will not be considered 
for local dictionary generation.This property needs to be configured only when 
local dictionary needs to be generated for few columns, skipping others.This 
property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or 
**carbon.local.dictionary.enable** is true |
-| LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary need 
not be generated.This property needs to be configured only when local 
dictionary needs to be skipped for few columns, generating for others.This 
property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or 
**carbon.local.dictionary.enable** is true |
+| LOCAL_DICTIONARY_INCLUDE | string/varchar/char columns| Columns for which 
Local Dictionary has to be generated. **NOTE:** Those string/varchar/char 
columns which are added into DICTIONARY_INCLUDE option will not be considered 
for local dictionary generation. This property needs to be configured only when 
local dictionary needs to be generated for few columns, skipping others. This 
property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or 
**carbon.local.dictionary.enable** is true |
+| LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary need 
not be generated. This property needs to be configured only when local 
dictionary needs to be skipped for few columns, generating for others. This 
property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or 
**carbon.local.dictionary.enable** is true |
 
    **Fallback behavior:** 
 
@@ -294,19 +298,19 @@ CarbonData DDL statements are documented here,which 
includes:
       * If you want no column min/max values to be cached in the driver.
 
       ```
-      COLUMN_META_CACHE=’’
+      COLUMN_META_CACHE=''
       ```
 
       * If you want only col1 min/max values to be cached in the driver.
 
       ```
-      COLUMN_META_CACHE=’col1’
+      COLUMN_META_CACHE='col1'
       ```
 
       * If you want min/max values to be cached in driver for all the 
specified columns.
 
       ```
-      COLUMN_META_CACHE=’col1,col2,col3,…’
+      COLUMN_META_CACHE='col1,col2,col3,…'
       ```
 
       Columns to be cached can be specified either while creating table or 
after creation of the table.
@@ -315,13 +319,13 @@ CarbonData DDL statements are documented here,which 
includes:
       Syntax:
 
       ```
-      CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) 
STORED BY ‘carbondata’ TBLPROPERTIES 
(‘COLUMN_META_CACHE’=’col1,col2,…’)
+      CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) 
STORED BY 'carbondata' TBLPROPERTIES ('COLUMN_META_CACHE'='col1,col2,…')
       ```
 
       Example:
 
       ```
-      CREATE TABLE employee (name String, city String, id int) STORED BY 
‘carbondata’ TBLPROPERTIES (‘COLUMN_META_CACHE’=’name’)
+      CREATE TABLE employee (name String, city String, id int) STORED BY 
'carbondata' TBLPROPERTIES ('COLUMN_META_CACHE'='name')
       ```
 
       After creation of table or on already created tables use the alter table 
command to configure the columns to be cached.
@@ -329,13 +333,13 @@ CarbonData DDL statements are documented here,which 
includes:
       Syntax:
 
       ```
-      ALTER TABLE [dbName].tableName SET TBLPROPERTIES 
(‘COLUMN_META_CACHE’=’col1,col2,…’)
+      ALTER TABLE [dbName].tableName SET TBLPROPERTIES 
('COLUMN_META_CACHE'='col1,col2,…')
       ```
 
       Example:
 
       ```
-      ALTER TABLE employee SET TBLPROPERTIES 
(‘COLUMN_META_CACHE’=’city’)
+      ALTER TABLE employee SET TBLPROPERTIES ('COLUMN_META_CACHE'='city')
       ```
 
    - ##### Caching at Block or Blocklet Level
@@ -347,13 +351,13 @@ CarbonData DDL statements are documented here,which 
includes:
       *Configuration for caching in driver at Block level (default value).*
 
       ```
-      CACHE_LEVEL= ‘BLOCK’
+      CACHE_LEVEL= 'BLOCK'
       ```
 
       *Configuration for caching in driver at Blocklet level.*
 
       ```
-      CACHE_LEVEL= ‘BLOCKLET’
+      CACHE_LEVEL= 'BLOCKLET'
       ```
 
       Cache level can be specified either while creating table or after 
creation of the table.
@@ -362,13 +366,13 @@ CarbonData DDL statements are documented here,which 
includes:
       Syntax:
 
       ```
-      CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) 
STORED BY ‘carbondata’ TBLPROPERTIES (‘CACHE_LEVEL’=’Blocklet’)
+      CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) 
STORED BY 'carbondata' TBLPROPERTIES ('CACHE_LEVEL'='Blocklet')
       ```
 
       Example:
 
       ```
-      CREATE TABLE employee (name String, city String, id int) STORED BY 
‘carbondata’ TBLPROPERTIES (‘CACHE_LEVEL’=’Blocklet’)
+      CREATE TABLE employee (name String, city String, id int) STORED BY 
'carbondata' TBLPROPERTIES ('CACHE_LEVEL'='Blocklet')
       ```
 
       After creation of table or on already created tables use the alter table 
command to configure the cache level.
@@ -376,26 +380,27 @@ CarbonData DDL statements are documented here,which 
includes:
       Syntax:
 
       ```
-      ALTER TABLE [dbName].tableName SET TBLPROPERTIES 
(‘CACHE_LEVEL’=’Blocklet’)
+      ALTER TABLE [dbName].tableName SET TBLPROPERTIES 
('CACHE_LEVEL'='Blocklet')
       ```
 
       Example:
 
       ```
-      ALTER TABLE employee SET TBLPROPERTIES (‘CACHE_LEVEL’=’Blocklet’)
+      ALTER TABLE employee SET TBLPROPERTIES ('CACHE_LEVEL'='Blocklet')
       ```
 
    - ##### Support Flat folder same as Hive/Parquet
 
-       This feature allows all carbondata and index files to keep directy 
under tablepath. Currently all carbondata/carbonindex files written under 
tablepath/Fact/Part0/Segment_NUM folder and it is not same as hive/parquet 
folder structure. This feature makes all files written will be directly under 
tablepath, it does not maintain any segment folder structure.This is useful for 
interoperability between the execution engines and plugin with other execution 
engines like hive or presto becomes easier.
+       This feature allows all carbondata and index files to be kept directly 
under the tablepath. Currently all carbondata/carbonindex files are written 
under the tablepath/Fact/Part0/Segment_NUM folder, which is not the same as the 
hive/parquet folder structure. With this feature, all files are written 
directly under the tablepath and no segment folder structure is maintained. 
This is useful for interoperability between the execution engines, and plugging 
in other execution engines like hive or presto becomes easier.
 
       The following table property enables this feature and the default value 
is false.
        ```
         'flat_folder'='true'
        ```
+   
        Example:
        ```
-       CREATE TABLE employee (name String, city String, id int) STORED BY 
‘carbondata’ TBLPROPERTIES ('flat_folder'='true')
+       CREATE TABLE employee (name String, city String, id int) STORED BY 
'carbondata' TBLPROPERTIES ('flat_folder'='true')
        ```
 
    - ##### String longer than 32000 characters
@@ -489,7 +494,7 @@ CarbonData DDL statements are documented here,which 
includes:
   This function allows user to create external table by specifying location.
   ```
   CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name 
-  STORED AS carbondata LOCATION ‘$FilesPath’
+  STORED AS carbondata LOCATION '$FilesPath'
   ```
 
 ### Create external table on managed table data location.
@@ -542,7 +547,7 @@ CarbonData DDL statements are documented here,which 
includes:
 
 ### Example
   ```
-  CREATE DATABASE carbon LOCATION “hdfs://name_cluster/dir1/carbonstore”;
+  CREATE DATABASE carbon LOCATION "hdfs://name_cluster/dir1/carbonstore";
   ```
 
 ## TABLE MANAGEMENT  
@@ -600,21 +605,23 @@ CarbonData DDL statements are documented here,which 
includes:
      ```
      ALTER TABLE carbon ADD COLUMNS (a1 INT, b1 STRING) 
TBLPROPERTIES('DEFAULT.VALUE.a1'='10')
      ```
-      NOTE: Add Complex datatype columns is not supported.
+      **NOTE:** Adding Complex datatype columns is not supported.
 
 Users can specify which columns to include and exclude for local dictionary 
generation after adding new columns. These will be appended to the already 
existing local dictionary include and exclude columns of the main table 
respectively.
-  ```
+     ```
      ALTER TABLE carbon ADD COLUMNS (a1 STRING, b1 STRING) 
TBLPROPERTIES('LOCAL_DICTIONARY_INCLUDE'='a1','LOCAL_DICTIONARY_EXCLUDE'='b1')
-  ```
+     ```
 
    - ##### DROP COLUMNS
    
      This command is used to delete the existing column(s) in a table.
+     
      ```
      ALTER TABLE [db_name.]table_name DROP COLUMNS (col_name, ...)
      ```
 
      Examples:
+     
      ```
      ALTER TABLE carbon DROP COLUMNS (b1)
      OR
@@ -622,12 +629,14 @@ Users can specify which columns to include and exclude 
for local dictionary gene
      
      ALTER TABLE carbon DROP COLUMNS (c1,d1)
      ```
-     NOTE: Drop Complex child column is not supported.
+ 
+     **NOTE:** Dropping a Complex child column is not supported.
 
    - ##### CHANGE DATA TYPE
    
      This command is used to change the data type from INT to BIGINT or 
decimal precision from lower to higher.
      Change of decimal data type from lower precision to higher precision will 
only be supported for cases where there is no data loss.
+     
      ```
      ALTER TABLE [db_name.]table_name CHANGE col_name col_name 
changed_column_type
      ```
@@ -638,25 +647,31 @@ Users can specify which columns to include and exclude 
for local dictionary gene
      - **NOTE:** The allowed range is 38,38 (precision, scale) and is a valid 
upper case scenario which is not resulting in data loss.
 
     Example1: Changing data type of column a1 from INT to BIGINT.
+     
      ```
      ALTER TABLE test_db.carbon CHANGE a1 a1 BIGINT
      ```
      
     Example2: Changing decimal precision of column a1 from 10 to 18.
+     
      ```
      ALTER TABLE test_db.carbon CHANGE a1 a1 DECIMAL(18,2)
      ```
+ 
 - ##### MERGE INDEX
 
      This command is used to merge all the CarbonData index files 
(.carbonindex) inside a segment to a single CarbonData index merge file 
(.carbonindexmerge). This enhances the first query performance.
+     
      ```
       ALTER TABLE [db_name.]table_name COMPACT 'SEGMENT_INDEX'
      ```
 
       Examples:
-      ```
+      
+     ```
       ALTER TABLE test_db.carbon COMPACT 'SEGMENT_INDEX'
       ```
+      
       **NOTE:**
 
       * Merge index is not supported on streaming table.
@@ -726,10 +741,11 @@ Users can specify which columns to include and exclude 
for local dictionary gene
   ```
   CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
                                 productNumber Int COMMENT 'unique serial 
number for product')
-  COMMENT “This is table comment”
+  COMMENT "This is table comment"
    STORED AS carbondata
    TBLPROPERTIES ('DICTIONARY_INCLUDE'='productNumber')
   ```
+  
   You can also SET and UNSET table comment using ALTER command.
 
   Example to SET table comment:
@@ -775,8 +791,8 @@ Users can specify which columns to include and exclude for 
local dictionary gene
   PARTITIONED BY (productCategory STRING, productBatch STRING)
   STORED AS carbondata
   ```
-   NOTE: Hive partition is not supported on complex datatype columns.
-               
+   **NOTE:** Hive partition is not supported on complex datatype columns.
+
 
 #### Show Partitions
 
@@ -832,6 +848,7 @@ Users can specify which columns to include and exclude for 
local dictionary gene
   [TBLPROPERTIES ('PARTITION_TYPE'='HASH',
                   'NUM_PARTITIONS'='N' ...)]
   ```
+  
   **NOTE:** N is the number of hash partitions
 
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/dml-of-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/dml-of-carbondata.md b/docs/dml-of-carbondata.md
index 98bb132..db7c118 100644
--- a/docs/dml-of-carbondata.md
+++ b/docs/dml-of-carbondata.md
@@ -61,7 +61,7 @@ CarbonData DML statements are documented here,which includes:
 | [SORT_COLUMN_BOUNDS](#sort-column-bounds)               | How to partition 
the sort columns to make them evenly distributed |
 | [SINGLE_PASS](#single_pass)                             | When to enable 
single pass data loading                      |
 | [BAD_RECORDS_LOGGER_ENABLE](#bad-records-handling)      | Whether to enable 
bad records logging                        |
-| [BAD_RECORD_PATH](#bad-records-handling)                | Bad records 
logging path.Useful when bad record logging is enabled |
+| [BAD_RECORD_PATH](#bad-records-handling)                | Bad records 
logging path. Useful when bad record logging is enabled |
 | [BAD_RECORDS_ACTION](#bad-records-handling)             | Behavior of data 
loading when bad record is found            |
 | [IS_EMPTY_DATA_BAD_RECORD](#bad-records-handling)       | Whether empty data 
of a column to be considered as bad record or not |
 | [GLOBAL_SORT_PARTITIONS](#global_sort_partitions)       | Number of 
partitions to use for shuffling of data during sorting |
@@ -176,7 +176,7 @@ CarbonData DML statements are documented here,which 
includes:
 
     Range bounds for sort columns.
 
-    Suppose the table is created with 'SORT_COLUMNS'='name,id' and the range 
for name is aaa~zzz, the value range for id is 0~1000. Then during data 
loading, we can specify the following option to enhance data loading 
performance.
+    Suppose the table is created with 'SORT_COLUMNS'='name,id' and the range 
for name is aaa to zzz, the value range for id is 0 to 1000. Then during data 
loading, we can specify the following option to enhance data loading 
performance.
     ```
     OPTIONS('SORT_COLUMN_BOUNDS'='f,250;l,500;r,750')
     ```
@@ -186,7 +186,7 @@ CarbonData DML statements are documented here,which 
includes:
     * SORT_COLUMN_BOUNDS will be used only when the SORT_SCOPE is 'local_sort'.
    * Carbondata will use these bounds as ranges to process data concurrently 
during the final sort procedure. The records will be sorted and written out 
inside each partition. Since each partition is sorted, all records will be 
sorted.
    * Since the actual order and literal order of the dictionary column are 
not necessarily the same, we do not recommend using this feature if the 
first sort column is 'dictionary_include'.
-    * The option works better if your CPU usage during loading is low. If your 
system is already CPU tense, better not to use this option. Besides, it depends 
on the user to specify the bounds. If user does not know the exactly bounds to 
make the data distributed evenly among the bounds, loading performance will 
still be better than before or at least the same as before.
+    * The option works better if your CPU usage during loading is low. If 
your current system CPU usage is high, better not to use this option. Besides, 
it depends on the user to specify the bounds. If the user does not know the 
exact bounds to make the data distributed evenly among the bounds, loading 
performance will still be better than before or at least the same as before.
     * Users can find more information about this option in the description of 
PR1953.
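
In context, the SORT_COLUMN_BOUNDS option above is passed with the LOAD DATA command (the input path and table name are illustrative):

```
LOAD DATA INPATH '/tmp/data.csv' INTO TABLE main
OPTIONS('SORT_COLUMN_BOUNDS'='f,250;l,500;r,750')
```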
 
   - ##### SINGLE_PASS:

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/documentation.md
----------------------------------------------------------------------
diff --git a/docs/documentation.md b/docs/documentation.md
index 537a9d3..1b6726a 100644
--- a/docs/documentation.md
+++ b/docs/documentation.md
@@ -25,7 +25,7 @@ Apache CarbonData is a new big data file format for faster 
interactive query usi
 
 ## Getting Started
 
-**File Format Concepts:** Start with the basics of understanding the 
[CarbonData file 
format](./file-structure-of-carbondata.md#carbondata-file-format) and its 
[storage structure](./file-structure-of-carbondata.md).This will help to 
understand other parts of the documentation, including deployment, programming 
and usage guides. 
+**File Format Concepts:** Start with the basics of understanding the 
[CarbonData file 
format](./file-structure-of-carbondata.md#carbondata-file-format) and its 
[storage structure](./file-structure-of-carbondata.md). This will help to 
understand other parts of the documentation, including deployment, programming 
and usage guides. 
 
 **Quick Start:** [Run an example 
program](./quick-start-guide.md#installing-and-configuring-carbondata-to-run-locally-with-spark-shell)
 on your local machine or [study some 
examples](https://github.com/apache/carbondata/tree/master/examples/spark2/src/main/scala/org/apache/carbondata/examples).
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/faq.md
----------------------------------------------------------------------
diff --git a/docs/faq.md b/docs/faq.md
index 3dee5a2..3ac9a0a 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -57,12 +57,12 @@ By default **carbon.badRecords.location** specifies the 
following location ``/op
 ## How to enable Bad Record Logging?
 While loading data we can specify the approach to handle Bad Records. In order 
to analyse the cause of the Bad Records the parameter 
``BAD_RECORDS_LOGGER_ENABLE`` must be set to value ``TRUE``. There are multiple 
approaches to handle Bad Records which can be specified  by the parameter 
``BAD_RECORDS_ACTION``.
 
-- To pad the incorrect values of the csv rows with NULL value and load the 
data in CarbonData, set the following in the query :
+- To replace the incorrect values of the csv rows with NULL and load the 
data into CarbonData, set the following in the query :
 ```
 'BAD_RECORDS_ACTION'='FORCE'
 ```
 
-- To write the Bad Records without padding incorrect values with NULL in the 
raw csv (set in the parameter **carbon.badRecords.location**), set the 
following in the query :
+- To write the Bad Records, without replacing incorrect values with NULL, to 
the raw csv (location set in the parameter **carbon.badRecords.location**), set 
the following in the query :
 ```
 'BAD_RECORDS_ACTION'='REDIRECT'
 ```
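For instance, a complete load command enabling bad-record logging with the REDIRECT action could look like the sketch below. The table name, input path, and bad-record path are hypothetical, and 'BAD_RECORD_PATH' is assumed to be supported as a load option in your CarbonData version:

```sql
LOAD DATA INPATH 'hdfs://hacluster/data/sales.csv'
INTO TABLE sales
OPTIONS('BAD_RECORDS_LOGGER_ENABLE'='TRUE',
        'BAD_RECORDS_ACTION'='REDIRECT',
        'BAD_RECORD_PATH'='hdfs://hacluster/badrecords')
```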
@@ -199,7 +199,7 @@ select cntry,sum(gdp) from gdp21,pop1 where cntry=ctry 
group by cntry;
 ```
 
 ## Why all executors are showing success in Spark UI even after Dataload 
command failed at Driver side?
-Spark executor shows task as failed after the maximum number of retry 
attempts, but loading the data having bad records and BAD_RECORDS_ACTION 
(carbon.bad.records.action) is set as “FAIL” will attempt only once but 
will send the signal to driver as failed instead of throwing the exception to 
retry, as there is no point to retry if bad record found and BAD_RECORDS_ACTION 
is set to fail. Hence the Spark executor displays this one attempt as 
successful but the command has actually failed to execute. Task attempts or 
executor logs can be checked to observe the failure reason.
+Normally, a Spark executor shows a task as failed only after the maximum 
number of retry attempts. However, when data containing bad records is loaded 
with BAD_RECORDS_ACTION (carbon.bad.records.action) set to "FAIL", the load is 
attempted only once, and a failure signal is sent to the driver instead of an 
exception being thrown to trigger a retry, as there is no point in retrying 
once a bad record is found and BAD_RECORDS_ACTION is set to fail. Hence the 
Spark executor displays this single attempt as successful even though the 
command has actually failed to execute. Task attempts or executor logs can be 
checked to observe the failure reason.
 
 ## Why different time zone result for select query output when query SDK 
writer output? 
 The SDK writer is an independent entity, hence it can generate carbondata 
files from a non-cluster machine that has a different time zone. But when those 
files are read at the cluster, it always takes the cluster time zone. Hence, 
the values of timestamp and date datatype fields are not the original values.

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/file-structure-of-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/file-structure-of-carbondata.md 
b/docs/file-structure-of-carbondata.md
index ba9004c..9e656bb 100644
--- a/docs/file-structure-of-carbondata.md
+++ b/docs/file-structure-of-carbondata.md
@@ -48,7 +48,7 @@ The CarbonData files are stored in the location specified by 
the ***carbon.store
 
 ![File Directory Structure](../docs/images/2-1_1.png?raw=true)
 
-1. ModifiedTime.mdt records the timestamp of the metadata with the 
modification time attribute of the file. When the drop table and create table 
are used, the modification time of the file is updated.This is common to all 
databases and hence is kept in parallel to databases
+1. ModifiedTime.mdt records the timestamp of the metadata with the 
modification time attribute of the file. When drop table or create table is 
executed, the modification time of the file is updated. This is common to all 
databases and hence is kept in parallel to databases
 2. The **default** is the database name and contains the user tables. default 
is used when the user doesn't specify any database name; else the 
user-configured database name will be the directory name. user_table is the 
table name.
 3. The Metadata directory stores schema files, tablestatus and dictionary 
files (including .dict, .dictmeta and .sortindex). These are the three types of 
metadata information files.
 4. data and index files are stored under directory named **Fact**. The Fact 
directory has a Part0 partition directory, where 0 is the partition number.

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/how-to-contribute-to-apache-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/how-to-contribute-to-apache-carbondata.md 
b/docs/how-to-contribute-to-apache-carbondata.md
index f64c948..8d6c891 100644
--- a/docs/how-to-contribute-to-apache-carbondata.md
+++ b/docs/how-to-contribute-to-apache-carbondata.md
@@ -48,7 +48,7 @@ alternatively, on the developer mailing 
list([email protected]).
 
 If there’s an existing JIRA issue for your intended contribution, please 
comment about your
 intended work. Once the work is understood, a committer will assign the issue 
to you.
-(If you don’t have a JIRA role yet, you’ll be added to the 
“contributor” role.) If an issue is
+(If you don’t have a JIRA role yet, you’ll be added to the "contributor" 
role.) If an issue is
 currently assigned, please check with the current assignee before reassigning.
 
 For moderate or large contributions, you should not start coding or writing a 
design doc unless
@@ -171,7 +171,7 @@ Our GitHub mirror automatically provides pre-commit testing 
coverage using Jenki
 Please make sure those tests pass; the contribution cannot be merged otherwise.
 
 #### LGTM
-Once the reviewer is happy with the change, they’ll respond with an LGTM 
(“looks good to me!”).
+Once the reviewer is happy with the change, they’ll respond with an LGTM 
("looks good to me!").
 At this point, the committer will take over, possibly make some additional 
touch ups,
 and merge your changes into the codebase.
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/introduction.md
----------------------------------------------------------------------
diff --git a/docs/introduction.md b/docs/introduction.md
index 434ccfa..e6c3372 100644
--- a/docs/introduction.md
+++ b/docs/introduction.md
@@ -18,15 +18,15 @@ CarbonData has
 
 ## CarbonData Features & Functions
 
-CarbonData has rich set of featues to support various use cases in Big Data 
analytics.The below table lists the major features supported by CarbonData.
+CarbonData has a rich set of features to support various use cases in Big Data 
analytics. The table below lists the major features supported by CarbonData.
 
 
 
 ### Table Management
 
 - ##### DDL (Create, Alter,Drop,CTAS)
-
-​    CarbonData provides its own DDL to create and manage carbondata 
tables.These DDL conform to                     Hive,Spark SQL format and 
support additional properties and configuration to take advantages of 
CarbonData functionalities.
+  
+  CarbonData provides its own DDL to create and manage carbondata tables. 
These DDL statements conform to the Hive/Spark SQL format and support 
additional properties and configurations to take advantage of CarbonData 
functionalities.
 
 - ##### DML(Load,Insert)
 
@@ -46,7 +46,7 @@ CarbonData has rich set of featues to support various use 
cases in Big Data anal
 
 - ##### Compaction
 
-  CarbonData manages incremental loads as segments.Compaction help to compact 
the growing number of segments and also to improve query filter pruning.
+  CarbonData manages incremental loads as segments. Compaction helps to 
compact the growing number of segments and also to improve query filter pruning.
 
 - ##### External Tables
 
@@ -56,11 +56,11 @@ CarbonData has rich set of featues to support various use 
cases in Big Data anal
 
 - ##### Pre-Aggregate
 
-  CarbonData has concept of datamaps to assist in pruning of data while 
querying so that performance is faster.Pre Aggregate tables are kind of 
datamaps which can improve the query performance by order of 
magnitude.CarbonData will automatically pre-aggregae the incremental data and 
re-write the query to automatically fetch from the most appropriate 
pre-aggregate table to serve the query faster.
+  CarbonData has the concept of datamaps to assist in pruning data while 
querying so that performance is faster. Pre-aggregate tables are a kind of 
datamap which can improve query performance by an order of magnitude. 
CarbonData will automatically pre-aggregate the incremental data and rewrite 
the query to automatically fetch from the most appropriate pre-aggregate table 
to serve the query faster.
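As a sketch, a pre-aggregate datamap can be created with the `preaggregate` provider documented in the datamap guides; the table and column names below are hypothetical:

```sql
CREATE DATAMAP agg_sales
ON TABLE sales
USING 'preaggregate'
AS
  SELECT country, sum(quantity), avg(price)
  FROM sales
  GROUP BY country
```

Queries on `sales` that aggregate on the same columns would then be transparently rewritten to read from `agg_sales`.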
 
 - ##### Time Series
 
-  CarbonData has built in understanding of time order(Year, month,day,hour, 
minute,second).Time series is a pre-aggregate table which can automatically 
roll-up the data to the desired level during incremental load and serve the 
query from the most appropriate pre-aggregate table.
+  CarbonData has a built-in understanding of time order (year, month, day, 
hour, minute, second). A time series is a pre-aggregate table which can 
automatically roll up the data to the desired level during incremental load and 
serve the query from the most appropriate pre-aggregate table.
 
 - ##### Bloom filter
 
@@ -72,7 +72,7 @@ CarbonData has rich set of featues to support various use 
cases in Big Data anal
 
 - ##### MV (Materialized Views)
 
-  MVs are kind of pre-aggregate tables which can support efficent query 
re-write and processing.CarbonData provides MV which can rewrite query to fetch 
from any table(including non-carbondata tables).Typical usecase is to store the 
aggregated data of a non-carbondata fact table into carbondata and use mv to 
rewrite the query to fetch from carbondata.
+  MVs are a kind of pre-aggregate table which can support efficient query 
rewrite and processing. CarbonData provides MVs which can rewrite a query to 
fetch from any table (including non-carbondata tables). A typical use case is 
to store the aggregated data of a non-carbondata fact table in carbondata and 
use the MV to rewrite the query to fetch from carbondata.
 
 ### Streaming
 
@@ -84,17 +84,17 @@ CarbonData has rich set of featues to support various use 
cases in Big Data anal
 
 - ##### CarbonData writer
 
-  CarbonData supports writing data from non-spark application using SDK.Users 
can use SDK to generate carbondata files from custom applications.Typical 
usecase is to write the streaming application plugged in to kafka and use 
carbondata as sink(target) table for storing.
+  CarbonData supports writing data from a non-spark application using the SDK. 
Users can use the SDK to generate carbondata files from custom applications. A 
typical use case is a streaming application plugged into Kafka that uses 
carbondata as the sink (target) table for storing data.
 
 - ##### CarbonData reader
 
-  CarbonData supports reading of data from non-spark application using 
SDK.Users can use the SDK to read the carbondata files from their application 
and do custom processing.
+  CarbonData supports reading data from a non-spark application using the SDK. 
Users can use the SDK to read carbondata files from their application and do 
custom processing.
 
 ### Storage
 
 - ##### S3
 
-  CarbonData can write to S3, OBS or any cloud storage confirming to S3 
protocol.CarbonData uses the HDFS api to write to cloud object stores.
+  CarbonData can write to S3, OBS or any cloud storage conforming to the S3 
protocol. CarbonData uses the HDFS API to write to cloud object stores.
 
 - ##### HDFS
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/language-manual.md
----------------------------------------------------------------------
diff --git a/docs/language-manual.md b/docs/language-manual.md
index 9d3a9b9..79aad00 100644
--- a/docs/language-manual.md
+++ b/docs/language-manual.md
@@ -34,6 +34,8 @@ CarbonData has its own parser, in addition to Spark's SQL 
Parser, to parse and p
 - Data Manipulation Statements
   - [DML:](./dml-of-carbondata.md) [Load](./dml-of-carbondata.md#load-data), 
[Insert](./dml-of-carbondata.md#insert-data-into-carbondata-table), 
[Update](./dml-of-carbondata.md#update), [Delete](./dml-of-carbondata.md#delete)
   - [Segment Management](./segment-management-on-carbondata.md)
+- [CarbonData as Spark's Datasource](./carbon-as-spark-datasource-guide.md)
 - [Configuration Properties](./configuration-parameters.md)
 
 
+

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/performance-tuning.md
----------------------------------------------------------------------
diff --git a/docs/performance-tuning.md b/docs/performance-tuning.md
index f56a63b..6c87ce9 100644
--- a/docs/performance-tuning.md
+++ b/docs/performance-tuning.md
@@ -140,7 +140,7 @@
 
 | Parameter | Default Value | Description/Tuning |
 |-----------|-------------|--------|
-|carbon.number.of.cores.while.loading|Default: 2.This value should be >= 
2|Specifies the number of cores used for data processing during data loading in 
CarbonData. |
+|carbon.number.of.cores.while.loading|Default: 2. This value should be >= 
2|Specifies the number of cores used for data processing during data loading in 
CarbonData. |
 |carbon.sort.size|Default: 100000. The value should be >= 100.|Threshold to 
write local file in sort step when loading data|
 |carbon.sort.file.write.buffer.size|Default:  50000.|DataOutputStream buffer. |
 |carbon.merge.sort.reader.thread|Default: 3 |Specifies the number of cores 
used for temp file merging during data loading in CarbonData.|
@@ -165,15 +165,15 @@
 
|----------------------------------------------|-----------------------------------|---------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | carbon.sort.intermediate.files.limit | spark/carbonlib/carbon.properties | 
Data loading | During the loading of data, local temp is used to sort the data. 
This number specifies the minimum number of intermediate files after which the 
merge sort has to be initiated. | Increasing the parameter to a higher value 
will improve the load performance. For example, when we increase the value from 
20 to 100, it increases the data load performance from 35MB/S to more than 
50MB/S. Higher values of this parameter consume more memory during the load. |
 | carbon.number.of.cores.while.loading | spark/carbonlib/carbon.properties | 
Data loading | Specifies the number of cores used for data processing during 
data loading in CarbonData. | If you have more CPUs, then you can increase the 
number of cores, which will increase the performance. For example, if we 
increase the value from 2 to 4, the CSV reading performance can roughly double. |
-| carbon.compaction.level.threshold | spark/carbonlib/carbon.properties | Data 
loading and Querying | For minor compaction, specifies the number of segments 
to be merged in stage 1 and number of compacted segments to be merged in stage 
2. | Each CarbonData load will create one segment, if every load is small in 
size it will generate many small file over a period of time impacting the query 
performance. Configuring this parameter will merge the small segment to one big 
segment which will sort the data and improve the performance. For Example in 
one telecommunication scenario, the performance improves about 2 times after 
minor compaction. |
+| carbon.compaction.level.threshold | spark/carbonlib/carbon.properties | Data 
loading and Querying | For minor compaction, specifies the number of segments 
to be merged in stage 1 and number of compacted segments to be merged in stage 
2. | Each CarbonData load will create one segment, if every load is small in 
size it will generate many small files over a period of time impacting the 
query performance. Configuring this parameter will merge the small segment to 
one big segment which will sort the data and improve the performance. For 
Example in one telecommunication scenario, the performance improves about 2 
times after minor compaction. |
 | spark.sql.shuffle.partitions | spark/conf/spark-defaults.conf | Querying | 
The number of tasks started when spark shuffles data. | The value can be 1 to 2 
times as much as the executor cores. In an aggregation scenario, reducing the 
number from 200 to 32 reduced the query time from 17 to 9 seconds. |
 | spark.executor.instances/spark.executor.cores/spark.executor.memory | 
spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, 
and memory used for CarbonData query. | In the bank scenario, providing 4 CPU 
cores and 15 GB for each executor gave good performance. More is not always 
better for these two values; they need to be configured properly in case of 
limited resources. For example, in the bank scenario each node has enough CPU 
(32 cores) but less memory (64 GB), so we cannot give more CPU with less 
memory: with 4 cores and 12GB for each executor, GC sometimes happens during 
the query, which impacts the query performance very much, from 3 seconds to 
more than 15 seconds. In this scenario, increase the memory or decrease the CPU 
cores. |
 | carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading 
| The buffer size to store records returned from the block scan. | In the limit 
scenario this parameter is very important. For example, if your query limit is 
1000 but we set this value to 3000, we get 3000 records from the scan while 
spark will only take 1000 rows, so the 2000 remaining are useless. In one 
finance test case, after we set it to 100, in the limit 1000 scenario the 
performance increased about 2 times in comparison to setting this value to 
12000. |
 | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | 
Whether to use YARN local directories for multi-table load disk load balance | 
If this is set to true, CarbonData will use YARN local directories for 
multi-table load disk load balance, which will improve the data load 
performance. |
 | carbon.use.multiple.temp.dir | spark/carbonlib/carbon.properties | Data 
loading | Whether to use multiple YARN local directories during table data 
loading for disk load balance | After enabling 'carbon.use.local.dir', if this 
is set to true, CarbonData will use all YARN local directories during data load 
for disk load balance, that will improve the data load performance. Please 
enable this property when you encounter disk hotspot problem during data 
loading. |
-| carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data 
loading | Specify the name of compressor to compress the intermediate sort 
temporary files during sort procedure in data loading. | The optional values 
are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty. By default, empty means 
that Carbondata will not compress the sort temp files. This parameter will be 
useful if you encounter disk bottleneck. |
-| carbon.load.skewedDataOptimization.enabled | 
spark/carbonlib/carbon.properties | Data loading | Whether to enable size based 
block allocation strategy for data loading. | When loading, carbondata will use 
file size based block allocation strategy for task distribution. It will make 
sure that all the executors process the same size of data -- It's useful if the 
size of your input data files varies widely, say 1MB~1GB. |
-| carbon.load.min.size.enabled | spark/carbonlib/carbon.properties | Data 
loading | Whether to enable node minumun input data size allocation strategy 
for data loading.| When loading, carbondata will use node minumun input data 
size allocation strategy for task distribution. It will make sure the node load 
the minimum amount of data -- It's useful if the size of your input data files 
very small, say 1MB~256MB,Avoid generating a large number of small files. |
+| carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data 
loading | Specify the name of compressor to compress the intermediate sort 
temporary files during sort procedure in data loading. | The optional values 
are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD', and empty. By default, empty means 
that Carbondata will not compress the sort temp files. This parameter will be 
useful if you encounter disk bottleneck. |
+| carbon.load.skewedDataOptimization.enabled | 
spark/carbonlib/carbon.properties | Data loading | Whether to enable size based 
block allocation strategy for data loading. | When loading, carbondata will use 
file size based block allocation strategy for task distribution. It will make 
sure that all the executors process the same size of data -- It's useful if the 
size of your input data files varies widely, say 1MB to 1GB. |
+| carbon.load.min.size.enabled | spark/carbonlib/carbon.properties | Data 
loading | Whether to enable node minimum input data size allocation strategy 
for data loading.| When loading, carbondata will use node minimum input data 
size allocation strategy for task distribution. It will make sure the nodes 
load the minimum amount of data -- It's useful if the size of your input data 
files is very small, say 1MB to 256MB, and avoids generating a large number of 
small files. |
 
   Note: If your CarbonData instance is provided only for query, you may 
specify the property 'spark.speculation=true' in the conf directory of spark.
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/quick-start-guide.md
----------------------------------------------------------------------
diff --git a/docs/quick-start-guide.md b/docs/quick-start-guide.md
index 0fdf055..fd535ae 100644
--- a/docs/quick-start-guide.md
+++ b/docs/quick-start-guide.md
@@ -16,7 +16,7 @@
 -->
 
 # Quick Start
-This tutorial provides a quick introduction to using CarbonData.To follow 
along with this guide, first download a packaged release of CarbonData from the 
[CarbonData 
website](https://dist.apache.org/repos/dist/release/carbondata/).Alternatively 
it can be created following [Building 
CarbonData](https://github.com/apache/carbondata/tree/master/build) steps.
+This tutorial provides a quick introduction to using CarbonData. To follow 
along with this guide, first download a packaged release of CarbonData from the 
[CarbonData 
website](https://dist.apache.org/repos/dist/release/carbondata/). Alternatively 
it can be created by following [Building 
CarbonData](https://github.com/apache/carbondata/tree/master/build) steps.
 
 ##  Prerequisites
 * CarbonData supports Spark versions up to 2.2.1. Please download the Spark 
package from [Spark website](https://spark.apache.org/downloads.html)
@@ -35,7 +35,7 @@ This tutorial provides a quick introduction to using 
CarbonData.To follow along
 
 ## Integration
 
-CarbonData can be integrated with Spark and Presto Execution Engines.The below 
documentation guides on Installing and Configuring with these execution engines.
+CarbonData can be integrated with the Spark and Presto execution engines. The 
documentation below guides you through installing and configuring CarbonData 
with these execution engines.
 
 ### Spark
 
@@ -293,7 +293,7 @@ hdfs://<host_name>:port/user/hive/warehouse/carbon.store
 
 ## Installing and Configuring CarbonData on Presto
 
-**NOTE:** **CarbonData tables cannot be created nor loaded from Presto.User 
need to create CarbonData Table and load data into it
+**NOTE:** **CarbonData tables cannot be created nor loaded from Presto. Users 
need to create a CarbonData table and load data into it
 either with 
[Spark](#installing-and-configuring-carbondata-to-run-locally-with-spark-shell) 
or [SDK](./sdk-guide.md).
Once the table is created, it can be queried from Presto.**
 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/s3-guide.md
----------------------------------------------------------------------
diff --git a/docs/s3-guide.md b/docs/s3-guide.md
index a2e5f07..1121164 100644
--- a/docs/s3-guide.md
+++ b/docs/s3-guide.md
@@ -15,7 +15,7 @@
     limitations under the License.
 -->
 
-# S3 Guide (Alpha Feature 1.4.1)
+# S3 Guide
 
 Object storage is the recommended storage format in cloud as it can support 
storing large data 
 files. S3 APIs are widely used for accessing object stores. This can be 

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/streaming-guide.md
----------------------------------------------------------------------
diff --git a/docs/streaming-guide.md b/docs/streaming-guide.md
index 56e400e..714b07a 100644
--- a/docs/streaming-guide.md
+++ b/docs/streaming-guide.md
@@ -157,7 +157,7 @@ ALTER TABLE streaming_table SET 
TBLPROPERTIES('streaming'='true')
 At the begin of streaming ingestion, the system will try to acquire the table 
level lock of streaming.lock file. If the system isn't able to acquire the lock 
of this table, it will throw an InterruptedException.
 
 ## Create streaming segment
-The input data of streaming will be ingested into a segment of the CarbonData 
table, the status of this segment is streaming. CarbonData call it a streaming 
segment. The "tablestatus" file will record the segment status and data size. 
The user can use “SHOW SEGMENTS FOR TABLE tableName” to check segment 
status. 
+The streaming data will be ingested into a separate segment of the carbondata 
table; this segment is termed the streaming segment. The status of this segment 
will be recorded as "streaming" in the "tablestatus" file along with its data 
size. You can use "SHOW SEGMENTS FOR TABLE tableName" to check segment status. 
 
 After the streaming segment reaches the max size, CarbonData will change the 
segment status from "streaming" to "streaming finish", and create a new 
"streaming" segment to continue ingesting streaming data.
 
@@ -352,7 +352,7 @@ Following example shows how to start a streaming ingest job
 
 In the above example, two tables are created: source and sink. The `source` 
table's format is `csv` and the `sink` table's format is `carbon`. Then a 
streaming job is created to stream data from the source table to the sink table.
 
-These two tables are normal carbon table, they can be queried independently.
+These two tables are normal carbon tables; they can be queried independently.
 
 
 
@@ -405,7 +405,7 @@ When this is issued, carbon will start a structured 
streaming job to do the stre
 
 - The sink table should have a TBLPROPERTY `'streaming'` equal to `true`, 
indicating it is a streaming table.
 - In the given STMPROPERTIES, the user must specify `'trigger'`; its value 
must be `ProcessingTime` (in future, other values will be supported). The user 
should also specify an interval value for the streaming job.
-- If the schema specifid in sink table is different from CTAS, the streaming 
job will fail
+- If the schema specified in sink table is different from CTAS, the streaming 
job will fail
 
 For Kafka data source, create the source table by:
   ```SQL

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ca30ad97/docs/usecases.md
----------------------------------------------------------------------
diff --git a/docs/usecases.md b/docs/usecases.md
index 277c455..e8b98b5 100644
--- a/docs/usecases.md
+++ b/docs/usecases.md
@@ -39,7 +39,7 @@ These use cases can be broadly classified into below 
categories:
 
 ### Scenario
 
-User wants to analyse all the CHR(Call History Record) and MR(Measurement 
Records) of the mobile subscribers in order to identify the service failures 
within 10 secs.Also user wants to run machine learning models on the data to 
fairly estimate the reasons and time of probable failures and take action ahead 
to meet the SLA(Service Level Agreements) of VIP customers. 
+User wants to analyse all the CHR (Call History Records) and MR (Measurement 
Records) of the mobile subscribers in order to identify the service failures 
within 10 secs. Also, user wants to run machine learning models on the data to 
fairly estimate the reasons and time of probable failures and take action ahead 
to meet the SLA (Service Level Agreements) of VIP customers. 
 
 ### Challenges
 
@@ -54,7 +54,7 @@ Setup a Hadoop + Spark + CarbonData cluster managed by YARN.
 
 The following configurations were proposed for CarbonData. (These tunings were 
proposed before CarbonData introduced the SORT_COLUMNS parameter, using which 
the sort order and schema order could be different.)
 
-Add the frequently used columns to the left of the table definition.Add it in 
the increasing order of cardinality.It was suggested to keep msisdn,imsi 
columns in the beginning of the schema.With latest CarbonData, SORT_COLUMNS 
needs to be configured msisdn,imsi in the beginning.
+Add the frequently used columns to the left of the table definition. Add them 
in increasing order of cardinality. It was suggested to keep the msisdn,imsi 
columns in the beginning of the schema. With the latest CarbonData, 
SORT_COLUMNS needs to be configured with msisdn,imsi in the beginning.
 
 Add timestamp column to the right of the schema as it is naturally increasing.
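Under these suggestions, a minimal sketch of the table definition would place msisdn and imsi first in SORT_COLUMNS and the timestamp last in the schema (the table and column names here are hypothetical):

```sql
CREATE TABLE call_history (
  msisdn STRING,
  imsi STRING,
  cell_id STRING,
  event_time TIMESTAMP
)
STORED BY 'carbondata'
TBLPROPERTIES ('SORT_COLUMNS'='msisdn,imsi')
```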
 
@@ -71,7 +71,7 @@ Apart from these, the following CarbonData configuration was 
suggested to be con
 | Data Loading | carbon.sort.size                        | 100000 | Number of 
records to sort at a time. Configuring more records will lead to an increased 
memory footprint |
 | Data Loading | table_blocksize                         | 256  | To 
efficiently schedule multiple tasks during query |
 | Data Loading | carbon.sort.intermediate.files.limit    | 100    | Increased 
to 100 as the number of cores is more. Merging can be performed in the 
background. If there are fewer files to merge, sort threads would be idle |
-| Data Loading | carbon.use.local.dir                    | TRUE   | yarn 
application directory will be usually on a single disk.YARN would be configured 
with multiple disks to be used as temp or to assign randomly to 
applications.Using the yarn temp directory will allow carbon to use multiple 
disks and improve IO performance |
+| Data Loading | carbon.use.local.dir                    | TRUE   | The YARN application directory will usually be on a single disk. YARN would be configured with multiple disks to be used as temp or to be assigned randomly to applications. Using the YARN temp directory allows CarbonData to use multiple disks and improve IO performance |
 | Data Loading | carbon.use.multiple.temp.dir            | TRUE   | Writing sort files to multiple disks leads to better IO and reduces the IO bottleneck |
 | Compaction | carbon.compaction.level.threshold       | 6,6    | Since there are frequent small loads, compacting more segments will give better query results |
 | Compaction | carbon.enable.auto.load.merge           | true   | Since data loading is small, auto compaction keeps the number of segments low and compaction can complete in time |
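
Collected into `carbon.properties`, the loading and compaction tunings above would look roughly like this (a sketch; `table_blocksize` is a table-level property set via TBLPROPERTIES rather than here, and exact property availability depends on the CarbonData version):

```properties
# Data loading tunings from the table above
carbon.sort.size=100000
carbon.sort.intermediate.files.limit=100
carbon.use.local.dir=true
carbon.use.multiple.temp.dir=true
# Compaction tunings for frequent small loads
carbon.compaction.level.threshold=6,6
carbon.enable.auto.load.merge=true
```
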
@@ -94,7 +94,7 @@ Apart from these, the following CarbonData configuration was 
suggested to be con
 
 ### Scenario
 
-User wants to analyse the person/vehicle movement and behavior during a 
certain time period.This output data needs to be joined with a external table 
for Human details extraction.The query will be run with different time period 
as filter to identify potential behavior mismatch.
+User wants to analyse the person/vehicle movement and behavior during a certain time period. This output data needs to be joined with an external table for human details extraction. The query will be run with different time periods as filters to identify potential behavior mismatches.
 
 ### Challenges
 
@@ -119,24 +119,24 @@ Use all columns are no-dictionary as the cardinality is 
high.
 | Configuration for | Parameter                               | Value          
         | Description |
 | ------------------| --------------------------------------- | 
----------------------- | ------------------|
 | Data Loading | carbon.graph.rowset.size                | 100000                  | Based on the size of each row, this determines the memory required during data loading. A higher value leads to an increased memory footprint |
-| Data Loading | enable.unsafe.sort                      | TRUE                
    | Temporary data generated during sort is huge which causes GC 
bottlenecks.Using unsafe reduces the pressure on GC |
-| Data Loading | enable.offheap.sort                     | TRUE                
    | Temporary data generated during sort is huge which causes GC 
bottlenecks.Using offheap reduces the pressure on GC.offheap can be accessed 
through java unsafe.hence enable.unsafe.sort needs to be true |
+| Data Loading | enable.unsafe.sort                      | TRUE                
    | Temporary data generated during sort is huge which causes GC bottlenecks. 
Using unsafe reduces the pressure on GC |
+| Data Loading | enable.offheap.sort                     | TRUE                    | Temporary data generated during sort is huge, which causes GC bottlenecks. Using offheap reduces the pressure on GC. Offheap can be accessed through Java unsafe; hence enable.unsafe.sort needs to be true |
 | Data Loading | offheap.sort.chunk.size.in.mb           | 128                     | Size of memory to allocate for sorting. This can be increased based on the memory available |
 | Data Loading | carbon.number.of.cores.while.loading    | 12                  
    | Higher cores can improve data loading speed |
 | Data Loading | carbon.sort.size                        | 100000                  | Number of records to sort at a time. Configuring more records leads to an increased memory footprint |
-| Data Loading | table_blocksize                         | 512                 
    | To efficiently schedule multiple tasks during query.This size depends on 
data scenario.If data is such that the filters would select less number of 
blocklets to scan, keeping higher number works well.If the number blocklets to 
scan is more, better to reduce the size as more tasks can be scheduled in 
parallel. |
+| Data Loading | table_blocksize                         | 512                     | To efficiently schedule multiple tasks during query. This size depends on the data scenario. If the data is such that the filters would select fewer blocklets to scan, keeping a higher value works well. If the number of blocklets to scan is more, it is better to reduce the size so that more tasks can be scheduled in parallel. |
 | Data Loading | carbon.sort.intermediate.files.limit    | 100                     | Increased to 100 as the number of cores is higher. Merging can be performed in the background. With fewer files to merge, sort threads would be idle |
-| Data Loading | carbon.use.local.dir                    | TRUE                
    | yarn application directory will be usually on a single disk.YARN would be 
configured with multiple disks to be used as temp or to assign randomly to 
applications.Using the yarn temp directory will allow carbon to use multiple 
disks and improve IO performance |
+| Data Loading | carbon.use.local.dir                    | TRUE                    | The YARN application directory will usually be on a single disk. YARN would be configured with multiple disks to be used as temp or to be assigned randomly to applications. Using the YARN temp directory allows CarbonData to use multiple disks and improve IO performance |
 | Data Loading | carbon.use.multiple.temp.dir            | TRUE                    | Writing sort files to multiple disks leads to better IO and reduces the IO bottleneck |
-| Data Loading | sort.inmemory.size.in.mb                | 92160 | Memory 
allocated to do inmemory sorting.When more memory is available in the node, 
configuring this will retain more sort blocks in memory so that the merge sort 
is faster due to no/very less IO |
+| Data Loading | sort.inmemory.size.in.mb                | 92160 | Memory allocated to do in-memory sorting. When more memory is available in the node, configuring this will retain more sort blocks in memory so that the merge sort is faster due to little or no IO |
 | Compaction | carbon.major.compaction.size            | 921600                  | Sum of several loads to combine into a single segment |
 | Compaction | carbon.number.of.cores.while.compacting | 12                      | A higher number of cores can improve the compaction speed. Data size is huge; compaction needs to use more threads to speed up the process |
-| Compaction | carbon.enable.auto.load.merge           | FALSE                 
  | Doing auto minor compaction is costly process as data size is huge.Perform 
manual compaction when  the cluster is less loaded |
+| Compaction | carbon.enable.auto.load.merge           | FALSE                   | Auto minor compaction is a costly process as the data size is huge. Perform manual compaction when the cluster is less loaded |
 | Query | carbon.enable.vector.reader             | true                    | To fetch results faster, supporting Spark vector processing will speed up the query |
-| Query | enable.unsafe.in.query.procressing      | true                    | 
Data that needs to be scanned in huge which in turn generates more short lived 
Java objects.This cause pressure of GC.using unsafe and offheap will reduce the 
GC overhead |
-| Query | use.offheap.in.query.processing         | true                    | 
Data that needs to be scanned in huge which in turn generates more short lived 
Java objects.This cause pressure of GC.using unsafe and offheap will reduce the 
GC overhead.offheap can be accessed through java unsafe.hence 
enable.unsafe.in.query.procressing needs to be true |
+| Query | enable.unsafe.in.query.procressing      | true                    | Data that needs to be scanned is huge, which in turn generates more short-lived Java objects. This causes GC pressure. Using unsafe and offheap will reduce the GC overhead |
+| Query | use.offheap.in.query.processing         | true                    | Data that needs to be scanned is huge, which in turn generates more short-lived Java objects. This causes GC pressure. Using unsafe and offheap will reduce the GC overhead. Offheap can be accessed through Java unsafe; hence enable.unsafe.in.query.procressing needs to be true |
 | Query | enable.unsafe.columnpage                | TRUE                    | Keep the column pages in offheap memory so that the memory overhead due to Java objects is less and GC pressure is also reduced. |
-| Query | carbon.unsafe.working.memory.in.mb      | 10240                   | 
Amount of memory to use for offheap operations.Can increase this memory based 
on the data size |
+| Query | carbon.unsafe.working.memory.in.mb      | 10240                   | Amount of memory to use for offheap operations. This can be increased based on the data size |
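
The unsafe/offheap settings in the table above chain together: offheap memory is accessed through Java unsafe, so the `enable.unsafe.*` flags must accompany the offheap ones. As a `carbon.properties` sketch (property names copied from the table above, including its spelling of `procressing`; verify them against your CarbonData version):

```properties
# Loading side: unsafe + offheap sort to relieve GC pressure
enable.unsafe.sort=true
enable.offheap.sort=true
offheap.sort.chunk.size.in.mb=128
sort.inmemory.size.in.mb=92160
# Query side: unsafe/offheap scanning and offheap column pages
enable.unsafe.in.query.procressing=true
use.offheap.in.query.processing=true
enable.unsafe.columnpage=true
carbon.unsafe.working.memory.in.mb=10240
```
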
 
 
 
@@ -177,7 +177,7 @@ Concurrent queries can be more due to the BI dashboard
 - Create pre-aggregate tables for non timestamp based group by queries
 - For queries containing group by date, create timeseries-based DataMap (pre-aggregate) tables so that the data is rolled up during creation and fetch is faster
 - Reduce the Spark shuffle partitions. (In our configuration on a 14 node cluster, it was reduced to 35 from the default of 200)
-- Enable global dictionary for columns which have less 
cardinalities.Aggregation can be done on encoded data, there by improving the 
performance
+- Enable global dictionary for columns which have low cardinalities. Aggregation can be done on encoded data, thereby improving the performance
 - For columns whose cardinality is high, enable the local dictionary so that the store size is less and scans can take the dictionary benefit
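
For the group-by-date bullet above, a timeseries pre-aggregate DataMap could be sketched as below. The table, column, and measure names are hypothetical; the DMPROPERTIES keys follow the timeseries DataMap syntax, which should be checked against the release in use:

```sql
-- Hypothetical example: roll events up by hour at load time so that
-- group-by-date queries hit the pre-aggregated data
CREATE DATAMAP agg_events_hour ON TABLE events
USING 'timeseries'
DMPROPERTIES (
  'event_time'='event_time',
  'hour_granularity'='1')
AS SELECT event_time, region, SUM(amount)
   FROM events
   GROUP BY event_time, region;
```
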
 
 ## Handling near realtime data ingestion scenario
@@ -188,14 +188,14 @@ Need to support storing of continously arriving data and 
make it available immed
 
 ### Challenges
 
-When the data ingestion is near real time and the data needs to be available 
for query immediately, usual scenario is to do data loading in micro 
batches.But this causes the problem of generating many small files.This poses 
two problems:
+When the data ingestion is near real time and the data needs to be available for query immediately, the usual approach is to do data loading in micro batches. But this causes the problem of generating many small files. This poses two problems:
 
 1. Small file handling in HDFS is inefficient
 2. CarbonData will suffer in query performance as all the small files will 
have to be queried when filter is on non time column
 
 
-Since data is continouly arriving, allocating resources for compaction might 
not be feasible.
+Since data is continuously arriving, allocating resources for compaction might not be feasible.
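
Since auto compaction may not be feasible here, compaction could instead be triggered manually during off-peak periods, for example (table name illustrative):

```sql
-- Run minor compaction on demand when the cluster is lightly loaded,
-- merging the many small micro-batch segments into larger ones
ALTER TABLE micro_batch_events COMPACT 'MINOR';
```
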
 
 ### Goal
 