http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/datamap-developer-guide.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/datamap-developer-guide.md 
b/src/site/markdown/datamap-developer-guide.md
index 6bac9b5..60f93df 100644
--- a/src/site/markdown/datamap-developer-guide.md
+++ b/src/site/markdown/datamap-developer-guide.md
@@ -6,7 +6,7 @@ Currently, there are two types of DataMap supported:
 1. IndexDataMap: DataMap that leverages index to accelerate filter query
 2. MVDataMap: DataMap that leverages Materialized View to accelerate olap 
style query, like SPJG query (select, predicate, join, groupby)
 
-### DataMap provider
+### DataMap Provider
 When user issues `CREATE DATAMAP dm ON TABLE main USING 'provider'`, the 
corresponding DataMapProvider implementation will be created and initialized. 
 Currently, the provider string can be:
 1. preaggregate: A type of MVDataMap that do pre-aggregate of single table
@@ -15,5 +15,5 @@ Currently, the provider string can be:
 
 When user issues `DROP DATAMAP dm ON TABLE main`, the corresponding 
DataMapProvider interface will be called.
 
-Details about [DataMap 
Management](./datamap/datamap-management.md#datamap-management) and supported 
[DSL](./datamap/datamap-management.md#overview) are documented 
[here](./datamap/datamap-management.md).
+Click for more details about [DataMap 
Management](./datamap/datamap-management.md#datamap-management) and supported 
[DSL](./datamap/datamap-management.md#overview).
 

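The DDL flow described in the developer guide above can be sketched as below. This is a minimal sketch: the table name `main` and the `preaggregate` provider string come from the guide itself, while the datamap name, columns, and the `AS SELECT` body are hypothetical illustrations.

```sql
-- Sketch only: 'main' and 'preaggregate' are from the guide; 'dm', the
-- columns and the aggregate query are hypothetical.
CREATE DATAMAP dm
ON TABLE main
USING 'preaggregate'
AS SELECT country, sum(amount) FROM main GROUP BY country;

-- Dropping the datamap calls the corresponding DataMapProvider interface.
DROP DATAMAP dm ON TABLE main;
```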
http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/datamap-management.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/datamap-management.md 
b/src/site/markdown/datamap-management.md
index eee03a7..ad8718a 100644
--- a/src/site/markdown/datamap-management.md
+++ b/src/site/markdown/datamap-management.md
@@ -66,9 +66,9 @@ If user create MV datamap without specifying `WITH DEFERRED 
REBUILD`, carbondata
 ### Automatic Refresh
 
 When user creates a datamap on the main table without using `WITH DEFERRED 
REBUILD` syntax, the datamap will be managed by system automatically.
-For every data load to the main table, system will immediately triger a load 
to the datamap automatically. These two data loading (to main table and 
datamap) is executed in a transactional manner, meaning that it will be either 
both success or neither success. 
+For every data load to the main table, the system will immediately trigger a load to the datamap automatically. These two data loads (to the main table and to the datamap) are executed in a transactional manner, meaning that either both succeed or neither does. 
 
-The data loading to datamap is incremental based on Segment concept, avoiding 
a expesive total rebuild.
+The data loading to the datamap is incremental, based on the Segment concept, avoiding an expensive total rebuild.
 
 If user perform following command on the main table, system will return 
failure. (reject the operation)
 
@@ -87,7 +87,7 @@ We do recommend you to use this management for index datamap.
 
 ### Manual Refresh
 
-When user creates a datamap specifying maunal refresh semantic, the datamap is 
created with status *disabled* and query will NOT use this datamap until user 
can issue REBUILD DATAMAP command to build the datamap. For every REBUILD 
DATAMAP command, system will trigger a full rebuild of the datamap. After 
rebuild is done, system will change datamap status to *enabled*, so that it can 
be used in query rewrite.
+When the user creates a datamap specifying manual refresh semantics, the datamap is created with status *disabled* and queries will NOT use this datamap until the user issues a REBUILD DATAMAP command to build it. For every REBUILD DATAMAP command, the system will trigger a full rebuild of the datamap. After the rebuild is done, the system will change the datamap status to *enabled*, so that it can be used in query rewrite.
 
 For every new data loading, data update, delete, the related datamap will be 
made *disabled*,
 which means that the following queries will not benefit from the datamap 
before it becomes *enabled* again.
@@ -122,7 +122,7 @@ There is a DataMapCatalog interface to retrieve schema of 
all datamap, it can be
 
 How can user know whether datamap is used in the query?
 
-User can use EXPLAIN command to know, it will print out something like
+Users can set enable.query.statistics = true and use the EXPLAIN command to check; it will print out something like
 
 ```text
 == CarbonData Profiler ==

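The EXPLAIN workflow from the hunk above can be sketched as follows; the table and columns are hypothetical, and `enable.query.statistics` is assumed to have been set as a carbon property beforehand.

```sql
-- Assumes enable.query.statistics=true has been set (e.g. in carbon.properties).
-- EXPLAIN then prints a "== CarbonData Profiler ==" section showing whether a
-- datamap was used for this query. Table and columns are hypothetical.
EXPLAIN SELECT country, sum(amount) FROM main GROUP BY country;
```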
http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/ddl-of-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/ddl-of-carbondata.md 
b/src/site/markdown/ddl-of-carbondata.md
index acaac43..933a448 100644
--- a/src/site/markdown/ddl-of-carbondata.md
+++ b/src/site/markdown/ddl-of-carbondata.md
@@ -32,6 +32,8 @@ CarbonData DDL statements are documented here,which includes:
   * [Caching Level](#caching-at-block-or-blocklet-level)
   * [Hive/Parquet folder Structure](#support-flat-folder-same-as-hiveparquet)
   * [Extra Long String columns](#string-longer-than-32000-characters)
+  * [Compression for Table](#compression-for-table)
+  * [Bad Records Path](#bad-records-path)
 * [CREATE TABLE AS SELECT](#create-table-as-select)
 * [CREATE EXTERNAL TABLE](#create-external-table)
   * [External Table on Transactional table 
location](#create-external-table-on-managed-table-data-location)
@@ -72,6 +74,7 @@ CarbonData DDL statements are documented here,which includes:
   [TBLPROPERTIES (property_name=property_value, ...)]
   [LOCATION 'path']
   ```
+
   **NOTE:** CarbonData also supports "STORED AS carbondata" and "USING 
carbondata". Find example code at 
[CarbonSessionExample](https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonSessionExample.scala)
 in the CarbonData repo.
 ### Usage Guidelines
 
@@ -84,19 +87,20 @@ CarbonData DDL statements are documented here,which 
includes:
 | [SORT_COLUMNS](#sort-columns-configuration)                  | Columns to 
include in sort and its order of sort             |
 | [SORT_SCOPE](#sort-scope-configuration)                      | Sort scope of 
the load.Options include no sort, local sort ,batch sort and global sort |
 | [TABLE_BLOCKSIZE](#table-block-size-configuration)           | Size of 
blocks to write onto hdfs                            |
+| [TABLE_BLOCKLET_SIZE](#table-blocklet-size-configuration)    | Size of 
blocklet to write in the file                        |
 | [MAJOR_COMPACTION_SIZE](#table-compaction-configuration)     | Size upto 
which the segments can be combined into one        |
 | [AUTO_LOAD_MERGE](#table-compaction-configuration)           | Whether to 
auto compact the segments                         |
 | [COMPACTION_LEVEL_THRESHOLD](#table-compaction-configuration) | Number of 
segments to compact into one segment               |
 | [COMPACTION_PRESERVE_SEGMENTS](#table-compaction-configuration) | Number of 
latest segments that needs to be excluded from compaction |
 | [ALLOWED_COMPACTION_DAYS](#table-compaction-configuration)   | Segments 
generated within the configured time limit in days will be compacted, skipping 
others |
-| [streaming](#streaming)                                      | Whether the 
table is a streaming table                       |
+| [STREAMING](#streaming)                                      | Whether the 
table is a streaming table                       |
 | [LOCAL_DICTIONARY_ENABLE](#local-dictionary-configuration)   | Enable local 
dictionary generation                           |
 | [LOCAL_DICTIONARY_THRESHOLD](#local-dictionary-configuration) | Cardinality 
upto which the local dictionary can be generated |
-| [LOCAL_DICTIONARY_INCLUDE](#local-dictionary-configuration)  | Columns for 
which local dictionary needs to be generated.Useful when local dictionary need 
not be generated for all string/varchar/char columns |
-| [LOCAL_DICTIONARY_EXCLUDE](#local-dictionary-configuration)  | Columns for 
which local dictionary generation should be skipped.Useful when local 
dictionary need not be generated for few string/varchar/char columns |
+| [LOCAL_DICTIONARY_INCLUDE](#local-dictionary-configuration)  | Columns for 
which local dictionary needs to be generated. Useful when local dictionary need 
not be generated for all string/varchar/char columns |
+| [LOCAL_DICTIONARY_EXCLUDE](#local-dictionary-configuration)  | Columns for 
which local dictionary generation should be skipped. Useful when local 
dictionary need not be generated for few string/varchar/char columns |
 | [COLUMN_META_CACHE](#caching-minmax-value-for-required-columns) | Columns 
whose metadata can be cached in Driver for efficient pruning and improved query 
performance |
-| [CACHE_LEVEL](#caching-at-block-or-blocklet-level)           | Column 
metadata caching level.Whether to cache column metadata of block or blocklet |
-| [flat_folder](#support-flat-folder-same-as-hiveparquet)      | Whether to 
write all the carbondata files in a single folder.Not writing segments folder 
during incremental load |
+| [CACHE_LEVEL](#caching-at-block-or-blocklet-level)           | Column 
metadata caching level. Whether to cache column metadata of block or blocklet |
+| [FLAT_FOLDER](#support-flat-folder-same-as-hiveparquet)      | Whether to write all the carbondata files in a single folder. Not writing segments folder during incremental load |
 | [LONG_STRING_COLUMNS](#string-longer-than-32000-characters)  | Columns which 
are greater than 32K characters                |
 | [BUCKETNUMBER](#bucketing)                                   | Number of 
buckets to be created                              |
 | [BUCKETCOLUMNS](#bucketing)                                  | Columns which 
are to be placed in buckets                    |
@@ -110,9 +114,10 @@ CarbonData DDL statements are documented here,which 
includes:
 
      ```
      TBLPROPERTIES ('DICTIONARY_INCLUDE'='column1, column2')
-       ```
-        NOTE: Dictionary Include/Exclude for complex child columns is not 
supported.
-       
+     ```
+
+     **NOTE**: Dictionary Include/Exclude for complex child columns is not 
supported.
+
    - ##### Inverted Index Configuration
 
      By default inverted index is enabled, it might help to improve 
compression ratio and query speed, especially for low cardinality columns which 
are in reward position.
@@ -127,7 +132,7 @@ CarbonData DDL statements are documented here,which 
includes:
      This property is for users to specify which columns belong to the 
MDK(Multi-Dimensions-Key) index.
      * If users don't specify "SORT_COLUMN" property, by default MDK index be 
built by using all dimension columns except complex data type column. 
      * If this property is specified but with empty argument, then the table 
will be loaded without sort.
-        * This supports only string, date, timestamp, short, int, long, and 
boolean data types.
+     * This supports only string, date, timestamp, short, int, long, byte and 
boolean data types.
      Suggested use cases : Only build MDK index for required columns,it might 
help to improve the data loading performance.
 
      ```
@@ -135,7 +140,8 @@ CarbonData DDL statements are documented here,which 
includes:
      OR
      TBLPROPERTIES ('SORT_COLUMNS'='')
      ```
-     NOTE: Sort_Columns for Complex datatype columns is not supported.
+
+     **NOTE**: Sort_Columns for Complex datatype columns is not supported.
 
    - ##### Sort Scope Configuration
    
@@ -146,10 +152,10 @@ CarbonData DDL statements are documented here,which 
includes:
      * BATCH_SORT: It increases the load performance but decreases the query 
performance if identified blocks > parallelism.
      * GLOBAL_SORT: It increases the query performance, especially high 
concurrent point query.
        And if you care about loading resources isolation strictly, because the 
system uses the spark GroupBy to sort data, the resource can be controlled by 
spark. 
-       
-       ### Example:
 
-   ```
+    ### Example:
+
+    ```
     CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
       productNumber INT,
       productName STRING,
@@ -162,19 +168,30 @@ CarbonData DDL statements are documented here,which 
includes:
     STORED AS carbondata
     TBLPROPERTIES ('SORT_COLUMNS'='productName,storeCity',
                    'SORT_SCOPE'='NO_SORT')
-   ```
+    ```
 
    **NOTE:** CarbonData also supports "using carbondata". Find example code at 
[SparkSessionExample](https://github.com/apache/carbondata/blob/master/examples/spark2/src/main/scala/org/apache/carbondata/examples/SparkSessionExample.scala)
 in the CarbonData repo.
 
    - ##### Table Block Size Configuration
 
-     This command is for setting block size of this table, the default value 
is 1024 MB and supports a range of 1 MB to 2048 MB.
+     This property is for setting block size of this table, the default value 
is 1024 MB and supports a range of 1 MB to 2048 MB.
 
      ```
      TBLPROPERTIES ('TABLE_BLOCKSIZE'='512')
      ```
+
      **NOTE:** 512 or 512M both are accepted.
 
+   - ##### Table Blocklet Size Configuration
+
+     This property is for setting the blocklet size in the carbondata file; the default value is 64 MB.
+     Blocklet is the minimum IO read unit, so in case of point queries reducing the blocklet size might improve the query performance.
+
+     Example usage:
+     ```
+     TBLPROPERTIES ('TABLE_BLOCKLET_SIZE'='8')
+     ```
+
    - ##### Table Compaction Configuration
    
      These properties are table level compaction configurations, if not 
specified, system level configurations in carbon.properties will be used.
@@ -196,7 +213,7 @@ CarbonData DDL statements are documented here,which 
includes:
      
    - ##### Streaming
 
-     CarbonData supports streaming ingestion for real-time data. You can 
create the ‘streaming’ table using the following table properties.
+     CarbonData supports streaming ingestion for real-time data. You can 
create the 'streaming' table using the following table properties.
 
      ```
      TBLPROPERTIES ('streaming'='true')
@@ -231,7 +248,13 @@ CarbonData DDL statements are documented here,which 
includes:
    
    * In case of multi-level complex dataType columns, primitive 
string/varchar/char columns are considered for local dictionary generation.
 
-   Local dictionary will have to be enabled explicitly during create table or 
by enabling the **system property** ***carbon.local.dictionary.enable***. By 
default, Local Dictionary will be disabled for the carbondata table.
+   System Level Properties for Local Dictionary: 
+   
+   
+   | Properties | Default value | Description |
+   | ---------- | ------------- | ----------- |
+   | carbon.local.dictionary.enable | false | By default, Local Dictionary 
will be disabled for the carbondata table. |
+   | carbon.local.dictionary.decoder.fallback | true | Page Level data will 
not be maintained for the blocklet. During fallback, actual data will be 
retrieved from the encoded page data using local dictionary. **NOTE:** Memory 
footprint decreases significantly as compared to when this property is set to 
false |
     
    Local Dictionary can be configured using the following properties during 
create table command: 
           
@@ -239,24 +262,24 @@ CarbonData DDL statements are documented here,which 
includes:
 | Properties | Default value | Description |
 | ---------- | ------------- | ----------- |
 | LOCAL_DICTIONARY_ENABLE | false | Whether to enable local dictionary 
generation. **NOTE:** If this property is defined, it will override the value 
configured at system level by '***carbon.local.dictionary.enable***'.Local 
dictionary will be generated for all string/varchar/char columns unless 
LOCAL_DICTIONARY_INCLUDE, LOCAL_DICTIONARY_EXCLUDE is configured. |
-| LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality of a column 
upto which carbondata can try to generate local dictionary (maximum - 100000) |
-| LOCAL_DICTIONARY_INCLUDE | string/varchar/char columns| Columns for which 
Local Dictionary has to be generated.**NOTE:** Those string/varchar/char 
columns which are added into DICTIONARY_INCLUDE option will not be considered 
for local dictionary generation.This property needs to be configured only when 
local dictionary needs to be generated for few columns, skipping others.This 
property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or 
**carbon.local.dictionary.enable** is true |
-| LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary need 
not be generated.This property needs to be configured only when local 
dictionary needs to be skipped for few columns, generating for others.This 
property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or 
**carbon.local.dictionary.enable** is true |
+| LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality of a column upto which carbondata can try to generate local dictionary (maximum - 100000). **NOTE:** When LOCAL_DICTIONARY_THRESHOLD is defined for Complex columns, the counts of distinct records of all child columns are summed up. |
+| LOCAL_DICTIONARY_INCLUDE | string/varchar/char columns| Columns for which Local Dictionary has to be generated. **NOTE:** Those string/varchar/char columns which are added into DICTIONARY_INCLUDE option will not be considered for local dictionary generation. This property needs to be configured only when local dictionary needs to be generated for few columns, skipping others. This property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or **carbon.local.dictionary.enable** is true |
+| LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary need 
not be generated. This property needs to be configured only when local 
dictionary needs to be skipped for few columns, generating for others. This 
property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or 
**carbon.local.dictionary.enable** is true |
 
    **Fallback behavior:** 
 
    * When the cardinality of a column exceeds the threshold, it triggers a 
fallback and the generated dictionary will be reverted and data loading will be 
continued without dictionary encoding.
+   
+   * In case of complex columns, fallback is triggered when the sum of all child columns' distinct record counts exceeds the defined LOCAL_DICTIONARY_THRESHOLD value.
 
    **NOTE:** When fallback is triggered, the data loading performance will 
decrease as encoded data will be discarded and the actual data is written to 
the temporary sort files.
 
    **Points to be noted:**
 
-   1. Reduce Block size:
+   * Reduce Block size:
    
       Number of Blocks generated is less in case of Local Dictionary as 
compression ratio is high. This may reduce the number of tasks launched during 
query, resulting in degradation of query performance if the pruned blocks are 
less compared to the number of parallel tasks which can be run. So it is 
recommended to configure smaller block size which in turn generates more number 
of blocks.
       
-   2. All the page-level data for a blocklet needs to be maintained in memory 
until all the pages encoded for local dictionary is processed in order to 
handle fallback. Hence the memory required for local dictionary based table is 
more and this memory increase is proportional to number of columns. 
-      
 ### Example:
 
    ```
@@ -287,19 +310,19 @@ CarbonData DDL statements are documented here,which 
includes:
       * If you want no column min/max values to be cached in the driver.
 
       ```
-      COLUMN_META_CACHE=’’
+      COLUMN_META_CACHE=''
       ```
 
       * If you want only col1 min/max values to be cached in the driver.
 
       ```
-      COLUMN_META_CACHE=’col1’
+      COLUMN_META_CACHE='col1'
       ```
 
       * If you want min/max values to be cached in driver for all the 
specified columns.
 
       ```
-      COLUMN_META_CACHE=’col1,col2,col3,…’
+      COLUMN_META_CACHE='col1,col2,col3,…'
       ```
 
       Columns to be cached can be specified either while creating table or 
after creation of the table.
@@ -308,13 +331,13 @@ CarbonData DDL statements are documented here,which 
includes:
       Syntax:
 
       ```
-      CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) 
STORED BY ‘carbondata’ TBLPROPERTIES 
(‘COLUMN_META_CACHE’=’col1,col2,…’)
+      CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) 
STORED BY 'carbondata' TBLPROPERTIES ('COLUMN_META_CACHE'='col1,col2,…')
       ```
 
       Example:
 
       ```
-      CREATE TABLE employee (name String, city String, id int) STORED BY 
‘carbondata’ TBLPROPERTIES (‘COLUMN_META_CACHE’=’name’)
+      CREATE TABLE employee (name String, city String, id int) STORED BY 
'carbondata' TBLPROPERTIES ('COLUMN_META_CACHE'='name')
       ```
 
       After creation of table or on already created tables use the alter table 
command to configure the columns to be cached.
@@ -322,13 +345,13 @@ CarbonData DDL statements are documented here,which 
includes:
       Syntax:
 
       ```
-      ALTER TABLE [dbName].tableName SET TBLPROPERTIES 
(‘COLUMN_META_CACHE’=’col1,col2,…’)
+      ALTER TABLE [dbName].tableName SET TBLPROPERTIES 
('COLUMN_META_CACHE'='col1,col2,…')
       ```
 
       Example:
 
       ```
-      ALTER TABLE employee SET TBLPROPERTIES 
(‘COLUMN_META_CACHE’=’city’)
+      ALTER TABLE employee SET TBLPROPERTIES ('COLUMN_META_CACHE'='city')
       ```
 
    - ##### Caching at Block or Blocklet Level
@@ -340,13 +363,13 @@ CarbonData DDL statements are documented here,which 
includes:
       *Configuration for caching in driver at Block level (default value).*
 
       ```
-      CACHE_LEVEL= ‘BLOCK’
+      CACHE_LEVEL= 'BLOCK'
       ```
 
       *Configuration for caching in driver at Blocklet level.*
 
       ```
-      CACHE_LEVEL= ‘BLOCKLET’
+      CACHE_LEVEL= 'BLOCKLET'
       ```
 
       Cache level can be specified either while creating table or after 
creation of the table.
@@ -355,13 +378,13 @@ CarbonData DDL statements are documented here,which 
includes:
       Syntax:
 
       ```
-      CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) 
STORED BY ‘carbondata’ TBLPROPERTIES (‘CACHE_LEVEL’=’Blocklet’)
+      CREATE TABLE [dbName].tableName (col1 String, col2 String, col3 int,…) 
STORED BY 'carbondata' TBLPROPERTIES ('CACHE_LEVEL'='Blocklet')
       ```
 
       Example:
 
       ```
-      CREATE TABLE employee (name String, city String, id int) STORED BY 
‘carbondata’ TBLPROPERTIES (‘CACHE_LEVEL’=’Blocklet’)
+      CREATE TABLE employee (name String, city String, id int) STORED BY 
'carbondata' TBLPROPERTIES ('CACHE_LEVEL'='Blocklet')
       ```
 
       After creation of table or on already created tables use the alter table 
command to configure the cache level.
@@ -369,26 +392,27 @@ CarbonData DDL statements are documented here,which 
includes:
       Syntax:
 
       ```
-      ALTER TABLE [dbName].tableName SET TBLPROPERTIES 
(‘CACHE_LEVEL’=’Blocklet’)
+      ALTER TABLE [dbName].tableName SET TBLPROPERTIES 
('CACHE_LEVEL'='Blocklet')
       ```
 
       Example:
 
       ```
-      ALTER TABLE employee SET TBLPROPERTIES (‘CACHE_LEVEL’=’Blocklet’)
+      ALTER TABLE employee SET TBLPROPERTIES ('CACHE_LEVEL'='Blocklet')
       ```
 
    - ##### Support Flat folder same as Hive/Parquet
 
-       This feature allows all carbondata and index files to keep directy 
under tablepath. Currently all carbondata/carbonindex files written under 
tablepath/Fact/Part0/Segment_NUM folder and it is not same as hive/parquet 
folder structure. This feature makes all files written will be directly under 
tablepath, it does not maintain any segment folder structure.This is useful for 
interoperability between the execution engines and plugin with other execution 
engines like hive or presto becomes easier.
+       This feature allows all carbondata and index files to be kept directly under the tablepath. Currently all carbondata/carbonindex files are written under the tablepath/Fact/Part0/Segment_NUM folder, which is not the same as the hive/parquet folder structure. With this feature all files are written directly under the tablepath and no segment folder structure is maintained. This is useful for interoperability between execution engines, and plugging in other execution engines like hive or presto becomes easier.
 
        Following table property enables this feature and default value is 
false.
        ```
         'flat_folder'='true'
        ```
+
        Example:
        ```
-       CREATE TABLE employee (name String, city String, id int) STORED BY 
‘carbondata’ TBLPROPERTIES ('flat_folder'='true')
+       CREATE TABLE employee (name String, city String, id int) STORED BY 
'carbondata' TBLPROPERTIES ('flat_folder'='true')
        ```
 
    - ##### String longer than 32000 characters
@@ -418,6 +442,41 @@ CarbonData DDL statements are documented here,which 
includes:
 
      **NOTE:** The LONG_STRING_COLUMNS can only be string/char/varchar columns 
and cannot be dictionary_include/sort_columns/complex columns.
 
+   - ##### Compression for table
+
+     Data compression is also supported by CarbonData.
+     By default, Snappy is used to compress the data. CarbonData also supports the ZSTD compressor.
+     Users can specify the compressor in the table property:
+
+     ```
+     TBLPROPERTIES('carbon.column.compressor'='snappy')
+     ```
+     or
+     ```
+     TBLPROPERTIES('carbon.column.compressor'='zstd')
+     ```
+     If the compressor is configured, all the data loading and compaction will 
use that compressor.
+     If the compressor is not configured, the data loading and compaction will use the compressor from the current system property.
+     In this scenario, the compressor for each load may differ if the system 
property is changed each time. This is helpful if you want to change the 
compressor for a table.
+     The corresponding system property is configured in carbon.properties file 
as below:
+     ```
+     carbon.column.compressor=snappy
+     ```
+     or
+     ```
+     carbon.column.compressor=zstd
+     ```
+
+   - ##### Bad Records Path
+     This property is used to specify the location where bad records would be 
written.
+     Since the table path remains the same after a rename, the user can use this property to
+     specify the bad records path for the table at the time of creation, so that the same path can
+     later be viewed in the table description for reference.
+
+     ```
+       TBLPROPERTIES('BAD_RECORD_PATH'='/opt/badrecords')
+     ```
+
 ## CREATE TABLE AS SELECT
   This function allows user to create a Carbon table from any of the 
Parquet/Hive/Carbon table. This is beneficial when the user wants to create 
Carbon table from any other Parquet/Hive table and use the Carbon query engine 
to query and achieve better query results for cases where Carbon is faster than 
other file formats. Also this feature can be used for backing up the data.
 
@@ -457,7 +516,7 @@ CarbonData DDL statements are documented here,which 
includes:
   This function allows user to create external table by specifying location.
   ```
   CREATE EXTERNAL TABLE [IF NOT EXISTS] [db_name.]table_name 
-  STORED AS carbondata LOCATION ‘$FilesPath’
+  STORED AS carbondata LOCATION '$FilesPath'
   ```
 
 ### Create external table on managed table data location.
@@ -510,7 +569,7 @@ CarbonData DDL statements are documented here,which 
includes:
 
 ### Example
   ```
-  CREATE DATABASE carbon LOCATION “hdfs://name_cluster/dir1/carbonstore”;
+  CREATE DATABASE carbon LOCATION "hdfs://name_cluster/dir1/carbonstore";
   ```
 
 ## TABLE MANAGEMENT  
@@ -568,21 +627,23 @@ CarbonData DDL statements are documented here,which 
includes:
      ```
      ALTER TABLE carbon ADD COLUMNS (a1 INT, b1 STRING) 
TBLPROPERTIES('DEFAULT.VALUE.a1'='10')
      ```
-      NOTE: Add Complex datatype columns is not supported.
+      **NOTE:** Add Complex datatype columns is not supported.
 
 Users can specify which columns to include and exclude for local dictionary 
generation after adding new columns. These will be appended with the already 
existing local dictionary include and exclude columns of main table 
respectively.
-  ```
+     ```
      ALTER TABLE carbon ADD COLUMNS (a1 STRING, b1 STRING) 
TBLPROPERTIES('LOCAL_DICTIONARY_INCLUDE'='a1','LOCAL_DICTIONARY_EXCLUDE'='b1')
-  ```
+     ```
 
    - ##### DROP COLUMNS
    
      This command is used to delete the existing column(s) in a table.
+
      ```
      ALTER TABLE [db_name.]table_name DROP COLUMNS (col_name, ...)
      ```
 
      Examples:
+
      ```
      ALTER TABLE carbon DROP COLUMNS (b1)
      OR
@@ -590,12 +651,14 @@ Users can specify which columns to include and exclude 
for local dictionary gene
      
      ALTER TABLE carbon DROP COLUMNS (c1,d1)
      ```
-     NOTE: Drop Complex child column is not supported.
+
+     **NOTE:** Drop Complex child column is not supported.
 
    - ##### CHANGE DATA TYPE
    
      This command is used to change the data type from INT to BIGINT or 
decimal precision from lower to higher.
      Change of decimal data type from lower precision to higher precision will 
only be supported for cases where there is no data loss.
+
      ```
      ALTER TABLE [db_name.]table_name CHANGE col_name col_name 
changed_column_type
      ```
@@ -606,25 +669,31 @@ Users can specify which columns to include and exclude 
for local dictionary gene
      - **NOTE:** The allowed range is 38,38 (precision, scale) and is a valid 
upper case scenario which is not resulting in data loss.
 
      Example1:Changing data type of column a1 from INT to BIGINT.
+
      ```
      ALTER TABLE test_db.carbon CHANGE a1 a1 BIGINT
      ```
      
      Example2:Changing decimal precision of column a1 from 10 to 18.
+
      ```
      ALTER TABLE test_db.carbon CHANGE a1 a1 DECIMAL(18,2)
      ```
+
 - ##### MERGE INDEX
 
      This command is used to merge all the CarbonData index files 
(.carbonindex) inside a segment to a single CarbonData index merge file 
(.carbonindexmerge). This enhances the first query performance.
+
      ```
       ALTER TABLE [db_name.]table_name COMPACT 'SEGMENT_INDEX'
      ```
 
       Examples:
-      ```
+
+     ```
       ALTER TABLE test_db.carbon COMPACT 'SEGMENT_INDEX'
       ```
+
       **NOTE:**
 
       * Merge index is not supported on streaming table.
@@ -694,10 +763,11 @@ Users can specify which columns to include and exclude 
for local dictionary gene
   ```
   CREATE TABLE IF NOT EXISTS productSchema.productSalesTable (
                                 productNumber Int COMMENT 'unique serial 
number for product')
-  COMMENT “This is table comment”
+  COMMENT "This is table comment"
    STORED AS carbondata
    TBLPROPERTIES ('DICTIONARY_INCLUDE'='productNumber')
   ```
+
   You can also SET and UNSET table comment using ALTER command.
 
   Example to SET table comment:
@@ -743,8 +813,8 @@ Users can specify which columns to include and exclude for 
local dictionary gene
   PARTITIONED BY (productCategory STRING, productBatch STRING)
   STORED AS carbondata
   ```
-   NOTE: Hive partition is not supported on complex datatype columns.
-               
+   **NOTE:** Hive partition is not supported on complex datatype columns.
+
 
 #### Show Partitions
 
@@ -800,6 +870,7 @@ Users can specify which columns to include and exclude for 
local dictionary gene
   [TBLPROPERTIES ('PARTITION_TYPE'='HASH',
                   'NUM_PARTITIONS'='N' ...)]
   ```
+
   **NOTE:** N is the number of hash partitions
 
 

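The table properties introduced by this patch (blocklet size, column compressor, bad records path) could be combined in a single CREATE TABLE, roughly as in the sketch below; the table and column names are hypothetical.

```sql
-- Sketch combining properties added in this patch; names are hypothetical.
CREATE TABLE IF NOT EXISTS sales (
  id INT,
  name STRING
)
STORED AS carbondata
TBLPROPERTIES (
  'TABLE_BLOCKLET_SIZE'='8',           -- blocklet size in MB (default 64 MB)
  'carbon.column.compressor'='zstd',   -- or 'snappy' (the default)
  'BAD_RECORD_PATH'='/opt/badrecords'  -- where bad records are written
)
```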
http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/dml-of-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/dml-of-carbondata.md 
b/src/site/markdown/dml-of-carbondata.md
index 42da655..393ebd3 100644
--- a/src/site/markdown/dml-of-carbondata.md
+++ b/src/site/markdown/dml-of-carbondata.md
@@ -46,7 +46,7 @@ CarbonData DML statements are documented here,which includes:
 | ------------------------------------------------------- | 
------------------------------------------------------------ |
 | [DELIMITER](#delimiter)                                 | Character used to 
separate the data in the input csv file    |
 | [QUOTECHAR](#quotechar)                                 | Character used to 
quote the data in the input csv file       |
-| [COMMENTCHAR](#commentchar)                             | Character used to 
comment the rows in the input csv file.Those rows will be skipped from 
processing |
+| [COMMENTCHAR](#commentchar)                             | Character used to 
comment the rows in the input csv file. Those rows will be skipped from 
processing |
 | [HEADER](#header)                                       | Whether the input 
csv files have header row                  |
 | [FILEHEADER](#fileheader)                               | If header is not present in the input csv, the column names to be used for data read from input csv |
 | [MULTILINE](#multiline)                                 | Whether a row data 
can span across multiple lines.           |
@@ -56,12 +56,12 @@ CarbonData DML statements are documented here,which 
includes:
 | [COMPLEX_DELIMITER_LEVEL_2](#complex_delimiter_level_2) | Ending delimiter 
for complex type data in input csv file     |
 | [ALL_DICTIONARY_PATH](#all_dictionary_path)             | Path to read the 
dictionary data from all columns            |
 | [COLUMNDICT](#columndict)                               | Path to read the 
dictionary data from for particular column  |
-| [DATEFORMAT](#dateformat)                               | Format of date in 
the input csv file                         |
-| [TIMESTAMPFORMAT](#timestampformat)                     | Format of 
timestamp in the input csv file                    |
+| [DATEFORMAT](#dateformattimestampformat)                | Format of date in 
the input csv file                         |
+| [TIMESTAMPFORMAT](#dateformattimestampformat)           | Format of 
timestamp in the input csv file                    |
 | [SORT_COLUMN_BOUNDS](#sort-column-bounds)               | How to partition the sort columns to make them evenly distributed |
 | [SINGLE_PASS](#single_pass)                             | When to enable 
single pass data loading                      |
 | [BAD_RECORDS_LOGGER_ENABLE](#bad-records-handling)      | Whether to enable 
bad records logging                        |
-| [BAD_RECORD_PATH](#bad-records-handling)                | Bad records 
logging path.Useful when bad record logging is enabled |
+| [BAD_RECORD_PATH](#bad-records-handling)                | Bad records 
logging path. Useful when bad record logging is enabled |
 | [BAD_RECORDS_ACTION](#bad-records-handling)             | Behavior of data 
loading when bad record is found            |
 | [IS_EMPTY_DATA_BAD_RECORD](#bad-records-handling)       | Whether empty data 
of a column to be considered as bad record or not |
 | [GLOBAL_SORT_PARTITIONS](#global_sort_partitions)       | Number of 
partition to use for shuffling of data during sorting |
@@ -176,7 +176,7 @@ CarbonData DML statements are documented here,which 
includes:
 
     Range bounds for sort columns.
 
-    Suppose the table is created with 'SORT_COLUMNS'='name,id' and the range 
for name is aaa~zzz, the value range for id is 0~1000. Then during data 
loading, we can specify the following option to enhance data loading 
performance.
+    Suppose the table is created with 'SORT_COLUMNS'='name,id' and the range 
for name is aaa to zzz, the value range for id is 0 to 1000. Then during data 
loading, we can specify the following option to enhance data loading 
performance.
     ```
     OPTIONS('SORT_COLUMN_BOUNDS'='f,250;l,500;r,750')
     ```
@@ -186,7 +186,7 @@ CarbonData DML statements are documented here,which 
includes:
     * SORT_COLUMN_BOUNDS will be used only when the SORT_SCOPE is 'local_sort'.
    * Carbondata will use these bounds as ranges to process data concurrently during the final sort procedure. The records will be sorted and written out inside each partition. Since each partition is sorted, all records will be sorted.
    * Since the actual order and literal order of the dictionary column are not necessarily the same, we do not recommend using this feature if the first sort column is 'dictionary_include'.
-    * The option works better if your CPU usage during loading is low. If your 
system is already CPU tense, better not to use this option. Besides, it depends 
on the user to specify the bounds. If user does not know the exactly bounds to 
make the data distributed evenly among the bounds, loading performance will 
still be better than before or at least the same as before.
+    * The option works better if your CPU usage during loading is low. If your current system CPU usage is high, it is better not to use this option. Besides, it is up to the user to specify the bounds. If the user does not know the exact bounds that would distribute the data evenly among them, loading performance will still be better than before, or at least the same as before.
     * Users can find more information about this option in the description of 
PR1953.
 
   - ##### SINGLE_PASS:
@@ -240,14 +240,6 @@ CarbonData DML statements are documented here,which 
includes:
  * Bad Records Path can be specified in create, load and carbon properties. 
    The value specified in load will have the highest priority, and the value specified in carbon properties will have the least priority.
 
-   **Bad Records Path:**
-         This property is used to specify the location where bad records would 
be written.
-        
-
-   ```
-   TBLPROPERTIES('BAD_RECORDS_PATH'='/opt/badrecords'')
-   ```
-
   Example:
 
   ```

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/documentation.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/documentation.md 
b/src/site/markdown/documentation.md
index 537a9d3..1b6726a 100644
--- a/src/site/markdown/documentation.md
+++ b/src/site/markdown/documentation.md
@@ -25,7 +25,7 @@ Apache CarbonData is a new big data file format for faster 
interactive query usi
 
 ## Getting Started
 
-**File Format Concepts:** Start with the basics of understanding the 
[CarbonData file 
format](./file-structure-of-carbondata.md#carbondata-file-format) and its 
[storage structure](./file-structure-of-carbondata.md).This will help to 
understand other parts of the documentation, including deployment, programming 
and usage guides. 
+**File Format Concepts:** Start with the basics of understanding the 
[CarbonData file 
format](./file-structure-of-carbondata.md#carbondata-file-format) and its 
[storage structure](./file-structure-of-carbondata.md). This will help to 
understand other parts of the documentation, including deployment, programming 
and usage guides. 
 
 **Quick Start:** [Run an example 
program](./quick-start-guide.md#installing-and-configuring-carbondata-to-run-locally-with-spark-shell)
 on your local machine or [study some 
examples](https://github.com/apache/carbondata/tree/master/examples/spark2/src/main/scala/org/apache/carbondata/examples).
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/faq.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/faq.md b/src/site/markdown/faq.md
index 8ec7290..3ac9a0a 100644
--- a/src/site/markdown/faq.md
+++ b/src/site/markdown/faq.md
@@ -28,6 +28,7 @@
 * [Why aggregate query is not fetching data from aggregate 
table?](#why-aggregate-query-is-not-fetching-data-from-aggregate-table)
 * [Why all executors are showing success in Spark UI even after Dataload 
command failed at Driver 
side?](#why-all-executors-are-showing-success-in-spark-ui-even-after-dataload-command-failed-at-driver-side)
 * [Why different time zone result for select query output when query SDK 
writer 
output?](#why-different-time-zone-result-for-select-query-output-when-query-sdk-writer-output)
+* [How to check LRU cache memory 
footprint?](#how-to-check-lru-cache-memory-footprint)
 
 # TroubleShooting
 
@@ -56,12 +57,12 @@ By default **carbon.badRecords.location** specifies the 
following location ``/op
 ## How to enable Bad Record Logging?
 While loading data we can specify the approach to handle Bad Records. In order 
to analyse the cause of the Bad Records the parameter 
``BAD_RECORDS_LOGGER_ENABLE`` must be set to value ``TRUE``. There are multiple 
approaches to handle Bad Records which can be specified  by the parameter 
``BAD_RECORDS_ACTION``.
 
-- To pad the incorrect values of the csv rows with NULL value and load the 
data in CarbonData, set the following in the query :
+- To replace the incorrect values of the csv rows with NULL and load the data into CarbonData, set the following in the query :
 ```
 'BAD_RECORDS_ACTION'='FORCE'
 ```
 
-- To write the Bad Records without padding incorrect values with NULL in the 
raw csv (set in the parameter **carbon.badRecords.location**), set the 
following in the query :
+- To write the Bad Records to the raw csv (at the path set in the parameter **carbon.badRecords.location**) instead of replacing the incorrect values with NULL, set the following in the query :
 ```
 'BAD_RECORDS_ACTION'='REDIRECT'
 ```
@@ -198,7 +199,7 @@ select cntry,sum(gdp) from gdp21,pop1 where cntry=ctry 
group by cntry;
 ```
 
 ## Why all executors are showing success in Spark UI even after Dataload 
command failed at Driver side?
-Spark executor shows task as failed after the maximum number of retry 
attempts, but loading the data having bad records and BAD_RECORDS_ACTION 
(carbon.bad.records.action) is set as “FAIL” will attempt only once but 
will send the signal to driver as failed instead of throwing the exception to 
retry, as there is no point to retry if bad record found and BAD_RECORDS_ACTION 
is set to fail. Hence the Spark executor displays this one attempt as 
successful but the command has actually failed to execute. Task attempts or 
executor logs can be checked to observe the failure reason.
+Spark executor shows the task as failed after the maximum number of retry attempts. However, when the data being loaded has bad records and BAD_RECORDS_ACTION (carbon.bad.records.action) is set to "FAIL", only one attempt is made: instead of throwing an exception to trigger a retry, the executor sends a failure signal to the driver, as there is no point in retrying once a bad record is found and the action is set to fail. Hence the Spark executor displays this one attempt as successful, but the command has actually failed to execute. Check the task attempts or executor logs to find the failure reason.
 
 ## Why different time zone result for select query output when query SDK 
writer output? 
 SDK writer is an independent entity, hence SDK writer can generate carbondata files from a non-cluster machine that has different time zones. But at cluster when those files are read, it always takes the cluster time-zone. Hence, the values of timestamp and date datatype fields are not the original values.
@@ -212,7 +213,22 @@ cluster timezone is Asia/Shanghai
 TimeZone.setDefault(TimeZone.getTimeZone("Asia/Shanghai"))
 ```
 
+## How to check LRU cache memory footprint?
+To observe the LRU cache memory footprint in the logs, configure the below 
properties in log4j.properties file.
+```
+log4j.logger.org.apache.carbondata.core.memory.UnsafeMemoryManager = DEBUG
+log4j.logger.org.apache.carbondata.core.cache.CarbonLRUCache = DEBUG
+```
+These properties enable the DEBUG log for CarbonLRUCache and UnsafeMemoryManager, which prints the amount of memory consumed; this information can be used to decide the LRU cache size. **Note:** Enabling the DEBUG log will degrade the query performance.
 
+**Example:**
+```
+18/09/26 15:05:28 DEBUG UnsafeMemoryManager: pool-44-thread-1 Memory block 
(org.apache.carbondata.core.memory.MemoryBlock@21312095) is created with size 
10. Total memory used 413Bytes, left 536870499Bytes
+18/09/26 15:05:29 DEBUG CarbonLRUCache: main Required size for entry 
/home/target/store/default/stored_as_carbondata_table/Fact/Part0/Segment_0/0_1537954529044.carbonindexmerge
 :: 181 Current cache size :: 0
+18/09/26 15:05:30 DEBUG UnsafeMemoryManager: main Freeing memory of size: 
105available memory:  536870836
+18/09/26 15:05:30 DEBUG UnsafeMemoryManager: main Freeing memory of size: 
76available memory:  536870912
+18/09/26 15:05:30 INFO CarbonLRUCache: main Removed entry from InMemory lru 
cache :: 
/home/target/store/default/stored_as_carbondata_table/Fact/Part0/Segment_0/0_1537954529044.carbonindexmerge
+```
 
 ## Getting tablestatus.lock issues When loading data
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/file-structure-of-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/file-structure-of-carbondata.md 
b/src/site/markdown/file-structure-of-carbondata.md
index 2b43105..8eacd38 100644
--- a/src/site/markdown/file-structure-of-carbondata.md
+++ b/src/site/markdown/file-structure-of-carbondata.md
@@ -48,7 +48,7 @@ The CarbonData files are stored in the location specified by 
the ***carbon.store
 
 ![File Directory Structure](../../src/site/images/2-1_1.png)
 
-1. ModifiedTime.mdt records the timestamp of the metadata with the 
modification time attribute of the file. When the drop table and create table 
are used, the modification time of the file is updated.This is common to all 
databases and hence is kept in parallel to databases
+1. ModifiedTime.mdt records the timestamp of the metadata with the 
modification time attribute of the file. When the drop table and create table 
are used, the modification time of the file is updated. This is common to all 
databases and hence is kept in parallel to databases
 2. The **default** is the database name and contains the user tables. default is used when the user doesn't specify any database name; else the user-configured database name will be the directory name. user_table is the table name.
 3. Metadata directory stores schema files, tablestatus and dictionary files 
(including .dict, .dictmeta and .sortindex). There are three types of metadata 
data information files.
 4. data and index files are stored under directory named **Fact**. The Fact 
directory has a Part0 partition directory, where 0 is the partition number.

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/how-to-contribute-to-apache-carbondata.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/how-to-contribute-to-apache-carbondata.md 
b/src/site/markdown/how-to-contribute-to-apache-carbondata.md
index f64c948..8d6c891 100644
--- a/src/site/markdown/how-to-contribute-to-apache-carbondata.md
+++ b/src/site/markdown/how-to-contribute-to-apache-carbondata.md
@@ -48,7 +48,7 @@ alternatively, on the developer mailing 
list([email protected]).
 
 If there’s an existing JIRA issue for your intended contribution, please 
comment about your
 intended work. Once the work is understood, a committer will assign the issue 
to you.
-(If you don’t have a JIRA role yet, you’ll be added to the 
“contributor” role.) If an issue is
+(If you don’t have a JIRA role yet, you’ll be added to the "contributor" 
role.) If an issue is
 currently assigned, please check with the current assignee before reassigning.
 
 For moderate or large contributions, you should not start coding or writing a 
design doc unless
@@ -171,7 +171,7 @@ Our GitHub mirror automatically provides pre-commit testing 
coverage using Jenki
 Please make sure those tests pass, as the contribution cannot be merged otherwise.
 
 #### LGTM
-Once the reviewer is happy with the change, they’ll respond with an LGTM 
(“looks good to me!”).
+Once the reviewer is happy with the change, they’ll respond with an LGTM 
("looks good to me!").
 At this point, the committer will take over, possibly make some additional 
touch ups,
 and merge your changes into the codebase.
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/introduction.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/introduction.md 
b/src/site/markdown/introduction.md
index 434ccfa..e6c3372 100644
--- a/src/site/markdown/introduction.md
+++ b/src/site/markdown/introduction.md
@@ -18,15 +18,15 @@ CarbonData has
 
 ## CarbonData Features & Functions
 
-CarbonData has rich set of featues to support various use cases in Big Data 
analytics.The below table lists the major features supported by CarbonData.
+CarbonData has a rich set of features to support various use cases in Big Data analytics. The below table lists the major features supported by CarbonData.
 
 
 
 ### Table Management
 
 - ##### DDL (Create, Alter,Drop,CTAS)
-
-​    CarbonData provides its own DDL to create and manage carbondata 
tables.These DDL conform to                     Hive,Spark SQL format and 
support additional properties and configuration to take advantages of 
CarbonData functionalities.
+  
+  CarbonData provides its own DDL to create and manage carbondata tables. These DDL conform to the Hive and Spark SQL formats and support additional properties and configuration to take advantage of CarbonData functionalities.
 
 - ##### DML(Load,Insert)
 
@@ -46,7 +46,7 @@ CarbonData has rich set of featues to support various use 
cases in Big Data anal
 
 - ##### Compaction
 
-  CarbonData manages incremental loads as segments.Compaction help to compact 
the growing number of segments and also to improve query filter pruning.
+  CarbonData manages incremental loads as segments. Compaction helps to 
compact the growing number of segments and also to improve query filter pruning.
 
 - ##### External Tables
 
@@ -56,11 +56,11 @@ CarbonData has rich set of featues to support various use 
cases in Big Data anal
 
 - ##### Pre-Aggregate
 
-  CarbonData has concept of datamaps to assist in pruning of data while 
querying so that performance is faster.Pre Aggregate tables are kind of 
datamaps which can improve the query performance by order of 
magnitude.CarbonData will automatically pre-aggregae the incremental data and 
re-write the query to automatically fetch from the most appropriate 
pre-aggregate table to serve the query faster.
+  CarbonData has the concept of datamaps to assist in pruning of data while querying so that performance is faster. Pre-aggregate tables are a kind of datamap which can improve the query performance by an order of magnitude. CarbonData will automatically pre-aggregate the incremental data and re-write the query to automatically fetch from the most appropriate pre-aggregate table to serve the query faster.
 
 - ##### Time Series
 
-  CarbonData has built in understanding of time order(Year, month,day,hour, 
minute,second).Time series is a pre-aggregate table which can automatically 
roll-up the data to the desired level during incremental load and serve the 
query from the most appropriate pre-aggregate table.
+  CarbonData has a built-in understanding of time order (year, month, day, hour, minute, second). Time series is a pre-aggregate table which can automatically roll-up the data to the desired level during incremental load and serve the query from the most appropriate pre-aggregate table.
 
 - ##### Bloom filter
 
@@ -72,7 +72,7 @@ CarbonData has rich set of featues to support various use 
cases in Big Data anal
 
 - ##### MV (Materialized Views)
 
-  MVs are kind of pre-aggregate tables which can support efficent query 
re-write and processing.CarbonData provides MV which can rewrite query to fetch 
from any table(including non-carbondata tables).Typical usecase is to store the 
aggregated data of a non-carbondata fact table into carbondata and use mv to 
rewrite the query to fetch from carbondata.
+  MVs are a kind of pre-aggregate table which can support efficient query re-write and processing. CarbonData provides MV which can rewrite a query to fetch from any table (including non-carbondata tables). A typical usecase is to store the aggregated data of a non-carbondata fact table into carbondata and use MV to rewrite the query to fetch from carbondata.
 
 ### Streaming
 
@@ -84,17 +84,17 @@ CarbonData has rich set of featues to support various use 
cases in Big Data anal
 
 - ##### CarbonData writer
 
-  CarbonData supports writing data from non-spark application using SDK.Users 
can use SDK to generate carbondata files from custom applications.Typical 
usecase is to write the streaming application plugged in to kafka and use 
carbondata as sink(target) table for storing.
+  CarbonData supports writing data from non-spark applications using the SDK. Users can use the SDK to generate carbondata files from custom applications. A typical usecase is to write a streaming application plugged in to kafka and use carbondata as the sink (target) table for storing.
 
 - ##### CarbonData reader
 
-  CarbonData supports reading of data from non-spark application using 
SDK.Users can use the SDK to read the carbondata files from their application 
and do custom processing.
+  CarbonData supports reading data from non-spark applications using the SDK. Users can use the SDK to read the carbondata files from their application and do custom processing.
 
 ### Storage
 
 - ##### S3
 
-  CarbonData can write to S3, OBS or any cloud storage confirming to S3 
protocol.CarbonData uses the HDFS api to write to cloud object stores.
+  CarbonData can write to S3, OBS or any cloud storage conforming to the S3 protocol. CarbonData uses the HDFS API to write to cloud object stores.
 
 - ##### HDFS
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/language-manual.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/language-manual.md 
b/src/site/markdown/language-manual.md
index 123cae3..79aad00 100644
--- a/src/site/markdown/language-manual.md
+++ b/src/site/markdown/language-manual.md
@@ -32,8 +32,10 @@ CarbonData has its own parser, in addition to Spark's SQL 
Parser, to parse and p
   - Materialized Views (MV)
   - [Streaming](./streaming-guide.md)
 - Data Manipulation Statements
-  - [DML:](./dml-of-carbondata.md) [Load](./dml-of-carbondata.md#load-data), 
[Insert](./ddl-of-carbondata.md#insert-overwrite), 
[Update](./dml-of-carbondata.md#update), [Delete](./dml-of-carbondata.md#delete)
+  - [DML:](./dml-of-carbondata.md) [Load](./dml-of-carbondata.md#load-data), 
[Insert](./dml-of-carbondata.md#insert-data-into-carbondata-table), 
[Update](./dml-of-carbondata.md#update), [Delete](./dml-of-carbondata.md#delete)
   - [Segment Management](./segment-management-on-carbondata.md)
+- [CarbonData as Spark's Datasource](./carbon-as-spark-datasource-guide.md)
 - [Configuration Properties](./configuration-parameters.md)
 
 
+

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/lucene-datamap-guide.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/lucene-datamap-guide.md 
b/src/site/markdown/lucene-datamap-guide.md
index 86b00e2..aa9c8d4 100644
--- a/src/site/markdown/lucene-datamap-guide.md
+++ b/src/site/markdown/lucene-datamap-guide.md
@@ -47,7 +47,7 @@ It will show all DataMaps created on main table.
 
 ## Lucene DataMap Introduction
   Lucene is a high performance, full featured text search engine. Lucene is 
integrated to carbon as
-  an index datamap and managed along with main tables by CarbonData.User can 
create lucene datamap 
+  an index datamap and managed along with main tables by CarbonData. User can create a lucene datamap 
   to improve query performance on string columns which have lengthy content. So, the user can 
   search for a tokenized word or a pattern of it using a lucene query on the text content.
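  As a sketch of this flow (the table name, index column and search literal are illustrative), a lucene datamap is created on the main table and then queried through TEXT_MATCH:

  ```
  CREATE DATAMAP dm
  ON TABLE main_table
  USING 'lucene'
  DMPROPERTIES ('INDEX_COLUMNS' = 'name')

  SELECT * FROM main_table WHERE TEXT_MATCH('name:n10')
  ```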
   
@@ -95,7 +95,7 @@ As a technique for query acceleration, Lucene indexes cannot 
be queried directly
 Queries are to be made on main table. when a query with TEXT_MATCH('name:c10') 
or 
 TEXT_MATCH_WITH_LIMIT('name:n10',10)[the second parameter represents the 
number of result to be 
 returned, if user does not specify this value, all results will be returned 
without any limit] is 
-fired, two jobs are fired.The first job writes the temporary files in folder 
created at table level 
+fired, two jobs are fired. The first job writes the temporary files in a folder created at the table level 
 which contains lucene's search results, and these files will be read in the second job to give faster results. These temporary files will be cleared once the query finishes.
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/performance-tuning.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/performance-tuning.md 
b/src/site/markdown/performance-tuning.md
index f56a63b..6c87ce9 100644
--- a/src/site/markdown/performance-tuning.md
+++ b/src/site/markdown/performance-tuning.md
@@ -140,7 +140,7 @@
 
 | Parameter | Default Value | Description/Tuning |
 |-----------|-------------|--------|
-|carbon.number.of.cores.while.loading|Default: 2.This value should be >= 
2|Specifies the number of cores used for data processing during data loading in 
CarbonData. |
+|carbon.number.of.cores.while.loading|Default: 2. This value should be >= 
2|Specifies the number of cores used for data processing during data loading in 
CarbonData. |
 |carbon.sort.size|Default: 100000. The value should be >= 100.|Threshold to 
write local file in sort step when loading data|
 |carbon.sort.file.write.buffer.size|Default:  50000.|DataOutputStream buffer. |
 |carbon.merge.sort.reader.thread|Default: 3 |Specifies the number of cores 
used for temp file merging during data loading in CarbonData.|
@@ -165,15 +165,15 @@
 
|----------------------------------------------|-----------------------------------|---------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | carbon.sort.intermediate.files.limit | spark/carbonlib/carbon.properties | 
Data loading | During the loading of data, local temp is used to sort the data. 
This number specifies the minimum number of intermediate files after which the  
merge sort has to be initiated. | Increasing the parameter to a higher value 
will improve the load performance. For example, when we increase the value from 
20 to 100, it increases the data load performance from 35MB/S to more than 
50MB/S. Higher values of this parameter consumes  more memory during the load. |
 | carbon.number.of.cores.while.loading | spark/carbonlib/carbon.properties | 
Data loading | Specifies the number of cores used for data processing during 
data loading in CarbonData. | If you have more number of CPUs, then you can 
increase the number of CPUs, which will increase the performance. For example 
if we increase the value from 2 to 4 then the CSV reading performance can 
increase about 1 times |
-| carbon.compaction.level.threshold | spark/carbonlib/carbon.properties | Data 
loading and Querying | For minor compaction, specifies the number of segments 
to be merged in stage 1 and number of compacted segments to be merged in stage 
2. | Each CarbonData load will create one segment, if every load is small in 
size it will generate many small file over a period of time impacting the query 
performance. Configuring this parameter will merge the small segment to one big 
segment which will sort the data and improve the performance. For Example in 
one telecommunication scenario, the performance improves about 2 times after 
minor compaction. |
+| carbon.compaction.level.threshold | spark/carbonlib/carbon.properties | Data 
loading and Querying | For minor compaction, specifies the number of segments 
to be merged in stage 1 and number of compacted segments to be merged in stage 
2. | Each CarbonData load will create one segment, if every load is small in 
size it will generate many small files over a period of time impacting the 
query performance. Configuring this parameter will merge the small segment to 
one big segment which will sort the data and improve the performance. For 
Example in one telecommunication scenario, the performance improves about 2 
times after minor compaction. |
 | spark.sql.shuffle.partitions | spark/conf/spark-defaults.conf | Querying | 
The number of task started when spark shuffle. | The value can be 1 to 2 times 
as much as the executor cores. In an aggregation scenario, reducing the number 
from 200 to 32 reduced the query time from 17 to 9 seconds. |
 | spark.executor.instances/spark.executor.cores/spark.executor.memory | 
spark/conf/spark-defaults.conf | Querying | The number of executors, CPU cores, 
and memory used for CarbonData query. | In the bank scenario, we provide the 4 
CPUs cores and 15 GB for each executor which can get good performance. This 2 
value does not mean more the better. It needs to be configured properly in case 
of limited resources. For example, In the bank scenario, it has enough CPU 32 
cores each node but less memory 64 GB each node. So we cannot give more CPU but 
less memory. For example, when 4 cores and 12GB for each executor. It sometimes 
happens GC during the query which impact the query performance very much from 
the 3 second to more than 15 seconds. In this scenario need to increase the 
memory or decrease the CPU cores. |
 | carbon.detail.batch.size | spark/carbonlib/carbon.properties | Data loading 
| The buffer size to store records, returned from the block scan. | In limit 
scenario this parameter is very important. For example your query limit is 
1000. But if we set this value to 3000 that means we get 3000 records from scan 
but spark will only take 1000 rows. So the 2000 remaining are useless. In one 
Finance test case after we set it to 100, in the limit 1000 scenario the 
performance increase about 2 times in comparison to if we set this value to 
12000. |
 | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | 
Whether use YARN local directories for multi-table load disk load balance | If 
this is set it to true CarbonData will use YARN local directories for 
multi-table load disk load balance, that will improve the data load 
performance. |
 | carbon.use.multiple.temp.dir | spark/carbonlib/carbon.properties | Data loading | Whether to use multiple YARN local directories during table data loading for disk load balance. | After enabling 'carbon.use.local.dir', if this is set to true, CarbonData uses all YARN local directories during data load for disk load balance, which improves data load performance. Enable this property when you encounter disk hotspot problems during data loading. |
-| carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data 
loading | Specify the name of compressor to compress the intermediate sort 
temporary files during sort procedure in data loading. | The optional values 
are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty. By default, empty means 
that Carbondata will not compress the sort temp files. This parameter will be 
useful if you encounter disk bottleneck. |
-| carbon.load.skewedDataOptimization.enabled | 
spark/carbonlib/carbon.properties | Data loading | Whether to enable size based 
block allocation strategy for data loading. | When loading, carbondata will use 
file size based block allocation strategy for task distribution. It will make 
sure that all the executors process the same size of data -- It's useful if the 
size of your input data files varies widely, say 1MB~1GB. |
-| carbon.load.min.size.enabled | spark/carbonlib/carbon.properties | Data 
loading | Whether to enable node minumun input data size allocation strategy 
for data loading.| When loading, carbondata will use node minumun input data 
size allocation strategy for task distribution. It will make sure the node load 
the minimum amount of data -- It's useful if the size of your input data files 
very small, say 1MB~256MB,Avoid generating a large number of small files. |
+| carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data loading | Specify the name of the compressor used to compress the intermediate sort temporary files during the sort procedure in data loading. | The optional values are 'SNAPPY', 'GZIP', 'BZIP2', 'LZ4', 'ZSTD', and empty. By default, empty means that CarbonData will not compress the sort temp files. This parameter is useful if you encounter a disk bottleneck. |
+| carbon.load.skewedDataOptimization.enabled | 
spark/carbonlib/carbon.properties | Data loading | Whether to enable size based 
block allocation strategy for data loading. | When loading, carbondata will use 
file size based block allocation strategy for task distribution. It will make 
sure that all the executors process the same size of data -- It's useful if the 
size of your input data files varies widely, say 1MB to 1GB. |
+| carbon.load.min.size.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable the node minimum input data size allocation strategy for data loading. | When loading, CarbonData will use the node minimum input data size allocation strategy for task distribution. It makes sure each node loads at least a minimum amount of data -- it's useful if your input data files are very small, say 1MB to 256MB, and it avoids generating a large number of small files. |
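The size-based strategy behind `carbon.load.skewedDataOptimization.enabled` can be approximated with a short greedy sketch (illustrative only; the function and numbers below are assumptions, not CarbonData's actual block-allocation code):

```python
import heapq

def assign_by_size(file_sizes, num_executors):
    """Greedy size-based assignment: each file (largest first) goes to the least-loaded executor."""
    # Min-heap of (assigned size, executor id)
    heap = [(0, i) for i in range(num_executors)]
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_executors)]
    for size in sorted(file_sizes, reverse=True):
        load, ex = heapq.heappop(heap)
        assignment[ex].append(size)
        heapq.heappush(heap, (load + size, ex))
    return assignment

# Skewed input: one 1 GB file plus thirty 1 MB files (sizes in MB).
# The big file lands on one executor while the small files spread evenly
# over the remaining two, so no executor sits idle while another is overloaded.
parts = assign_by_size([1024] + [1] * 30, 3)
```

With a naive round-robin by file count, the executor holding the 1 GB file would finish long after the others; balancing by size narrows that gap.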
 
  Note: If your CarbonData instance is used only for queries, you may set the property 'spark.speculation=true', which is in the conf directory of Spark.
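The `carbon.detail.batch.size` trade-off described in the table above can be sketched with a toy scan loop (hypothetical numbers; this is not CarbonData code):

```python
def rows_fetched(batch_size, limit, total_rows=1_000_000):
    """Simulate a scan that returns fixed-size batches until `limit` rows are consumed."""
    fetched = consumed = 0
    while consumed < limit and fetched < total_rows:
        batch = min(batch_size, total_rows - fetched)
        fetched += batch                          # rows pulled from the scan
        consumed += min(batch, limit - consumed)  # rows the engine actually keeps
    return fetched

# With batch size 3000 and LIMIT 1000, a single 3000-row batch is read
# but only 1000 rows are kept: 2000 rows of scan work are wasted.
wasted = rows_fetched(3000, 1000) - 1000  # → 2000
```

With batch size 100, the same query reads exactly 1000 rows (ten batches) and nothing is wasted, which is consistent with the roughly 2x speedup reported in the table.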
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/preaggregate-datamap-guide.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/preaggregate-datamap-guide.md 
b/src/site/markdown/preaggregate-datamap-guide.md
index 3a3efc2..eff601d 100644
--- a/src/site/markdown/preaggregate-datamap-guide.md
+++ b/src/site/markdown/preaggregate-datamap-guide.md
@@ -251,7 +251,7 @@ pre-aggregate tables. To further improve the query 
performance, compaction on pr
 can be triggered to merge the segments and files in the pre-aggregate tables. 
 
 ## Data Management with pre-aggregate tables
-In current implementation, data consistence need to be maintained for both 
main table and pre-aggregate
+In current implementation, data consistency needs to be maintained for both 
main table and pre-aggregate
tables. Once there is a pre-aggregate table created on the main table, the following commands on the main table are not supported:

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/quick-start-guide.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/quick-start-guide.md 
b/src/site/markdown/quick-start-guide.md
index 37c398c..fd535ae 100644
--- a/src/site/markdown/quick-start-guide.md
+++ b/src/site/markdown/quick-start-guide.md
@@ -16,7 +16,7 @@
 -->
 
 # Quick Start
-This tutorial provides a quick introduction to using CarbonData.To follow 
along with this guide, first download a packaged release of CarbonData from the 
[CarbonData 
website](https://dist.apache.org/repos/dist/release/carbondata/).Alternatively 
it can be created following [Building 
CarbonData](https://github.com/apache/carbondata/tree/master/build) steps.
+This tutorial provides a quick introduction to using CarbonData. To follow along with this guide, first download a packaged release of CarbonData from the [CarbonData website](https://dist.apache.org/repos/dist/release/carbondata/). Alternatively, it can be created by following the [Building CarbonData](https://github.com/apache/carbondata/tree/master/build) steps.
 
 ##  Prerequisites
 * CarbonData supports Spark versions up to 2.2.1. Please download the Spark package from the [Spark website](https://spark.apache.org/downloads.html)
@@ -35,7 +35,7 @@ This tutorial provides a quick introduction to using 
CarbonData.To follow along
 
 ## Integration
 
-CarbonData can be integrated with Spark and Presto Execution Engines.The below 
documentation guides on Installing and Configuring with these execution engines.
+CarbonData can be integrated with Spark and Presto execution engines. The documentation below guides you through installing and configuring CarbonData with these execution engines.
 
 ### Spark
 
@@ -293,31 +293,31 @@ hdfs://<host_name>:port/user/hive/warehouse/carbon.store
 
 ## Installing and Configuring CarbonData on Presto
 
-**NOTE:** **CarbonData tables cannot be created nor loaded from Presto.User 
need to create CarbonData Table and load data into it
+**NOTE:** **CarbonData tables can neither be created nor loaded from Presto. Users need to create a CarbonData table and load data into it
 either with 
[Spark](#installing-and-configuring-carbondata-to-run-locally-with-spark-shell) 
or [SDK](./sdk-guide.md).
 Once the table is created, it can be queried from Presto.**
 
 
 ### Installing Presto
 
- 1. Download the 0.187 version of Presto using:
-    `wget 
https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.187/presto-server-0.187.tar.gz`
+ 1. Download the 0.210 version of Presto using:
+    `wget 
https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.210/presto-server-0.210.tar.gz`
 
- 2. Extract Presto tar file: `tar zxvf presto-server-0.187.tar.gz`.
+ 2. Extract Presto tar file: `tar zxvf presto-server-0.210.tar.gz`.
 
  3. Download the Presto CLI for the coordinator and name it presto.
 
   ```
-    wget 
https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.187/presto-cli-0.187-executable.jar
+    wget 
https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.210/presto-cli-0.210-executable.jar
 
-    mv presto-cli-0.187-executable.jar presto
+    mv presto-cli-0.210-executable.jar presto
 
     chmod +x presto
   ```
 
 ### Create Configuration Files
 
-  1. Create `etc` folder in presto-server-0.187 directory.
+  1. Create `etc` folder in presto-server-0.210 directory.
   2. Create `config.properties`, `jvm.config`, `log.properties`, and 
`node.properties` files.
   3. Install uuid to generate a node.id.
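As a sketch of step 2, a minimal `node.properties` could look like the following (the environment name and data directory are illustrative assumptions; `node.id` is the identifier produced by the uuid tool from step 3):

```
node.environment=production
node.id=<uuid generated in step 3>
node.data-dir=/home/presto/data
```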
 
@@ -363,10 +363,15 @@ Once the table is created,it can be queried from Presto.**
   coordinator=true
   node-scheduler.include-coordinator=false
   http-server.http.port=8086
-  query.max-memory=50GB
-  query.max-memory-per-node=2GB
+  query.max-memory=5GB
+  query.max-total-memory-per-node=5GB
+  query.max-memory-per-node=3GB
+  memory.heap-headroom-per-node=1GB
   discovery-server.enabled=true
-  discovery.uri=<coordinator_ip>:8086
+  discovery.uri=http://localhost:8086
+  task.max-worker-threads=4
+  optimizer.dictionary-aggregation=true
+  optimizer.optimize-hash-generation = false
   ```
The options `node-scheduler.include-coordinator=false` and `coordinator=true` indicate that this node is the coordinator and tell the coordinator not to do any of the computation work itself but to use the workers.
 
@@ -383,7 +388,7 @@ Then, `query.max-memory=<30GB * number of nodes>`.
   ```
   coordinator=false
   http-server.http.port=8086
-  query.max-memory=50GB
+  query.max-memory=5GB
   query.max-memory-per-node=2GB
   discovery.uri=<coordinator_ip>:8086
   ```
@@ -405,12 +410,12 @@ Then, `query.max-memory=<30GB * number of nodes>`.
 ### Start Presto Server on all nodes
 
 ```
-./presto-server-0.187/bin/launcher start
+./presto-server-0.210/bin/launcher start
 ```
This runs it as a background process.
 
 ```
-./presto-server-0.187/bin/launcher run
+./presto-server-0.210/bin/launcher run
 ```
This runs it in the foreground.
 

http://git-wip-us.apache.org/repos/asf/carbondata-site/blob/6ad7599a/src/site/markdown/s3-guide.md
----------------------------------------------------------------------
diff --git a/src/site/markdown/s3-guide.md b/src/site/markdown/s3-guide.md
index a2e5f07..1121164 100644
--- a/src/site/markdown/s3-guide.md
+++ b/src/site/markdown/s3-guide.md
@@ -15,7 +15,7 @@
     limitations under the License.
 -->
 
-# S3 Guide (Alpha Feature 1.4.1)
+# S3 Guide
 
Object storage is the recommended storage format in the cloud as it can support storing large data files. S3 APIs are widely used for accessing object stores. This can be 
