[09/45] carbondata git commit: [CARBONDATA-2973] Added documentation for fallback condition for complex columns in local Dictionary

ravipesala Tue, 09 Oct 2018 08:50:33 -0700

[CARBONDATA-2973] Added documentation for fallback condition for complex 
columns in local Dictionary


1. Added documentation for fallback condition for complex columns in local 
Dictionary
2. Added documentation for system level property" 
carbon.local.dictionary.decoder.fallback"

This closes #2766


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/3f99e9b7
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/3f99e9b7
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/3f99e9b7

Branch: refs/heads/branch-1.5
Commit: 3f99e9b7f87b387f03c1111b5bece2b2a8c5a50b
Parents: a9ddfbd
Author: praveenmeenakshi56 <[email protected]>
Authored: Wed Sep 26 12:40:37 2018 +0530
Committer: manishgupta88 <[email protected]>
Committed: Wed Sep 26 18:14:44 2018 +0530

----------------------------------------------------------------------
 docs/configuration-parameters.md |  2 +-
 docs/ddl-of-carbondata.md        | 16 +++++++++++-----
 docs/faq.md                      |  2 +-
 3 files changed, 13 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/3f99e9b7/docs/configuration-parameters.md
----------------------------------------------------------------------
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index 7edae47..662525b 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -119,7 +119,7 @@ This section provides the details of all the configurations 
required for the Car
 
 | Parameter | Default Value | Description |
 
|--------------------------------------|---------------|---------------------------------------------------|
-| carbon.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto 
which the driver process can cache the data (BTree and dictionary values). 
Beyond this, least recently used data will be removed from cache before loading 
new set of values.Default value of -1 means there is no memory limit for 
caching. Only integer values greater than 0 are accepted.**NOTE:** Minimum 
number of entries that needs to be removed from cache in order to load the new 
set of data is determined and unloaded.ie.,for example if 3 cache entries 
qualify for pre-emption, out of these, those entries that free up more cache 
memory is removed prior to others. Please refer 
[FAQs](./faq.md#how-to-check-LRU-cache-memory-footprint) for checking LRU cache 
memory footprint. |
+| carbon.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto 
which the driver process can cache the data (BTree and dictionary values). 
Beyond this, least recently used data will be removed from cache before loading 
new set of values.Default value of -1 means there is no memory limit for 
caching. Only integer values greater than 0 are accepted.**NOTE:** Minimum 
number of entries that needs to be removed from cache in order to load the new 
set of data is determined and unloaded.ie.,for example if 3 cache entries 
qualify for pre-emption, out of these, those entries that free up more cache 
memory is removed prior to others. Please refer 
[FAQs](./faq.md#how-to-check-lru-cache-memory-footprint) for checking LRU cache 
memory footprint. |
 | carbon.max.executor.lru.cache.size | -1 | Maximum memory **(in MB)** upto 
which the executor process can cache the data (BTree and reverse dictionary 
values).Default value of -1 means there is no memory limit for caching. Only 
integer values greater than 0 are accepted.**NOTE:** If this parameter is not 
configured, then the value of ***carbon.max.driver.lru.cache.size*** will be 
used. |
 | max.query.execution.time | 60 | Maximum time allowed for one query to be 
executed. The value is in minutes. |
 | carbon.enableMinMax | true | CarbonData maintains the metadata which enables 
to prune unnecessary files from being scanned as per the query conditions.To 
achieve pruning, Min,Max of each column is maintined.Based on the filter 
condition in the query, certain data can be skipped from scanning by matching 
the filter value against the min,max values of the column(s) present in that 
carbondata file.This pruing enhances query performance significantly. |

http://git-wip-us.apache.org/repos/asf/carbondata/blob/3f99e9b7/docs/ddl-of-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/ddl-of-carbondata.md b/docs/ddl-of-carbondata.md
index 2a467a2..5eeba86 100644
--- a/docs/ddl-of-carbondata.md
+++ b/docs/ddl-of-carbondata.md
@@ -231,7 +231,13 @@ CarbonData DDL statements are documented here,which 
includes:
    
    * In case of multi-level complex dataType columns, primitive 
string/varchar/char columns are considered for local dictionary generation.
 
-   Local dictionary will have to be enabled explicitly during create table or 
by enabling the **system property** ***carbon.local.dictionary.enable***. By 
default, Local Dictionary will be disabled for the carbondata table.
+   System Level Properties for Local Dictionary: 
+   
+   
+   | Properties | Default value | Description |
+   | ---------- | ------------- | ----------- |
+   | carbon.local.dictionary.enable | false | By default, Local Dictionary 
will be disabled for the carbondata table. |
+   | carbon.local.dictionary.decoder.fallback | true | Page Level data will 
not be maintained for the blocklet. During fallback, actual data will be 
retrieved from the encoded page data using local dictionary. **NOTE:** Memory 
footprint decreases significantly as compared to when this property is set to 
false |
     
    Local Dictionary can be configured using the following properties during 
create table command: 
           
@@ -239,24 +245,24 @@ CarbonData DDL statements are documented here,which 
includes:
 | Properties | Default value | Description |
 | ---------- | ------------- | ----------- |
 | LOCAL_DICTIONARY_ENABLE | false | Whether to enable local dictionary 
generation. **NOTE:** If this property is defined, it will override the value 
configured at system level by '***carbon.local.dictionary.enable***'.Local 
dictionary will be generated for all string/varchar/char columns unless 
LOCAL_DICTIONARY_INCLUDE, LOCAL_DICTIONARY_EXCLUDE is configured. |
-| LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality of a column 
upto which carbondata can try to generate local dictionary (maximum - 100000) |
+| LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality of a column 
upto which carbondata can try to generate local dictionary (maximum - 100000). 
**NOTE:** When LOCAL_DICTIONARY_THRESHOLD is defined for Complex columns, the 
count of distinct records of all child columns are summed up. |
 | LOCAL_DICTIONARY_INCLUDE | string/varchar/char columns| Columns for which 
Local Dictionary has to be generated.**NOTE:** Those string/varchar/char 
columns which are added into DICTIONARY_INCLUDE option will not be considered 
for local dictionary generation.This property needs to be configured only when 
local dictionary needs to be generated for few columns, skipping others.This 
property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or 
**carbon.local.dictionary.enable** is true |
 | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary need 
not be generated.This property needs to be configured only when local 
dictionary needs to be skipped for few columns, generating for others.This 
property takes effect only when **LOCAL_DICTIONARY_ENABLE** is true or 
**carbon.local.dictionary.enable** is true |
 
    **Fallback behavior:** 
 
    * When the cardinality of a column exceeds the threshold, it triggers a 
fallback and the generated dictionary will be reverted and data loading will be 
continued without dictionary encoding.
+   
+   * In case of complex columns, fallback is triggered when the summation 
value of all child columns' distinct records exceeds the defined 
LOCAL_DICTIONARY_THRESHOLD value.
 
    **NOTE:** When fallback is triggered, the data loading performance will 
decrease as encoded data will be discarded and the actual data is written to 
the temporary sort files.
 
    **Points to be noted:**
 
-   1. Reduce Block size:
+   * Reduce Block size:
    
       Number of Blocks generated is less in case of Local Dictionary as 
compression ratio is high. This may reduce the number of tasks launched during 
query, resulting in degradation of query performance if the pruned blocks are 
less compared to the number of parallel tasks which can be run. So it is 
recommended to configure smaller block size which in turn generates more number 
of blocks.
       
-   2. All the page-level data for a blocklet needs to be maintained in memory 
until all the pages encoded for local dictionary is processed in order to 
handle fallback. Hence the memory required for local dictionary based table is 
more and this memory increase is proportional to number of columns. 
-      
 ### Example:
 
    ```

http://git-wip-us.apache.org/repos/asf/carbondata/blob/3f99e9b7/docs/faq.md
----------------------------------------------------------------------
diff --git a/docs/faq.md b/docs/faq.md
index dbf9155..3dee5a2 100644
--- a/docs/faq.md
+++ b/docs/faq.md
@@ -28,7 +28,7 @@
 * [Why aggregate query is not fetching data from aggregate 
table?](#why-aggregate-query-is-not-fetching-data-from-aggregate-table)
 * [Why all executors are showing success in Spark UI even after Dataload 
command failed at Driver 
side?](#why-all-executors-are-showing-success-in-spark-ui-even-after-dataload-command-failed-at-driver-side)
 * [Why different time zone result for select query output when query SDK 
writer 
output?](#why-different-time-zone-result-for-select-query-output-when-query-sdk-writer-output)
-* [How to check LRU cache memory 
footprint?](#how-to-check-LRU-cache-memory-footprint)
+* [How to check LRU cache memory 
footprint?](#how-to-check-lru-cache-memory-footprint)
 
 # TroubleShooting

[09/45] carbondata git commit: [CARBONDATA-2973] Added documentation for fallback condition for complex columns in local Dictionary

Reply via email to