GitHub user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2520#discussion_r204969813
--- Diff: docs/data-management-on-carbondata.md ---
@@ -124,6 +124,41 @@ This tutorial is going to introduce all commands and
data operations on CarbonDa
TBLPROPERTIES ('streaming'='true')
```
+ - **Local Dictionary Configuration**
+
+ Local Dictionary is generated only for no-dictionary string/varchar datatype columns. It helps in:
+ 1. Getting more compression on dimension columns with less cardinality.
+ 2. Filter queries and full scan queries on No-dictionary columns with local dictionary will be faster as filter will be done on encoded data.
+ 3. Reducing the store size and memory footprint as only unique values will be stored as part of local dictionary and corresponding data will be stored as encoded data.
+
+ By default, Local Dictionary will be enabled and generated for all no-dictionary string/varchar datatype columns.
+
+ Users will be able to pass following properties in create table command:
+
+ | Properties | Default value | Description |
+ | ---------- | ------------- | ----------- |
+ | LOCAL_DICTIONARY_ENABLE | true | By default, local dictionary will be enabled for the table |
+ | LOCAL_DICTIONARY_THRESHOLD | 10000 | The maximum cardinality for local dictionary generation (range- 1000 to 100000) |
+ | LOCAL_DICTIONARY_INCLUDE | all no-dictionary string/varchar columns | Columns for which Local Dictionary is generated. |
+ | LOCAL_DICTIONARY_EXCLUDE | none | Columns for which Local Dictionary is not generated |
+
--- End diff --
What about the limitations? For example, can local dictionary columns work with:
1. sort_columns?
2. dictionary include?
3. complex types?
4. etc.
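
For instance, here is a rough sketch of the kind of combination the docs could address. The table and column names are hypothetical, `STORED BY 'carbondata'` is assumed from the existing examples in this document, and whether CarbonData accepts or rejects this combination is exactly the open question:

```sql
-- Hypothetical table; names are illustrative only.
-- Open questions this example raises:
--   * may 'name' appear in both SORT_COLUMNS and LOCAL_DICTIONARY_INCLUDE?
--   * how does local dictionary treat 'city', which is in DICTIONARY_INCLUDE?
CREATE TABLE IF NOT EXISTS local_dict_demo (
  id INT,
  name STRING,
  city STRING
)
STORED BY 'carbondata'
TBLPROPERTIES (
  'SORT_COLUMNS'='name',
  'DICTIONARY_INCLUDE'='city',
  'LOCAL_DICTIONARY_ENABLE'='true',
  'LOCAL_DICTIONARY_THRESHOLD'='10000',
  'LOCAL_DICTIONARY_INCLUDE'='name'
)
```

If any of these combinations is unsupported, it would help to state the expected behavior (error at create time, or silently ignored) next to the properties table.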
---