This is an automated email from the ASF dual-hosted git repository.
jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new 7011cf3 [CARBONDATA-3717] Fix inconsistent configs in docs
7011cf3 is described below
commit 7011cf38ad6b51d4fc60a5bd1cbca0e062e2adc8
Author: 勉一 <[email protected]>
AuthorDate: Fri Feb 21 19:35:30 2020 +0800
[CARBONDATA-3717] Fix inconsistent configs in docs
Why is this PR needed?
CarbonData now has more and more configs (perhaps too many to maintain easily).
I found a number of confusing configs while using Carbon; the corrected names
are shown in the sketch after this list:
- `table_block_size` -> `table_blocksize`
- `sort.inmemory.size.in.mb` -> `sort.inmemory.size.inmb`
- unused (useless) configs:
- carbon.number.of.cores
- carbon.graph.rowset.size
- carbon.enableXXHash
- ....
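For reference, a minimal sketch (Scala; assumes a SparkSession named `spark` with CarbonData on the classpath) showing the corrected names in use:
```
// Minimal sketch, not part of this commit: uses the corrected config names.
// Assumes a SparkSession `spark` with CarbonData on the classpath.
import org.apache.carbondata.core.util.CarbonProperties

// The DDL option is 'table_blocksize', not 'table_block_size'.
spark.sql(
  "CREATE TABLE carbon_table (name STRING) USING CARBON " +
    "OPTIONS('table_blocksize'='256')")

// The property key is 'sort.inmemory.size.inmb', not 'sort.inmemory.size.in.mb'.
CarbonProperties.getInstance()
  .addProperty("sort.inmemory.size.inmb", "92160")
```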
What changes were proposed in this PR?
Fix incorrect config names in the docs;
Remove unused/meaningless configs and their docs;
Does this PR introduce any user interface change?
No
Is any new testcase added?
No
This closes #3632
---
conf/carbon.properties.template | 2 --
.../carbondata/core/constants/CarbonCommonConstants.java | 11 -----------
docs/carbon-as-spark-datasource-guide.md | 2 +-
docs/usecases.md | 4 +---
...\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md" | 1 -
5 files changed, 2 insertions(+), 18 deletions(-)
diff --git a/conf/carbon.properties.template b/conf/carbon.properties.template
index 1d5331c..eb635d6 100644
--- a/conf/carbon.properties.template
+++ b/conf/carbon.properties.template
@@ -33,8 +33,6 @@ carbon.sort.file.buffer.size=10
carbon.number.of.cores.while.loading=2
#Record count to sort and write to temp intermediate files
carbon.sort.size=100000
-#Algorithm for hashmap for hashkey calculation
-carbon.enableXXHash=true
#enable prefetch of data during merge sort while reading data from sort temp files in data loading
#carbon.merge.sort.prefetch=true
diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index ef87011..d8194a3 100644
--- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -205,17 +205,6 @@ public final class CarbonCommonConstants {
public static final String ZOOKEEPER_LOCATION = "/CarbonLocks";
/**
- * xxhash algorithm property for hashmap
- */
- @CarbonProperty
- public static final String ENABLE_XXHASH = "carbon.enableXXHash";
-
- /**
- * xxhash algorithm property for hashmap Default value false
- */
- public static final String ENABLE_XXHASH_DEFAULT = "true";
-
- /**
* System property to enable or disable local dictionary generation
*/
@CarbonProperty
diff --git a/docs/carbon-as-spark-datasource-guide.md b/docs/carbon-as-spark-datasource-guide.md
index b61bf43..275d5b1 100644
--- a/docs/carbon-as-spark-datasource-guide.md
+++ b/docs/carbon-as-spark-datasource-guide.md
@@ -55,7 +55,7 @@ Now you can create Carbon table using Spark's datasource DDL syntax.
## Example
```
- CREATE TABLE CARBON_TABLE (NAME STRING) USING CARBON OPTIONS('table_block_size'='256')
+ CREATE TABLE CARBON_TABLE (NAME STRING) USING CARBON OPTIONS('table_blocksize'='256')
```
# Using DataFrame
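The hunk above fixes the DDL example only. As a companion sketch for the DataFrame path that the guide covers next (assuming the carbon datasource writer honors the same `table_blocksize` option, and a SparkSession named `spark`; path and column name are hypothetical):
```
// Sketch only, not from this commit; option support here is an assumption.
spark.range(10).toDF("name")
  .write
  .format("carbon")
  .option("table_blocksize", "256")
  .save("/tmp/carbon_example_table")
```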
diff --git a/docs/usecases.md b/docs/usecases.md
index 343fccd..ec07ff3 100644
--- a/docs/usecases.md
+++ b/docs/usecases.md
@@ -83,7 +83,6 @@ Apart from these, the following CarbonData configuration was suggested to be con
| Configuration for | Parameter | Value | Description |
|------------------ | --------------------------------------- | ------ | ----------- |
-| Data Loading | carbon.graph.rowset.size | 100000 | Based on the size of each row, this determines the memory required during data loading.Higher value leads to increased memory foot print |
| Data Loading | carbon.number.of.cores.while.loading | 12 | More cores can improve data loading speed |
| Data Loading | carbon.sort.size | 100000 | Number of records to sort at a time.More number of records configured will lead to increased memory foot print |
| Data Loading | table_blocksize | 256 | To efficiently schedule multiple tasks during query |
@@ -134,7 +133,6 @@ Use all columns are no-dictionary as the cardinality is high.
| Configuration for | Parameter | Value | Description |
| ------------------| --------------------------------------- | ----------------------- | ------------------|
-| Data Loading | carbon.graph.rowset.size | 100000 | Based on the size of each row, this determines the memory required during data loading.Higher value leads to increased memory foot print |
| Data Loading | enable.unsafe.sort | TRUE | Temporary data generated during sort is huge which causes GC bottlenecks. Using unsafe reduces the pressure on GC |
| Data Loading | enable.offheap.sort | TRUE | Temporary data generated during sort is huge which causes GC bottlenecks. Using offheap reduces the pressure on GC.offheap can be accessed through java unsafe.hence enable.unsafe.sort needs to be true |
| Data Loading | offheap.sort.chunk.size.in.mb | 128 | Size of memory to allocate for sorting.Can increase this based on the memory available |
@@ -143,7 +141,7 @@ Use all columns are no-dictionary as the cardinality is high.
| Data Loading | table_blocksize | 512 | To efficiently schedule multiple tasks during query. This size depends on data scenario.If data is such that the filters would select less number of blocklets to scan, keeping higher number works well.If the number blocklets to scan is more, better to reduce the size as more tasks can be scheduled in parallel. |
| Data Loading | carbon.sort.intermediate.files.limit | 100 | Increased to 100 as number of cores are more.Can perform merging in backgorund.If less number of files to merge, sort threads would be idle |
| Data Loading | carbon.use.local.dir | TRUE | yarn application directory will be usually on a single disk.YARN would be configured with multiple disks to be used as temp or to assign randomly to applications. Using the yarn temp directory will allow carbon to use multiple disks and improve IO performance |
-| Data Loading | sort.inmemory.size.in.mb | 92160 | Memory allocated to do inmemory sorting. When more memory is available in the node, configuring this will retain more sort blocks in memory so that the merge sort is faster due to no/very less IO |
+| Data Loading | sort.inmemory.size.inmb | 92160 | Memory allocated to do inmemory sorting. When more memory is available in the node, configuring this will retain more sort blocks in memory so that the merge sort is faster due to no/very less IO |
| Compaction | carbon.major.compaction.size | 921600 | Sum of several loads to combine into single segment |
| Compaction | carbon.number.of.cores.while.compacting | 12 | Higher number of cores can improve the compaction speed.Data size is huge.Compaction need to use more threads to speed up the process |
| Compaction | carbon.enable.auto.load.merge | FALSE | Doing auto minor compaction is costly process as data size is huge.Perform manual compaction when the cluster is less loaded |
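Since the last row above advises disabling auto merge and compacting manually, a minimal sketch (Scala; the table name `sales` is hypothetical, and a SparkSession `spark` with CarbonData is assumed):
```
// Hypothetical sketch, not part of this commit.
import org.apache.carbondata.core.util.CarbonProperties

// Turn off auto minor compaction, then trigger a manual major compaction
// when the cluster is lightly loaded.
CarbonProperties.getInstance()
  .addProperty("carbon.enable.auto.load.merge", "false")
spark.sql("ALTER TABLE sales COMPACT 'MAJOR'")
```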
diff --git "a/docs/zh_cn/CarbonData\344\270\216\345\225\206\344\270\232\345\210\227\345\255\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md" "b/docs/zh_cn/CarbonData\344\270\216\345\225\206\344\270\232\345\210\227\345\255\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md"
index 39b69f2..ee58282 100644
--- "a/docs/zh_cn/CarbonData\344\270\216\345\225\206\344\270\232\345\210\227\345\255\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md"
+++ "b/docs/zh_cn/CarbonData\344\270\216\345\225\206\344\270\232\345\210\227\345\255\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md"
@@ -89,7 +89,6 @@ LIMIT 5000
| Key CarbonData configuration | Value | Description |
| ------------------------------------ | ------ | ------------------------------------------------------------ |
| carbon.inmemory.record.size | 480000 | Total number of rows per table to be loaded into memory for querying. |
-| carbon.number.of.cores | 4 | Number of threads scanning in parallel during a carbon query. |
| carbon.number.of.cores.while.loading | 15 | Number of threads scanning in parallel during carbon data loading. |
| carbon.sort.file.buffer.size | 20 | Total buffer size used for each intermediate temp file during merge sort (read/write) operations, in MB. |
| carbon.sort.size | 500000 | Number of records sorted at a time during data loading. |