This is an automated email from the ASF dual-hosted git repository.
jackylk pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new 7011cf3 [CARBONDATA-3717] Fix inconsistent configs in docs
7011cf3 is described below
commit 7011cf38ad6b51d4fc60a5bd1cbca0e062e2adc8
Author: 勉一 <[email protected]>
AuthorDate: Fri Feb 21 19:35:30 2020 +0800
[CARBONDATA-3717] Fix inconsistent configs in docs
Why is this PR needed?
CarbonData now has more and more configs (perhaps too many to maintain easily).
I found a number of confusing configs while using Carbon; the corrected names
are shown in the sketch after this list:
- `table_block_size` -> `table_blocksize`
- `sort.inmemory.size.in.mb` -> `sort.inmemory.size.inmb`
- unused (useless) configs:
- carbon.number.of.cores
- carbon.graph.rowset.size
- carbon.enableXXHash
- ....
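For reference, a minimal sketch (Scala; assumes a SparkSession named `spark` with CarbonData on the classpath) showing the corrected names in use:
```
// Minimal sketch, not part of this commit: uses the corrected config names.
// Assumes a SparkSession `spark` with CarbonData on the classpath.
import org.apache.carbondata.core.util.CarbonProperties

// The DDL option is 'table_blocksize', not 'table_block_size'.
spark.sql(
  "CREATE TABLE carbon_table (name STRING) USING CARBON " +
    "OPTIONS('table_blocksize'='256')")

// The property key is 'sort.inmemory.size.inmb', not 'sort.inmemory.size.in.mb'.
CarbonProperties.getInstance()
  .addProperty("sort.inmemory.size.inmb", "92160")
```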
What changes were proposed in this PR?
Fix incorrect config names in the docs;
Remove unused/meaningless configs and their docs;
Does this PR introduce any user interface change?
No
Is any new testcase added?
No
This closes #3632
---
conf/carbon.properties.template | 2 --
.../carbondata/core/constants/CarbonCommonConstants.java | 11 -----------
docs/carbon-as-spark-datasource-guide.md | 2 +-
docs/usecases.md | 4 +---
...\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md" | 1 -
5 files changed, 2 insertions(+), 18 deletions(-)
diff --git a/conf/carbon.properties.template b/conf/carbon.properties.template
index 1d5331c..eb635d6 100644
--- a/conf/carbon.properties.template
+++ b/conf/carbon.properties.template
@@ -33,8 +33,6 @@ carbon.sort.file.buffer.size=10
carbon.number.of.cores.while.loading=2
#Record count to sort and write to temp intermediate files
carbon.sort.size=100000
-#Algorithm for hashmap for hashkey calculation
-carbon.enableXXHash=true
#enable prefetch of data during merge sort while reading data from sort temp files in data loading
#carbon.merge.sort.prefetch=true
diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index ef87011..d8194a3 100644
--- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -205,17 +205,6 @@ public final class CarbonCommonConstants {
public static final String ZOOKEEPER_LOCATION = "/CarbonLocks";
/**
- * xxhash algorithm property for hashmap
- */
- @CarbonProperty
- public static final String ENABLE_XXHASH = "carbon.enableXXHash";
-
- /**
- * xxhash algorithm property for hashmap Default value false
- */
- public static final String ENABLE_XXHASH_DEFAULT = "true";
-
- /**
* System property to enable or disable local dictionary generation
*/
@CarbonProperty
diff --git a/docs/carbon-as-spark-datasource-guide.md b/docs/carbon-as-spark-datasource-guide.md
index b61bf43..275d5b1 100644
--- a/docs/carbon-as-spark-datasource-guide.md
+++ b/docs/carbon-as-spark-datasource-guide.md
@@ -55,7 +55,7 @@ Now you can create Carbon table using Spark's datasource DDL syntax.
## Example
```
- CREATE TABLE CARBON_TABLE (NAME STRING) USING CARBON OPTIONS('table_block_size'='256')
+ CREATE TABLE CARBON_TABLE (NAME STRING) USING CARBON OPTIONS('table_blocksize'='256')
```
# Using DataFrame
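The hunk above fixes the DDL example only. As a companion sketch for the DataFrame path that the guide covers next (assuming the carbon datasource writer honors the same `table_blocksize` option, and a SparkSession named `spark`; path and column name are hypothetical):
```
// Sketch only, not from this commit; option support here is an assumption.
spark.range(10).toDF("name")
  .write
  .format("carbon")
  .option("table_blocksize", "256")
  .save("/tmp/carbon_example_table")
```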
diff --git a/docs/usecases.md b/docs/usecases.md
index 343fccd..ec07ff3 100644
--- a/docs/usecases.md
+++ b/docs/usecases.md
@@ -83,7 +83,6 @@ Apart from these, the following CarbonData configuration was suggested to be con
| Configuration for | Parameter | Value | Description |
|------------------ | --------------------------------------- | ------ | ----------- |
-| Data Loading | carbon.graph.rowset.size | 100000 | Based on the size of each row, this determines the memory required during data loading.Higher value leads to increased memory foot print |
| Data Loading | carbon.number.of.cores.while.loading | 12 | More cores can improve data loading speed |
| Data Loading | carbon.sort.size | 100000 | Number of records to sort at a time.More number of records configured will lead to increased memory foot print |
| Data Loading | table_blocksize | 256 | To efficiently schedule multiple tasks during query |
@@ -134,7 +133,6 @@ Use all columns are no-dictionary as the cardinality is high.
| Configuration for | Parameter | Value | Description |
| ------------------| --------------------------------------- | ----------------------- | ------------------|
-| Data Loading | carbon.graph.rowset.size | 100000 | Based on the size of each row, this determines the memory required during data loading.Higher value leads to increased memory foot print |
| Data Loading | enable.unsafe.sort | TRUE | Temporary data generated during sort is huge which causes GC bottlenecks. Using unsafe reduces the pressure on GC |
| Data Loading | enable.offheap.sort | TRUE | Temporary data generated during sort is huge which causes GC bottlenecks. Using offheap reduces the pressure on GC.offheap can be accessed through java unsafe.hence enable.unsafe.sort needs to be true |
| Data Loading | offheap.sort.chunk.size.in.mb | 128 | Size of memory to allocate for sorting.Can increase this based on the memory available |
@@ -143,7 +141,7 @@ Use all columns are no-dictionary as the cardinality is high.
| Data Loading | table_blocksize | 512 | To efficiently schedule multiple tasks during query. This size depends on data scenario.If data is such that the filters would select less number of blocklets to scan, keeping higher number works well.If the number blocklets to scan is more, better to reduce the size as more tasks can be scheduled in parallel. |
| Data Loading | carbon.sort.intermediate.files.limit | 100 | Increased to 100 as number of cores are more.Can perform merging in backgorund.If less number of files to merge, sort threads would be idle |
| Data Loading | carbon.use.local.dir | TRUE | yarn application directory will be usually on a single disk.YARN would be configured with multiple disks to be used as temp or to assign randomly to applications. Using the yarn temp directory will allow carbon to use multiple disks and improve IO performance |
-| Data Loading | sort.inmemory.size.in.mb | 92160 | Memory allocated to do inmemory sorting. When more memory is available in the node, configuring this will retain more sort blocks in memory so that the merge sort is faster due to no/very less IO |
+| Data Loading | sort.inmemory.size.inmb | 92160 | Memory allocated to do inmemory sorting. When more memory is available in the node, configuring this will retain more sort blocks in memory so that the merge sort is faster due to no/very less IO |
| Compaction | carbon.major.compaction.size | 921600 | Sum of several loads to combine into single segment |
| Compaction | carbon.number.of.cores.while.compacting | 12 | Higher number of cores can improve the compaction speed.Data size is huge.Compaction need to use more threads to speed up the process |
| Compaction | carbon.enable.auto.load.merge | FALSE | Doing auto minor compaction is costly process as data size is huge.Perform manual compaction when the cluster is less loaded |
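Since the last row above advises disabling auto merge and compacting manually, a minimal sketch (Scala; the table name `sales` is hypothetical, and a SparkSession `spark` with CarbonData is assumed):
```
// Hypothetical sketch, not part of this commit.
import org.apache.carbondata.core.util.CarbonProperties

// Turn off auto minor compaction, then trigger a manual major compaction
// when the cluster is lightly loaded.
CarbonProperties.getInstance()
  .addProperty("carbon.enable.auto.load.merge", "false")
spark.sql("ALTER TABLE sales COMPACT 'MAJOR'")
```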
diff --git "a/docs/zh_cn/CarbonData\344\270\216\345\225\206\344\270\232\345\210\227\345\255\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md" "b/docs/zh_cn/CarbonData\344\270\216\345\225\206\344\270\232\345\210\227\345\255\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md"
index 39b69f2..ee58282 100644
--- "a/docs/zh_cn/CarbonData\344\270\216\345\225\206\344\270\232\345\210\227\345\255\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md"
+++ "b/docs/zh_cn/CarbonData\344\270\216\345\225\206\344\270\232\345\210\227\345\255\230DB\346\200\247\350\203\275\345\257\271\346\257\224.md"
@@ -89,7 +89,6 @@ LIMIT 5000
| Key CarbonData configuration | Value | Description |
| ------------------------------------ | ------ | ------------------------------------------------------------ |
| carbon.inmemory.record.size | 480000 | Total number of rows per table to be loaded into memory for querying. |
-| carbon.number.of.cores | 4 | Number of threads scanning in parallel during a carbon query. |
| carbon.number.of.cores.while.loading | 15 | Number of threads scanning in parallel during carbon data loading. |
| carbon.sort.file.buffer.size | 20 | Total buffer size used for each intermediate temp file during merge sort (read/write) operations, in MB. |
| carbon.sort.size | 500000 | Number of records sorted at a time during data loading. |