This is an automated email from the ASF dual-hosted git repository.
kunalkapoor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new 636958e [CARBONDATA-3791] Updated configuration-parameters.md and removed unused configuration
636958e is described below
commit 636958e9be090db746452b414b0309d856db7e1e
Author: Venu Reddy <[email protected]>
AuthorDate: Mon May 4 22:08:29 2020 +0530
[CARBONDATA-3791] Updated configuration-parameters.md and removed unused configuration
Why is this PR needed?
Updated configuration-parameters.md and removed unused configuration
What changes were proposed in this PR?
Updated configuration-parameters.md and removed unused configuration
This closes #3744
---
.../org/apache/carbondata/core/constants/CarbonCommonConstants.java | 5 -----
docs/configuration-parameters.md | 5 ++---
2 files changed, 2 insertions(+), 8 deletions(-)
diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index b5e7f0d..9d418d4 100644
--- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -959,11 +959,6 @@ public final class CarbonCommonConstants {
public static final String ENABLE_OFFHEAP_SORT_DEFAULT = "true";
@CarbonProperty
- public static final String ENABLE_INMEMORY_MERGE_SORT = "enable.inmemory.merge.sort";
-
- public static final String ENABLE_INMEMORY_MERGE_SORT_DEFAULT = "false";
-
- @CarbonProperty
public static final String OFFHEAP_SORT_CHUNK_SIZE_IN_MB = "offheap.sort.chunk.size.inmb";
public static final String OFFHEAP_SORT_CHUNK_SIZE_IN_MB_DEFAULT = "64";
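For context on what removing such a constant means: keys declared in CarbonCommonConstants are paired with a default and looked up at runtime, so once the key constant is gone the property is simply never read. The sketch below imitates that key/default lookup pattern with plain java.util.Properties so it stays self-contained; the class and method names (`PropertyLookup`, `chunkSizeMb`) are illustrative and not part of CarbonData.

```java
// Simplified stand-in for the CarbonData property-lookup pattern.
// The real code goes through CarbonProperties; java.util.Properties is
// used here only to keep the example self-contained and runnable.
import java.util.Properties;

public class PropertyLookup {
    // Mirrors the CarbonCommonConstants pattern: a key plus its default value.
    static final String OFFHEAP_SORT_CHUNK_SIZE_IN_MB = "offheap.sort.chunk.size.inmb";
    static final String OFFHEAP_SORT_CHUNK_SIZE_IN_MB_DEFAULT = "64";

    static int chunkSizeMb(Properties props) {
        // Fall back to the declared default when the property is unset.
        return Integer.parseInt(
            props.getProperty(OFFHEAP_SORT_CHUNK_SIZE_IN_MB,
                              OFFHEAP_SORT_CHUNK_SIZE_IN_MB_DEFAULT));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(chunkSizeMb(props));   // unset: falls back to the default
        props.setProperty(OFFHEAP_SORT_CHUNK_SIZE_IN_MB, "128");
        System.out.println(chunkSizeMb(props));   // set: configured value wins
    }
}
```

A property whose key constant is deleted, like enable.inmemory.merge.sort in this commit, has no remaining lookup of this kind, which is why it can also be dropped from the documentation.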
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index 4627cac..dc105a8 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -31,7 +31,7 @@ This section provides the details of all the configurations required for the Car
| Property | Default Value | Description |
|----------------------------|-------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [...]
-| carbon.storelocation | spark.sql.warehouse.dir property value | Location where CarbonData will create the store, and write the data in its custom format. If not specified,the path defaults to spark.sql.warehouse.dir property. **NOTE:** Store location should be in HDFS or S3. |
+| carbon.storelocation | spark.sql.warehouse.dir property value | Location where CarbonData will create the store, and write the data in its custom format. If not specified, the path defaults to the spark.sql.warehouse.dir property. **NOTE:** Store location should be in one of the carbon supported filesystems, like HDFS or S3. It is not recommended to use this property. |
| carbon.ddl.base.hdfs.url | (none) | To simplify and shorten the path to be specified in DDL/DML commands, this property is supported. This property is used to configure the HDFS relative path; the path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS of core-site.xml. If this path is configured, then the user need not pass the complete path during data load. For example: If absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnb [...]
| carbon.badRecords.location | (none) | CarbonData can detect the records not conforming to defined table schema and isolate them as bad records. This property is used to specify where to store such bad records. |
| carbon.streaming.auto.handoff.enabled | true | CarbonData supports storing of streaming data. To have high throughput for streaming, the data is written in Row format which is highly optimized for write, but performs poorly for query. When this property is true and when the streaming data size reaches ***carbon.streaming.segment.max.size***, CarbonData will automatically convert the data to columnar format and optimize it for faster querying. **NOTE:** It is not recommended to keep the d [...]
@@ -63,7 +63,7 @@ This section provides the details of all the configurations required for the Car
| carbon.number.of.cores.while.loading | 2 | Number of cores to be used while loading data. This also determines the number of threads to be used to read the input files (csv) in parallel. **NOTE:** This configured value is used in every data loading step to parallelize the operations. Configuring a higher value can lead to increased early thread pre-emption by the OS and thereby reduce the overall performance. |
| enable.unsafe.sort | true | CarbonData supports unsafe operations of Java to avoid GC overhead for certain operations. This configuration enables the use of unsafe functions in CarbonData. **NOTE:** For operations like data loading, which generate many short-lived Java objects, Java GC can be a bottleneck. Using unsafe can overcome the GC overhead and improve the overall performance. |
| enable.offheap.sort | true | CarbonData supports storing data in off-heap memory for certain operations during data loading and query. This helps to avoid the Java GC and thereby improve the overall performance. This configuration enables using off-heap memory for sorting of data during data loading. **NOTE:** ***enable.unsafe.sort*** needs to be configured to true in order to use off-heap memory. |
-| carbon.load.sort.scope | LOCAL_SORT | CarbonData can support various sorting options to match the balance between load and query performance. LOCAL_SORT:All the data given to an executor in the single load is fully sorted and written to carbondata files. Data loading performance is reduced a little as the entire data needs to be sorted in the executor. GLOBAL SORT:Entire data in the data load is fully sorted and written to carbondata files. Data loading performance would get reduced as [...]
+| carbon.load.sort.scope | NO_SORT [If sort columns are not specified while creating table] and LOCAL_SORT [If sort columns are specified] | CarbonData can support various sorting options to match the balance between load and query performance. LOCAL_SORT: All the data given to an executor in the single load is fully sorted and written to carbondata files. Data loading performance is reduced a little as the entire data needs to be sorted in the executor. GLOBAL SORT: Entire data in the d [...]
| carbon.global.sort.rdd.storage.level | MEMORY_ONLY | Storage level to persist the dataset of RDD/dataframe when loading data with 'sort_scope'='global_sort'. If the user's executor has less memory, set this parameter to 'MEMORY_AND_DISK_SER' or another storage level suited to the environment. [See detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence). |
| carbon.load.global.sort.partitions | 0 | The number of partitions to use when shuffling data for global sort. Default value 0 means to use the same number of map tasks as reduce tasks. **NOTE:** In general, it is recommended to have 2-3 tasks per CPU core in your cluster. |
| carbon.sort.size | 100000 | Number of records to hold in memory to sort and write intermediate sort temp files. **NOTE:** Memory required for data loading will increase if this value is made bigger. Besides, each thread will cache this amount of records. The number of threads is configured by *carbon.number.of.cores.while.loading*. |
@@ -77,7 +77,6 @@ This section provides the details of all the configurations required for the Car
| carbon.merge.sort.reader.thread | 3 | CarbonData sorts and writes data to intermediate files to limit the memory usage. When the intermediate files reach ***carbon.sort.intermediate.files.limit***, the files will be merged in another thread pool. This value controls the size of that pool. Each thread will read the intermediate files, do a merge sort and finally write the records to another file. **NOTE:** Refer to ***carbon.sort.intermediate.files.limit*** for operation descripti [...]
| carbon.merge.sort.prefetch | true | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. These intermediate temp files have to be sorted using merge sort before writing into CarbonData format. This configuration enables prefetching of data from these temp files in order to optimize IO and speed up the data loading process. |
| carbon.prefetch.buffersize | 1000 | When the configuration ***carbon.merge.sort.prefetch*** is configured to true, we need to set the number of records that can be prefetched. This configuration is used to specify the number of records to be prefetched. **NOTE:** Configuring a larger number of records to be prefetched increases memory footprint as more records will have to be kept in memory. |
-| enable.inmemory.merge.sort | false | CarbonData sorts and writes data to intermediate files to limit the memory usage. These intermediate files needs to be sorted again using merge sort before writing to the final carbondata file. Performing merge sort in memory would increase the sorting performance at the cost of increased memory footprint. This Configuration specifies to do in-memory merge sort or to do file based merge sort. |
| carbon.sort.storage.inmemory.size.inmb | 512 | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. When ***enable.unsafe.sort*** configuration is enabled, instead of using ***carbon.sort.size*** which is based on rows count, size occupied in memory is used to determine when to flush data pages to intermediate temp files. This configuration determines the memory to be used for storin [...]
| carbon.load.sortmemory.spill.percentage | 0 | During data loading, some data pages are kept in memory up to the memory configured in ***carbon.sort.storage.inmemory.size.inmb***, beyond which they are spilled to disk as intermediate temporary sort files. This configuration determines after what percentage data needs to be spilled to disk. **NOTE:** Without this configuration, when the data pages occupy up to the configured memory, new data pages would be dumped to disk and old pages are still mai [...]
| carbon.enable.calculate.size | true | **For Load Operation**: Enabling this property will let carbondata calculate the size of the carbon data file (.carbondata) and the carbon index file (.carbonindex) for each load and update the table status file. **For Describe Formatted**: Enabling this property will let carbondata calculate the total size of the carbon data files and the carbon index files for each table and display it in the describe formatted command. **NOTE:** This is useful t [...]
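For readers applying the documented properties, they are typically set in a carbon.properties file. The fragment below is an illustrative sketch only: the values shown are the documented defaults, not tuning recommendations, and the removed enable.inmemory.merge.sort key is deliberately absent since this commit drops it.

```properties
# Illustrative carbon.properties fragment; values are the documented defaults.
carbon.number.of.cores.while.loading=2
enable.unsafe.sort=true
enable.offheap.sort=true
carbon.sort.size=100000
carbon.sort.storage.inmemory.size.inmb=512
carbon.load.sortmemory.spill.percentage=0
# enable.inmemory.merge.sort was removed by this commit and is no longer read.
```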