This is an automated email from the ASF dual-hosted git repository.
kunalkapoor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push:
new 636958e [CARBONDATA-3791] Updated configuration-parameters.md and removed unused configuration
636958e is described below
commit 636958e9be090db746452b414b0309d856db7e1e
Author: Venu Reddy <[email protected]>
AuthorDate: Mon May 4 22:08:29 2020 +0530
[CARBONDATA-3791] Updated configuration-parameters.md and removed unused configuration
Why is this PR needed?
Updated configuration-parameters.md and removed unused configuration
What changes were proposed in this PR?
Updated configuration-parameters.md and removed unused configuration
This closes #3744
---
.../org/apache/carbondata/core/constants/CarbonCommonConstants.java | 5 -----
docs/configuration-parameters.md | 5 ++---
2 files changed, 2 insertions(+), 8 deletions(-)
diff --git a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
index b5e7f0d..9d418d4 100644
--- a/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
+++ b/core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java
@@ -959,11 +959,6 @@ public final class CarbonCommonConstants {
public static final String ENABLE_OFFHEAP_SORT_DEFAULT = "true";
@CarbonProperty
- public static final String ENABLE_INMEMORY_MERGE_SORT = "enable.inmemory.merge.sort";
-
- public static final String ENABLE_INMEMORY_MERGE_SORT_DEFAULT = "false";
-
- @CarbonProperty
public static final String OFFHEAP_SORT_CHUNK_SIZE_IN_MB = "offheap.sort.chunk.size.inmb";
public static final String OFFHEAP_SORT_CHUNK_SIZE_IN_MB_DEFAULT = "64";
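For context on what removing such a constant means: keys declared in CarbonCommonConstants are paired with a default and looked up at runtime, so once the key constant is gone the property is simply never read. The sketch below imitates that key/default lookup pattern with plain java.util.Properties so it stays self-contained; the class and method names (`PropertyLookup`, `chunkSizeMb`) are illustrative and not part of CarbonData.

```java
// Simplified stand-in for the CarbonData property-lookup pattern.
// The real code goes through CarbonProperties; java.util.Properties is
// used here only to keep the example self-contained and runnable.
import java.util.Properties;

public class PropertyLookup {
    // Mirrors the CarbonCommonConstants pattern: a key plus its default value.
    static final String OFFHEAP_SORT_CHUNK_SIZE_IN_MB = "offheap.sort.chunk.size.inmb";
    static final String OFFHEAP_SORT_CHUNK_SIZE_IN_MB_DEFAULT = "64";

    static int chunkSizeMb(Properties props) {
        // Fall back to the declared default when the property is unset.
        return Integer.parseInt(
            props.getProperty(OFFHEAP_SORT_CHUNK_SIZE_IN_MB,
                              OFFHEAP_SORT_CHUNK_SIZE_IN_MB_DEFAULT));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        System.out.println(chunkSizeMb(props));   // unset: falls back to the default
        props.setProperty(OFFHEAP_SORT_CHUNK_SIZE_IN_MB, "128");
        System.out.println(chunkSizeMb(props));   // set: configured value wins
    }
}
```

A property whose key constant is deleted, like enable.inmemory.merge.sort in this commit, has no remaining lookup of this kind, which is why it can also be dropped from the documentation.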
diff --git a/docs/configuration-parameters.md b/docs/configuration-parameters.md
index 4627cac..dc105a8 100644
--- a/docs/configuration-parameters.md
+++ b/docs/configuration-parameters.md
@@ -31,7 +31,7 @@ This section provides the details of all the configurations required for the Car
| Property | Default Value | Description |
|----------------------------|-------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- [...]
-| carbon.storelocation | spark.sql.warehouse.dir property value | Location where CarbonData will create the store, and write the data in its custom format. If not specified,the path defaults to spark.sql.warehouse.dir property. **NOTE:** Store location should be in HDFS or S3. |
+| carbon.storelocation | spark.sql.warehouse.dir property value | Location where CarbonData will create the store, and write the data in its custom format. If not specified, the path defaults to the spark.sql.warehouse.dir property. **NOTE:** Store location should be in one of the carbon supported filesystems, like HDFS or S3. It is not recommended to use this property. |
| carbon.ddl.base.hdfs.url | (none) | To simplify and shorten the path to be specified in DDL/DML commands, this property is supported. This property is used to configure the HDFS relative path; the path configured in carbon.ddl.base.hdfs.url will be appended to the HDFS path configured in fs.defaultFS of core-site.xml. If this path is configured, then the user need not pass the complete path during data load. For example: If absolute path of the csv file is hdfs://10.18.101.155:54310/data/cnb [...]
| carbon.badRecords.location | (none) | CarbonData can detect the records not conforming to defined table schema and isolate them as bad records. This property is used to specify where to store such bad records. |
| carbon.streaming.auto.handoff.enabled | true | CarbonData supports storing of streaming data. To have high throughput for streaming, the data is written in Row format which is highly optimized for write, but performs poorly for query. When this property is true and when the streaming data size reaches ***carbon.streaming.segment.max.size***, CarbonData will automatically convert the data to columnar format and optimize it for faster querying. **NOTE:** It is not recommended to keep the d [...]
@@ -63,7 +63,7 @@ This section provides the details of all the configurations required for the Car
| carbon.number.of.cores.while.loading | 2 | Number of cores to be used while loading data. This also determines the number of threads to be used to read the input files (csv) in parallel. **NOTE:** This configured value is used in every data loading step to parallelize the operations. Configuring a higher value can lead to increased early thread pre-emption by the OS and thereby reduce the overall performance. |
| enable.unsafe.sort | true | CarbonData supports unsafe operations of Java to avoid GC overhead for certain operations. This configuration enables the use of unsafe functions in CarbonData. **NOTE:** For operations like data loading, which generate many short-lived Java objects, Java GC can be a bottleneck. Using unsafe can overcome the GC overhead and improve the overall performance. |
| enable.offheap.sort | true | CarbonData supports storing data in off-heap memory for certain operations during data loading and query. This helps to avoid the Java GC and thereby improve the overall performance. This configuration enables using off-heap memory for sorting of data during data loading. **NOTE:** ***enable.unsafe.sort*** needs to be configured to true in order to use off-heap memory. |
-| carbon.load.sort.scope | LOCAL_SORT | CarbonData can support various sorting options to match the balance between load and query performance. LOCAL_SORT:All the data given to an executor in the single load is fully sorted and written to carbondata files. Data loading performance is reduced a little as the entire data needs to be sorted in the executor. GLOBAL SORT:Entire data in the data load is fully sorted and written to carbondata files. Data loading performance would get reduced as [...]
+| carbon.load.sort.scope | NO_SORT [If sort columns are not specified while creating table] and LOCAL_SORT [If sort columns are specified] | CarbonData can support various sorting options to match the balance between load and query performance. LOCAL_SORT: All the data given to an executor in the single load is fully sorted and written to carbondata files. Data loading performance is reduced a little as the entire data needs to be sorted in the executor. GLOBAL SORT: Entire data in the d [...]
| carbon.global.sort.rdd.storage.level | MEMORY_ONLY | Storage level to persist the dataset of RDD/dataframe when loading data with 'sort_scope'='global_sort'. If the user's executor has less memory, set this parameter to 'MEMORY_AND_DISK_SER' or another storage level suited to the environment. [See detail](http://spark.apache.org/docs/latest/rdd-programming-guide.html#rdd-persistence). |
| carbon.load.global.sort.partitions | 0 | The number of partitions to use when shuffling data for global sort. Default value 0 means to use the same number of map tasks as reduce tasks. **NOTE:** In general, it is recommended to have 2-3 tasks per CPU core in your cluster. |
| carbon.sort.size | 100000 | Number of records to hold in memory to sort and write intermediate sort temp files. **NOTE:** Memory required for data loading will increase if this value is made bigger. Besides, each thread will cache this amount of records. The number of threads is configured by *carbon.number.of.cores.while.loading*. |
@@ -77,7 +77,6 @@ This section provides the details of all the configurations required for the Car
| carbon.merge.sort.reader.thread | 3 | CarbonData sorts and writes data to intermediate files to limit the memory usage. When the intermediate files reach ***carbon.sort.intermediate.files.limit***, the files will be merged in another thread pool. This value controls the size of that pool. Each thread will read the intermediate files, do a merge sort and finally write the records to another file. **NOTE:** Refer to ***carbon.sort.intermediate.files.limit*** for operation descripti [...]
| carbon.merge.sort.prefetch | true | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. These intermediate temp files have to be sorted using merge sort before writing into CarbonData format. This configuration enables prefetching of data from these temp files in order to optimize IO and speed up the data loading process. |
| carbon.prefetch.buffersize | 1000 | When the configuration ***carbon.merge.sort.prefetch*** is configured to true, we need to set the number of records that can be prefetched. This configuration is used to specify the number of records to be prefetched. **NOTE:** Configuring a larger number of records to be prefetched increases memory footprint as more records will have to be kept in memory. |
-| enable.inmemory.merge.sort | false | CarbonData sorts and writes data to intermediate files to limit the memory usage. These intermediate files needs to be sorted again using merge sort before writing to the final carbondata file. Performing merge sort in memory would increase the sorting performance at the cost of increased memory footprint. This Configuration specifies to do in-memory merge sort or to do file based merge sort. |
| carbon.sort.storage.inmemory.size.inmb | 512 | CarbonData writes every ***carbon.sort.size*** number of records to intermediate temp files during data loading to ensure memory footprint is within limits. When ***enable.unsafe.sort*** configuration is enabled, instead of using ***carbon.sort.size*** which is based on rows count, size occupied in memory is used to determine when to flush data pages to intermediate temp files. This configuration determines the memory to be used for storin [...]
| carbon.load.sortmemory.spill.percentage | 0 | During data loading, some data pages are kept in memory up to the memory configured in ***carbon.sort.storage.inmemory.size.inmb***, beyond which they are spilled to disk as intermediate temporary sort files. This configuration determines after what percentage data needs to be spilled to disk. **NOTE:** Without this configuration, when the data pages occupy up to the configured memory, new data pages would be dumped to disk and old pages are still mai [...]
| carbon.enable.calculate.size | true | **For Load Operation**: Enabling this property will let carbondata calculate the size of the carbon data file (.carbondata) and the carbon index file (.carbonindex) for each load and update the table status file. **For Describe Formatted**: Enabling this property will let carbondata calculate the total size of the carbon data files and the carbon index files for each table and display it in the describe formatted command. **NOTE:** This is useful t [...]
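For readers applying the documented properties, they are typically set in a carbon.properties file. The fragment below is an illustrative sketch only: the values shown are the documented defaults, not tuning recommendations, and the removed enable.inmemory.merge.sort key is deliberately absent since this commit drops it.

```properties
# Illustrative carbon.properties fragment; values are the documented defaults.
carbon.number.of.cores.while.loading=2
enable.unsafe.sort=true
enable.offheap.sort=true
carbon.sort.size=100000
carbon.sort.storage.inmemory.size.inmb=512
carbon.load.sortmemory.spill.percentage=0
# enable.inmemory.merge.sort was removed by this commit and is no longer read.
```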