GitHub user sraghunandan commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2805#discussion_r226926054
--- Diff: docs/configuration-parameters.md ---
@@ -75,7 +75,7 @@ This section provides the details of all the
configurations required for the Car
| carbon.use.multiple.temp.dir | false | When multiple disks are present
in the system, YARN is generally configured with multiple disks to be used as
temp directories for managing the containers. This configuration specifies
whether to use multiple YARN local directories during data loading for disk IO
load balancing.Enable ***carbon.use.local.dir*** for this configuration to take
effect. **NOTE:** Data Loading is an IO intensive operation whose performance
can be limited by the disk IO threshold, particularly during multi table
concurrent data load.Configuring this parameter, balances the disk IO across
multiple disks there by improving the over all load performance. |
| carbon.sort.temp.compressor | (none) | CarbonData writes every
***carbon.sort.size*** number of records to intermediate temp files during data
loading to ensure memory footprint is within limits. These temporary files can
be compressed and written in order to save the storage space. This
configuration specifies the name of compressor to be used to compress the
intermediate sort temp files during sort procedure in data loading. The valid
values are 'SNAPPY','GZIP','BZIP2','LZ4','ZSTD' and empty. By default, empty
means that Carbondata will not compress the sort temp files. **NOTE:**
Compressor will be useful if you encounter disk bottleneck.Since the data needs
to be compressed and decompressed,it involves additional CPU cycles,but is
compensated by the high IO throughput due to less data to be written or read
from the disks. |
| carbon.load.skewedDataOptimization.enabled | false | During data
loading,CarbonData would divide the number of blocks equally so as to ensure
all executors process same number of blocks. This mechanism satisfies most of
the scenarios and ensures maximum parallel processing for optimal data loading
performance.In some business scenarios, there might be scenarios where the size
of blocks vary significantly and hence some executors would have to do more
work if they get blocks containing more data. This configuration enables size
based block allocation strategy for data loading. When loading, carbondata will
use file size based block allocation strategy for task distribution. It will
make sure that all the executors process the same size of data.**NOTE:** This
configuration is useful if the size of your input data files varies widely, say
1MB to 1GB.For this configuration to work effectively,knowing the data pattern
and size is important and necessary. |
-| carbon.load.min.size.enabled | false | During Data Loading, CarbonData
would divide the number of files among the available executors to parallelize
the loading operation. When the input data files are very small, this action
causes to generate many small carbondata files. This configuration determines
whether to enable node minumun input data size allocation strategy for data
loading.It will make sure that the node load the minimum amount of data there
by reducing number of carbondata files.**NOTE:** This configuration is useful
if the size of the input data files are very small, like 1MB to 256MB.Refer to
***load_min_size_inmb*** to configure the minimum size to be considered for
splitting files among executors. |
+| carbon.load.min.size.enabled | false | During Data Loading, CarbonData
would divide the number of files among the available executors to parallelize
the loading operation. When the input data files are very small, this action
causes to generate many small carbondata files. This configuration determines
whether to enable node minumun input data size allocation strategy for data
loading. It will make sure that the nodes load the minimum amount of data there
by reducing number of carbondata files.**NOTE:** This configuration is useful
if the size of the input data files are very small, like 1MB to 256MB.Refer to
***load_min_size_inmb*** to configure the minimum size to be considered for
splitting files among executors. |
--- End diff --
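Add a space after the remaining full stops in this row: `carbondata files.**NOTE:**` and `256MB.Refer to` still run together in the added line.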
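
A rough way to catch these before review, sketched in Python. The pattern and the function name `add_space_after_full_stops` are assumptions for illustration, not part of the CarbonData tooling. It only treats a full stop as a sentence end when a capital letter or a bold marker follows directly, so dotted property names such as `carbon.load.min.size.enabled` are left alone. A dotted identifier with a capitalised segment would be changed by mistake, so the result still needs a manual look.

```python
import re

# Assumed heuristic: a full stop preceded by a letter or digit and followed
# directly by a capital letter, or by "**" plus a capital letter (for example
# "**NOTE:**"), is a sentence end that is missing its trailing space.
MISSING_SPACE = re.compile(r"(?<=[A-Za-z0-9])\.(?=[A-Z]|\*\*[A-Z])")


def add_space_after_full_stops(text: str) -> str:
    """Insert a single space after each full stop matched by MISSING_SPACE."""
    return MISSING_SPACE.sub(". ", text)


if __name__ == "__main__":
    row = "carbondata files.**NOTE:** like 1MB to 256MB.Refer to ***load_min_size_inmb***"
    print(add_space_after_full_stops(row))
    # prints: carbondata files. **NOTE:** like 1MB to 256MB. Refer to ***load_min_size_inmb***
```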
---