Github user xuchuanyin commented on a diff in the pull request:
https://github.com/apache/carbondata/pull/2604#discussion_r207265094
--- Diff: docs/configuration-parameters.md ---
@@ -69,7 +69,8 @@ This section provides the details of all the configurations required for CarbonD
| carbon.options.bad.record.path | | Specifies the HDFS path where bad records are stored. By default the value is Null. This path must be configured by the user if the bad record logger is enabled or the bad record action is set to redirect. | |
| carbon.enable.vector.reader | true | This parameter increases the performance of select queries as it fetches a columnar batch of 4*1024 rows instead of fetching data row by row. | |
| carbon.blockletgroup.size.in.mb | 64 MB | The data is read as a group of blocklets, called a blocklet group. This parameter specifies the size of the blocklet group. A higher value results in better sequential IO access. The minimum value is 16 MB; any value less than 16 MB will be reset to the default value (64 MB). | |
-| carbon.task.distribution | block | **block**: Setting this value will launch one task per block. This setting is suggested in case of concurrent queries and queries having big shuffling scenarios. **custom**: Setting this value will group the blocks and distribute it uniformly to the available resources in the cluster. This enhances the query performance but not suggested in case of concurrent queries and queries having big shuffling scenarios. **blocklet**: Setting this value will launch one task per blocklet. This setting is suggested in case of concurrent queries and queries having big shuffling scenarios. **merge_small_files**: Setting this value will merge all the small partitions to a size of (128 MB is the default value of "spark.sql.files.maxPartitionBytes",it is configurable) during querying. The small partitions are combined to a map task to reduce the number of read task. This enhances the performance. | |
+| carbon.task.distribution | block | **block**: Setting this value will launch one task per block. This setting is suggested for concurrent queries and queries having big shuffling scenarios. **custom**: Setting this value will group the blocks and distribute them uniformly to the available resources in the cluster. This enhances the query performance but is not suggested for concurrent queries and queries having big shuffling scenarios. **blocklet**: Setting this value will launch one task per blocklet. This setting is suggested for concurrent queries and queries having big shuffling scenarios. **merge_small_files**: Setting this value will merge all the small partitions up to the size of "spark.sql.files.maxPartitionBytes" (128 MB by default, configurable) during querying. The small partitions are combined into a map task to reduce the number of read tasks, which enhances performance. | |
+| carbon.load.sortmemory.spill.percentage | 0 | If unsafe memory is used during data loading, this configuration controls the behavior of spilling in-memory pages to disk. Internally, during sorting CarbonData sorts data in pages and adds them to unsafe memory. If the memory is insufficient, CarbonData spills the pages to disk and generates sort temp files. This configuration controls how many pages in memory will be spilled to disk, based on size. The size can be calculated by multiplying this configuration value with 'carbon.sort.storage.inmemory.size.inmb'. For example, the default value 0 means that no pages in unsafe memory will be spilled and all the newly sorted data will be spilled to disk; a value of 50 means that if the unsafe memory is insufficient, about half of the pages in the unsafe memory will be spilled to disk, while a value of 100 means that almost all pages in unsafe memory will be spilled. **Note**: This configuration only works for 'LOCAL_SORT' and 'BATCH_SORT', and the actual spilling behavior may be slightly different in each data load. | Integer values between 0 and 100 |
--- End diff ---
fixed
---
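
To illustrate the spill calculation described in the new `carbon.load.sortmemory.spill.percentage` row, here is a minimal sketch of setting the two related properties programmatically. It assumes the `CarbonProperties` utility from `org.apache.carbondata.core.util`; the same keys can equally be placed in the carbon.properties file, and the values below are purely illustrative, not defaults from the docs.

```scala
import org.apache.carbondata.core.util.CarbonProperties

// Illustrative values (assumption, not taken from the table above):
// with a spill percentage of 50 and an in-memory sort size of 512 MB,
// roughly 50% * 512 MB = 256 MB of sorted pages may be spilled to disk
// when unsafe memory runs short during a LOCAL_SORT or BATCH_SORT load.
val carbonProps = CarbonProperties.getInstance()
carbonProps.addProperty("carbon.load.sortmemory.spill.percentage", "50")
carbonProps.addProperty("carbon.sort.storage.inmemory.size.inmb", "512")
```

As the table row notes, a lower percentage keeps more of the already-sorted pages in unsafe memory, while newly sorted data is written to sort temp files once that memory is exhausted.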