Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1898#discussion_r165247013
  
    --- Diff: docs/configuration-parameters.md ---
    @@ -60,6 +60,7 @@ This section provides the details of all the 
configurations required for CarbonD
     | carbon.options.is.empty.data.bad.record | false | If false, empty 
data ("" or '' or ,,) is not treated as a bad record; if true, it is. | |
     | carbon.options.bad.record.path |  | Specifies the HDFS path where bad 
records are stored. By default the value is Null. This path must be 
configured by the user if the bad record logger is enabled or the bad record 
action is set to redirect. | |
     | carbon.enable.vector.reader | true | This parameter increases the 
performance of select queries, as it fetches a columnar batch of 4*1024 rows 
instead of fetching data row by row. | |
    +| carbon.task.distribution | merge_small_files | Setting this parameter 
to *merge_small_files* merges small files up to a size of 128 MB. During data 
loading, small CSV files are combined into a single map task to reduce the 
number of read tasks, which improves performance. | | 
    --- End diff --
    
    1. carbon.task.distribution applies only to queries; it is not used by data 
loading.
    Global_Sort loading always merges small CSV files, so it does not require 
this configuration.
    2. It would be better to list all values of carbon.task.distribution:
    custom, block (default), blocklet, merge_small_files
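
    For illustration, the documented entry could show the property as a 
config fragment like the one below (a hypothetical `carbon.properties` 
snippet; the property name and allowed values come from this review, while 
the file layout and comments are assumptions):

    ```properties
    # carbon.properties -- query-side task distribution
    # Used only by queries, not by data loading.
    # Allowed values: custom, block (default), blocklet, merge_small_files
    carbon.task.distribution=merge_small_files
    ```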

