[GitHub] ravipesala commented on a change in pull request #3115: Document Update

GitBox Wed, 30 Jan 2019 01:45:00 -0800

ravipesala commented on a change in pull request #3115: Document Update
URL: https://github.com/apache/carbondata/pull/3115#discussion_r252180869


 ##########
 File path: docs/configuration-parameters.md
 ##########
 @@ -61,7 +61,7 @@ This section provides the details of all the configurations 
required for the Car
 | carbon.number.of.cores.while.loading | 2 | Number of cores to be used while 
loading data. This also determines the number of threads to be used to read the 
input files (csv) in parallel.**NOTE:** This configured value is used in every 
data loading step to parallelize the operations. Configuring a higher value can 
lead to increased early thread pre-emption by OS and there by reduce the 
overall performance. |
 | enable.unsafe.sort | true | CarbonData supports unsafe operations of Java to 
avoid GC overhead for certain operations. This configuration enables to use 
unsafe functions in CarbonData. **NOTE:** For operations like data loading, 
which generates more short lived Java objects, Java GC can be a bottle neck. 
Using unsafe can overcome the GC overhead and improve the overall performance. |
 | enable.offheap.sort | true | CarbonData supports storing data in off-heap 
memory for certain operations during data loading and query. This helps to 
avoid the Java GC and thereby improve the overall performance. This 
configuration enables using off-heap memory for sorting of data during data 
loading.**NOTE:**  ***enable.unsafe.sort*** configuration needs to be 
configured to true for using off-heap |
-| carbon.load.sort.scope | LOCAL_SORT | CarbonData can support various sorting 
options to match the balance between load and query performance. LOCAL_SORT:All 
the data given to an executor in the single load is fully sorted and written to 
carbondata files. Data loading performance is reduced a little as the entire 
data needs to be sorted in the executor. BATCH_SORT:Sorts the data in batches 
of configured size and writes to carbondata files. Data loading performance 
increases as the entire data need not be sorted. But query performance will get 
reduced due to false positives in block pruning and also due to more number of 
carbondata files written. Due to more number of carbondata files, if identified 
blocks > cluster parallelism, query performance and concurrency will get 
reduced. GLOBAL SORT:Entire data in the data load is fully sorted and written 
to carbondata files. Data loading performance would get reduced as the entire 
data needs to be sorted. But the query performance increases significantly due 
to very less false positives and concurrency is also improved. **NOTE:** when 
BATCH_SORT is configured, it is recommended to keep 
***carbon.load.batch.sort.size.inmb*** > ***carbon.blockletgroup.size.in.mb*** |
+| carbon.load.sort.scope | LOCAL_SORT | CarbonData can support various sorting 
options to match the balance between load and query performance. LOCAL_SORT:All 
the data given to an executor in the single load is fully sorted and written to 
carbondata files. Data loading performance is reduced a little as the entire 
data needs to be sorted in the executor. BATCH_SORT:Sorts the data in batches 
of configured size and writes to carbondata files. Data loading performance 
increases as the entire data need not be sorted. But query performance will get 
reduced due to false positives in block pruning and also due to more number of 
carbondata files written. Due to more number of carbondata files, if identified 
blocks > cluster parallelism, query performance and concurrency will get 
reduced. GLOBAL SORT:Entire data in the data load is fully sorted and written 
to carbondata files. Data loading performance would get reduced as the entire 
data needs to be sorted. But the query performance increases significantly due 
to very less false positives and concurrency is also improved. **NOTE 1:** This 
property will be taken into account when SORT COLUMNS are specified 
explicitely, otherwise it is always NO SORT **NOTE 2:** When BATCH_SORT is 
configured, it is recommended to keep ***carbon.load.batch.sort.size.inmb*** > 
***carbon.blockletgroup.size.in.mb***.|
 
 Review comment:
   Please correct this to read: "when SORT_COLUMNS are specified during carbon 
table creation".
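
   The rule that NOTE 1 describes can be sketched as follows: the configured 
sort scope applies only when SORT_COLUMNS were specified at table creation; 
otherwise the load behaves as NO_SORT. This is a minimal illustration only. The 
function name `effective_sort_scope` and its parameters are hypothetical and 
are not CarbonData's actual code.

```python
from typing import Optional, Sequence


def effective_sort_scope(configured_scope: str,
                         sort_columns: Optional[Sequence[str]]) -> str:
    """Hypothetical model of NOTE 1; not CarbonData's implementation."""
    # No SORT_COLUMNS given at table creation: the configured scope is ignored.
    if not sort_columns:
        return "NO_SORT"
    # SORT_COLUMNS were specified, so carbon.load.sort.scope takes effect.
    return configured_scope


if __name__ == "__main__":
    print(effective_sort_scope("LOCAL_SORT", ["id", "name"]))
    print(effective_sort_scope("LOCAL_SORT", None))
```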

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
