jackylk commented on a change in pull request #3518: [DOC] add performance-tuning with codegen parameters support
URL: https://github.com/apache/carbondata/pull/3518#discussion_r361096657
########## File path: docs/performance-tuning.md ##########
@@ -173,6 +173,8 @@
| carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data loading | Specify the name of the compressor used to compress the intermediate sort temp files during the sort procedure in data loading. | The optional values are 'SNAPPY', 'GZIP', 'BZIP2', 'LZ4', 'ZSTD', and empty. Specially, empty means that CarbonData will not compress the sort temp files. This parameter is useful if you encounter a disk bottleneck. |
| carbon.load.skewedDataOptimization.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable the size-based block allocation strategy for data loading. | When loading, CarbonData will use a file-size-based block allocation strategy for task distribution. It makes sure that all the executors process the same size of data -- useful if the size of your input data files varies widely, say 1MB to 1GB. |
| carbon.load.min.size.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable the node minimum input data size allocation strategy for data loading. | When loading, CarbonData will use the node minimum input data size allocation strategy for task distribution. It makes sure that each node loads at least a minimum amount of data -- useful if your input data files are very small, say 1MB to 256MB, to avoid generating a large number of small files. |
+| spark.sql.codegen.wholeStage | spark/conf/spark-defaults.conf | Querying | Improves the execution performance of a query by collapsing a query tree into a single optimized function, which eliminates virtual function calls and leverages CPU registers for intermediate data. | This whole-stage code generation mechanism was introduced by Spark SQL in version 2.x. It is recommended to disable this configuration on Spark 2.1 and enable it on Spark 2.3, because under Spark 2.1 the user can only use spark.sql.codegen.wholeStage to control whether codegen is used, but cannot configure the size limit of the generated method. In fact, this limit should be configured to match the local JDK. Spark 2.3 supports spark.sql.codegen.hugeMethodLimit, which can be used to configure the method size limit. |

Review comment: This is a Spark configuration, so I suggest not adding it to Carbon's documentation. Alternatively, you could add a link at the bottom of this section pointing to the performance tuning page of the Spark community.
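To make the discussion above concrete, here is a sketch of how the two codegen settings from the proposed table row would look in `spark/conf/spark-defaults.conf`. The property names come from the source; the values shown are illustrative assumptions, not recommendations from this PR (in particular, 8000 reflects the common HotSpot JIT limit on method bytecode size, which is what "configured to be the same as the local JDK" alludes to):

```properties
# Spark 2.1: whole-stage codegen can only be toggled on or off as a whole,
# so the reviewed text suggests disabling it entirely.
spark.sql.codegen.wholeStage        false

# Spark 2.3+: keep whole-stage codegen on, but cap the size of each generated
# method so the JVM JIT can still compile it. The 8000 value is an assumed
# example matching HotSpot's default huge-method threshold.
# spark.sql.codegen.wholeStage      true
# spark.sql.codegen.hugeMethodLimit 8000
```

Only one `spark.sql.codegen.wholeStage` line would be active in a real file; the two variants are shown together here (the second commented out) purely to contrast the Spark 2.1 and Spark 2.3 options described in the row.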
