jackylk commented on a change in pull request #3518: [DOC] add performance-tuning with codegen parameters support
URL: https://github.com/apache/carbondata/pull/3518#discussion_r361096657
########## File path: docs/performance-tuning.md ##########
@@ -173,6 +173,8 @@
| carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data loading | Specify the name of the compressor used to compress the intermediate sort temp files during the sort procedure in data loading. | The optional values are 'SNAPPY', 'GZIP', 'BZIP2', 'LZ4', 'ZSTD', and empty. Specially, empty means that CarbonData will not compress the sort temp files. This parameter is useful if you encounter a disk bottleneck. |
| carbon.load.skewedDataOptimization.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable the size-based block allocation strategy for data loading. | When loading, CarbonData will use a file-size-based block allocation strategy for task distribution. It makes sure that all the executors process the same size of data -- useful if the size of your input data files varies widely, say 1MB to 1GB. |
| carbon.load.min.size.enabled | spark/carbonlib/carbon.properties | Data loading | Whether to enable the node minimum input data size allocation strategy for data loading. | When loading, CarbonData will use the node minimum input data size allocation strategy for task distribution. It makes sure that each node loads at least a minimum amount of data -- useful if your input data files are very small, say 1MB to 256MB, to avoid generating a large number of small files. |
+| spark.sql.codegen.wholeStage | spark/conf/spark-defaults.conf | Querying | Improves the execution performance of a query by collapsing a query tree into a single optimized function, which eliminates virtual function calls and leverages CPU registers for intermediate data. | This whole-stage code generation mechanism was introduced by Spark SQL in version 2.x. It is recommended to disable this configuration on Spark 2.1 and enable it on Spark 2.3, because under Spark 2.1 the user can only use spark.sql.codegen.wholeStage to control whether codegen is used, but cannot configure the size limit of the generated method. In fact, this limit should be configured to match the local JDK. Spark 2.3 supports spark.sql.codegen.hugeMethodLimit, which can be used to configure the method size limit. |

Review comment: This is a Spark configuration, so I suggest not adding it to Carbon's documentation. Alternatively, you could add a link at the bottom of this section pointing to the performance tuning page of the Spark community.
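To make the discussion above concrete, here is a sketch of how the two codegen settings from the proposed table row would look in `spark/conf/spark-defaults.conf`. The property names come from the source; the values shown are illustrative assumptions, not recommendations from this PR (in particular, 8000 reflects the common HotSpot JIT limit on method bytecode size, which is what "configured to be the same as the local JDK" alludes to):

```properties
# Spark 2.1: whole-stage codegen can only be toggled on or off as a whole,
# so the reviewed text suggests disabling it entirely.
spark.sql.codegen.wholeStage        false

# Spark 2.3+: keep whole-stage codegen on, but cap the size of each generated
# method so the JVM JIT can still compile it. The 8000 value is an assumed
# example matching HotSpot's default huge-method threshold.
# spark.sql.codegen.wholeStage      true
# spark.sql.codegen.hugeMethodLimit 8000
```

Only one `spark.sql.codegen.wholeStage` line would be active in a real file; the two variants are shown together here (the second commented out) purely to contrast the Spark 2.1 and Spark 2.3 options described in the row.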
