comphead commented on code in PR #1674:
URL: https://github.com/apache/datafusion-comet/pull/1674#discussion_r2054980814


##########
docs/source/user-guide/configs.md:
##########
@@ -88,4 +88,5 @@ Comet provides the following configuration settings.
 | spark.comet.scan.preFetch.enabled | Whether to enable pre-fetching feature 
of CometScan. | false |
 | spark.comet.scan.preFetch.threadNum | The number of threads running 
pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is 
enabled. Note that more pre-fetching threads means more memory requirement to 
store pre-fetched row groups. | 2 |
 | spark.comet.shuffle.preferDictionary.ratio | The ratio of total values to 
distinct values in a string column to decide whether to prefer dictionary 
encoding when shuffling the column. If the ratio is higher than this config, 
dictionary encoding will be used on shuffling string column. This config is 
effective if it is higher than 1.0. Note that this config is only used when 
`spark.comet.exec.shuffle.mode` is `jvm`. | 10.0 |
+| spark.comet.shuffle.sizeInBytesMultiplier | Comet produces smaller shuffle 
files due to columnar compression and this can result in Spark choosing a 
different join strategy due to the estimated size of the exchange being 
smaller. Comet will multiple sizeInBytes by this amount to avoid regressions in 
join strategy. | 2.0 |

Review Comment:
   ```suggestion
   | spark.comet.shuffle.sizeInBytesMultiplier | Comet produces smaller shuffle 
files due to columnar compression and this can result in Spark choosing a 
different join strategy due to the estimated size of the exchange being 
smaller. Comet will multiply sizeInBytes by this amount to avoid regressions in 
join strategy. | 1.0 |
   ```
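
For context on why the default value matters, here is a minimal sketch of how a size multiplier could flip a join decision. It is an illustrative model only, not Comet's or Spark's planner code: the function names and the 6 MiB example size are hypothetical, and the only Spark fact assumed is that `spark.sql.autoBroadcastJoinThreshold` defaults to 10 MiB.

```python
# Illustrative model only: NOT Comet's or Spark's actual planner code.
# The function names and the example size below are hypothetical.

# Spark's documented default for spark.sql.autoBroadcastJoinThreshold (10 MiB).
AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024


def adjusted_size_in_bytes(reported_size: int, multiplier: float) -> int:
    """Scale the reported exchange size by the configured multiplier."""
    return int(reported_size * multiplier)


def would_broadcast(
    reported_size: int,
    multiplier: float,
    threshold: int = AUTO_BROADCAST_JOIN_THRESHOLD,
) -> bool:
    """Return True if the scaled size still fits under the broadcast threshold."""
    return adjusted_size_in_bytes(reported_size, multiplier) <= threshold


# Hypothetical exchange whose compressed columnar size is 6 MiB.
reported = 6 * 1024 * 1024

broadcast_with_1x = would_broadcast(reported, 1.0)  # size left unchanged
broadcast_with_2x = would_broadcast(reported, 2.0)  # size doubled to 12 MiB
```

In this model a multiplier of 1.0 leaves the estimate untouched, so the 6 MiB exchange stays under the threshold, while 2.0 pushes it over and the planner would fall back to a non-broadcast join.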



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

