Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

via GitHub Thu, 23 May 2024 09:25:07 -0700


sunchao commented on code in PR #460:
URL: https://github.com/apache/datafusion-comet/pull/460#discussion_r1611986030



##########
common/src/main/scala/org/apache/comet/CometConf.scala:
##########
@@ -131,14 +132,18 @@ object CometConf {
       .booleanConf
       .createWithDefault(false)
 
-  val COMET_COLUMNAR_SHUFFLE_ENABLED: ConfigEntry[Boolean] =
-    conf("spark.comet.columnar.shuffle.enabled")
-      .doc(
-        "Whether to enable Arrow-based columnar shuffle for Comet and Spark 
regular operators. " +
-          "If this is enabled, Comet prefers columnar shuffle than native 
shuffle. " +
-          "By default, this config is true.")
-      .booleanConf
-      .createWithDefault(true)
+  val COMET_SHUFFLE_MODE: ConfigEntry[String] = 
conf(s"$COMET_EXEC_CONFIG_PREFIX.shuffle.mode")
+    .doc(
+      "The mode of Comet shuffle. This config is only effective only if Comet 
shuffle " +
+        "is enabled. Available modes are 'native', 'jvm', and 'auto'. " +
+        "'native' is for native shuffle which has best performance in 
general." +
+        "'jvm' is for jvm-based columnar shuffle which has higher coverage 
than native shuffle." +
+        "'auto' is for Comet to choose the best shuffle mode based on the 
query plan." +
+        "By default, this config is 'jvm'.")
+    .stringConf
+    .transform(_.toLowerCase(Locale.ROOT))
+    .checkValues(Set("native", "jvm", "auto"))

Review Comment:
   👍 



##########
spark/src/test/scala/org/apache/comet/exec/CometAggregateSuite.scala:
##########
@@ -1131,9 +1129,8 @@ class CometAggregateSuite extends CometTestBase with 
AdaptiveSparkPlanHelper {
 
   test("var_pop and var_samp") {
     withSQLConf(CometConf.COMET_EXEC_SHUFFLE_ENABLED.key -> "true") {
-      Seq(true, false).foreach { cometColumnShuffleEnabled =>
-        withSQLConf(
-          CometConf.COMET_COLUMNAR_SHUFFLE_ENABLED.key -> 
cometColumnShuffleEnabled.toString) {
+      Seq("native", "jvm").foreach { cometColumnShuffleEnabled =>

Review Comment:
   nit: maybe update the variable name too 



##########
spark/src/test/scala/org/apache/comet/exec/CometExecSuite.scala:
##########
@@ -134,14 +134,14 @@ class CometExecSuite extends CometTestBase {
         .toDF("c1", "c2")
         .createOrReplaceTempView("v")
 
-      Seq(true, false).foreach { columnarShuffle =>
+      Seq("native", "jvm").foreach { columnarShuffle =>

Review Comment:
   nit: `shuffleMode`? 



##########
docs/source/user-guide/configs.md:
##########
@@ -39,6 +38,7 @@ Comet provides the following configuration settings.
 | spark.comet.exec.memoryFraction | The fraction of memory from Comet memory 
overhead that the native memory manager can use for execution. The purpose of 
this config is to set aside memory for untracked data structures, as well as 
imprecise size estimation during memory acquisition. Default value is 0.7. | 
0.7 |
 | spark.comet.exec.shuffle.codec | The codec of Comet native shuffle used to 
compress shuffle data. Only zstd is supported. | zstd |
 | spark.comet.exec.shuffle.enabled | Whether to enable Comet native shuffle. 
By default, this config is false. Note that this requires setting 
'spark.shuffle.manager' to 
'org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager'. 
'spark.shuffle.manager' must be set before starting the Spark application and 
cannot be changed during the application. | false |
+| spark.comet.exec.shuffle.mode | The mode of Comet shuffle. This config is 
only effective only if Comet shuffle is enabled. Available modes are 'native', 
'jvm', and 'auto'. 'native' is for native shuffle which has best performance in 
general.'jvm' is for jvm-based columnar shuffle which has higher coverage than 
native shuffle.'auto' is for Comet to choose the best shuffle mode based on the 
query plan.By default, this config is 'jvm'. | jvm |

Review Comment:
   also spaces before `jvm` and `auto` 



##########
docs/source/user-guide/configs.md:
##########
@@ -39,6 +38,7 @@ Comet provides the following configuration settings.
 | spark.comet.exec.memoryFraction | The fraction of memory from Comet memory 
overhead that the native memory manager can use for execution. The purpose of 
this config is to set aside memory for untracked data structures, as well as 
imprecise size estimation during memory acquisition. Default value is 0.7. | 
0.7 |
 | spark.comet.exec.shuffle.codec | The codec of Comet native shuffle used to 
compress shuffle data. Only zstd is supported. | zstd |
 | spark.comet.exec.shuffle.enabled | Whether to enable Comet native shuffle. 
By default, this config is false. Note that this requires setting 
'spark.shuffle.manager' to 
'org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager'. 
'spark.shuffle.manager' must be set before starting the Spark application and 
cannot be changed during the application. | false |
+| spark.comet.exec.shuffle.mode | The mode of Comet shuffle. This config is 
only effective only if Comet shuffle is enabled. Available modes are 'native', 
'jvm', and 'auto'. 'native' is for native shuffle which has best performance in 
general.'jvm' is for jvm-based columnar shuffle which has higher coverage than 
native shuffle.'auto' is for Comet to choose the best shuffle mode based on the 
query plan.By default, this config is 'jvm'. | jvm |

Review Comment:
   `is only effective only` -> `is only effective`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode [datafusion-comet]

Reply via email to