andygrove commented on code in PR #832:
URL: https://github.com/apache/datafusion-comet/pull/832#discussion_r1718733137

##########
common/src/main/scala/org/apache/comet/CometConf.scala:
##########
@@ -84,15 +84,33 @@ object CometConf extends ShimCometConf {
     .booleanConf
     .createWithDefault(sys.env.getOrElse("ENABLE_COMET", "true").toBoolean)

-  val COMET_SCAN_ENABLED: ConfigEntry[Boolean] = conf("spark.comet.scan.enabled")
+  val COMET_NATIVE_SCAN_ENABLED: ConfigEntry[Boolean] = conf("spark.comet.scan.enabled")
     .doc(
-      "Whether to enable Comet scan. When this is turned on, Spark will use Comet to read " +
-        "Parquet data source. Note that to enable native vectorized execution, both this " +
-        "config and 'spark.comet.exec.enabled' need to be enabled. By default, this config " +
-        "is true.")
+      "Whether to enable native scans. When this is turned on, Spark will use Comet to " +
+        "read supported data sources (currently only Parquet is supported natively). Note " +
+        "that to enable native vectorized execution, both this config and " +
+        "'spark.comet.exec.enabled' need to be enabled. By default, this config is true.")
     .booleanConf
     .createWithDefault(true)

+  val COMET_CONVERT_FROM_PARQUET_ENABLED: ConfigEntry[Boolean] =
+    conf("spark.comet.convert.parquet.enabled")
+      .doc(
+        "When enabled, data from Parquet v1 and v2 scans will be converted to Arrow format. Note " +
+          "that to enable native vectorized execution, both this config and " +
+          "'spark.comet.exec.enabled' need to be enabled.")
+      .booleanConf
+      .createWithDefault(false)

Review Comment:
I added this in the docs:

```
## Parquet

When `spark.comet.scan.enabled` is enabled, Parquet scans will be performed natively by Comet if all data types in the schema are supported. When this option is not enabled, the scan will fall back to Spark. In this case, enabling `spark.comet.convert.parquet.enabled` will immediately convert the data into Arrow format, allowing native execution to happen after that, but the process may not be efficient.
```

I'll take another pass at the config description though to make it more detailed
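
To make the interaction between the two settings concrete, here is a minimal sketch of the decision described in the docs text, written as plain Scala with no Spark or Comet dependency. Every name in it (`ScanPathSketch`, `ScanSettings`, `chooseScanPath`, and the three `ScanPath` cases) is hypothetical and does not exist in Comet; only the two config keys and their defaults come from the diff. The docs text does not say whether the conversion also applies when the native scan is enabled but the schema contains an unsupported type, so the sketch assumes it does.

```scala
object ScanPathSketch {
  // Hypothetical labels for the outcomes described in the docs text.
  sealed trait ScanPath
  case object NativeCometScan extends ScanPath    // Comet reads Parquet natively
  case object SparkScanThenArrow extends ScanPath // Spark reads, output is converted to Arrow
  case object SparkScanOnly extends ScanPath      // Spark reads, no conversion

  // Defaults mirror the createWithDefault values shown in the diff.
  final case class ScanSettings(
      scanEnabled: Boolean = true,            // spark.comet.scan.enabled
      convertParquetEnabled: Boolean = false) // spark.comet.convert.parquet.enabled

  // Assumption: an unsupported type with the native scan enabled is treated
  // the same way as the native scan being disabled (fall back to Spark).
  def chooseScanPath(settings: ScanSettings, allTypesSupported: Boolean): ScanPath =
    if (settings.scanEnabled && allTypesSupported) NativeCometScan
    else if (settings.convertParquetEnabled) SparkScanThenArrow
    else SparkScanOnly
}
```

With the defaults and a fully supported schema this yields `NativeCometScan`. Setting `scanEnabled = false` and `convertParquetEnabled = true` yields `SparkScanThenArrow`, which is the less efficient path the docs text warns about. Per the config descriptions, native execution after either path still requires `spark.comet.exec.enabled`.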