This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f8a20c4  [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration
f8a20c4 is described below

commit f8a20c470bf115b0834970ce02eb2ec103e0f6df
Author: HyukjinKwon <gurwls...@apache.org>
AuthorDate: Thu May 7 09:00:59 2020 +0900

    [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration

    ### What changes were proposed in this pull request?

    This PR proposes to deprecate the 'spark.sql.optimizer.metadataOnly' configuration and remove it in a future release.

    ### Why are the changes needed?

    This optimization can cause a potential correctness issue; see also SPARK-26709. It is also difficult to extend: essentially every eligible function must be whitelisted, which adds maintenance overhead (see also SPARK-31590). Users who still need this optimization should instead inject it as a custom rule via `SparkSessionExtensions`, so it can be removed on the Spark side.

    ### Does this PR introduce _any_ user-facing change?

    Yes, setting or unsetting `spark.sql.optimizer.metadataOnly` will show a deprecation warning:

    ```scala
    scala> spark.conf.unset("spark.sql.optimizer.metadataOnly")
    ```
    ```
    20/05/06 12:57:23 WARN SQLConf: The SQL config 'spark.sql.optimizer.metadataOnly' has been deprecated in Spark v3.0 and may be removed in the future. Avoid to depend on this optimization to prevent a potential correctness issue. If you must use, use 'SparkSessionExtensions' instead to inject it as a custom rule.
    ```

    ```scala
    scala> spark.conf.set("spark.sql.optimizer.metadataOnly", "true")
    ```
    ```
    20/05/06 12:57:44 WARN SQLConf: The SQL config 'spark.sql.optimizer.metadataOnly' has been deprecated in Spark v3.0 and may be removed in the future. Avoid to depend on this optimization to prevent a potential correctness issue. If you must use, use 'SparkSessionExtensions' instead to inject it as a custom rule.
    ```

    ### How was this patch tested?

    Manually tested.

    Closes #28459 from HyukjinKwon/SPARK-31647.
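The `SparkSessionExtensions` route the deprecation message points to can be sketched as follows. This is a minimal illustration, not code from the commit: `MyMetadataOnlyRule` is a hypothetical no-op placeholder, and a real replacement would have to port the logic of Spark's metadata-only optimization itself.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.catalyst.rules.Rule

// Hypothetical custom rule; shown as a no-op placeholder. A real port would
// adapt the metadata-only rewrite logic rather than returning the plan as-is.
case class MyMetadataOnlyRule(session: SparkSession) extends Rule[LogicalPlan] {
  override def apply(plan: LogicalPlan): LogicalPlan = plan
}

object Example {
  def main(args: Array[String]): Unit = {
    // Inject the rule into the optimizer when building the session.
    val spark = SparkSession.builder()
      .master("local[*]")
      .withExtensions { extensions =>
        extensions.injectOptimizerRule(session => MyMetadataOnlyRule(session))
      }
      .getOrCreate()
    spark.stop()
  }
}
```

Alternatively, the extension class can be registered without code changes via the `spark.sql.extensions` configuration, which takes a class implementing `SparkSessionExtensions => Unit`.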
Authored-by: HyukjinKwon <gurwls...@apache.org>
Signed-off-by: Takeshi Yamamuro <yamam...@apache.org>
(cherry picked from commit 5c5dd77d6a29b014b3fe4b4015f5c7199650a378)
Signed-off-by: Takeshi Yamamuro <yamam...@apache.org>
---
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 51404a2..8d673c5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -844,8 +844,10 @@ object SQLConf {
       .doc("When true, enable the metadata-only query optimization that use the table's metadata " +
         "to produce the partition columns instead of table scans. It applies when all the columns " +
         "scanned are partition columns and the query has an aggregate operator that satisfies " +
-        "distinct semantics. By default the optimization is disabled, since it may return " +
-        "incorrect results when the files are empty.")
+        "distinct semantics. By default the optimization is disabled, and deprecated as of Spark " +
+        "3.0 since it may return incorrect results when the files are empty, see also SPARK-26709." +
+        "It will be removed in the future releases. If you must use, use 'SparkSessionExtensions' " +
+        "instead to inject it as a custom rule.")
       .version("2.1.1")
       .booleanConf
       .createWithDefault(false)
@@ -2587,7 +2589,10 @@ object SQLConf {
       DeprecatedConfig(ARROW_FALLBACK_ENABLED.key, "3.0",
         s"Use '${ARROW_PYSPARK_FALLBACK_ENABLED.key}' instead of it."),
       DeprecatedConfig(SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE.key, "3.0",
-        s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it.")
+        s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it."),
+      DeprecatedConfig(OPTIMIZER_METADATA_ONLY.key, "3.0",
+        "Avoid to depend on this optimization to prevent a potential correctness issue. " +
+        "If you must use, use 'SparkSessionExtensions' instead to inject it as a custom rule.")
     )
     Map(configs.map { cfg => cfg.key -> cfg } : _*)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org