This is an automated email from the ASF dual-hosted git repository.

yamamuro pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new f8a20c4  [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration
f8a20c4 is described below

commit f8a20c470bf115b0834970ce02eb2ec103e0f6df
Author: HyukjinKwon <gurwls...@apache.org>
AuthorDate: Thu May 7 09:00:59 2020 +0900

    [SPARK-31647][SQL] Deprecate 'spark.sql.optimizer.metadataOnly' configuration
    
    ### What changes were proposed in this pull request?
    
    This PR proposes to deprecate the 'spark.sql.optimizer.metadataOnly' configuration and remove it in a future release.
    
    ### Why are the changes needed?
    
    This optimization can cause a potential correctness issue; see also SPARK-26709.
    It also seems difficult to extend: essentially, every supported function has to be whitelisted, which adds maintenance overhead; see also SPARK-31590.

    It seems better to let users inject the optimization via `SparkSessionExtensions` if they must use it, and to remove it from the Spark side.
    
    ### Does this PR introduce _any_ user-facing change?
    
    Yes, setting `spark.sql.optimizer.metadataOnly` will show a deprecation warning:
    
    ```scala
    scala> spark.conf.unset("spark.sql.optimizer.metadataOnly")
    ```
    ```
20/05/06 12:57:23 WARN SQLConf: The SQL config 'spark.sql.optimizer.metadataOnly' has been
deprecated in Spark v3.0 and may be removed in the future. Avoid depending on this optimization
to prevent a potential correctness issue. If you must use it, use 'SparkSessionExtensions'
instead to inject it as a custom rule.
    ```
    ```scala
    scala> spark.conf.set("spark.sql.optimizer.metadataOnly", "true")
    ```
    ```
20/05/06 12:57:44 WARN SQLConf: The SQL config 'spark.sql.optimizer.metadataOnly' has been
deprecated in Spark v3.0 and may be removed in the future. Avoid depending on this optimization
to prevent a potential correctness issue. If you must use it, use 'SparkSessionExtensions'
instead to inject it as a custom rule.
    ```
    
    ### How was this patch tested?
    
    Manually tested.
    
    Closes #28459 from HyukjinKwon/SPARK-31647.
    
    Authored-by: HyukjinKwon <gurwls...@apache.org>
    Signed-off-by: Takeshi Yamamuro <yamam...@apache.org>
    (cherry picked from commit 5c5dd77d6a29b014b3fe4b4015f5c7199650a378)
    Signed-off-by: Takeshi Yamamuro <yamam...@apache.org>
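
As context for the `SparkSessionExtensions` route recommended in the message above: users name an extensions class (a `SparkSessionExtensions => Unit` function) in the `spark.sql.extensions` configuration, and that class calls `SparkSessionExtensions.injectOptimizerRule` to register a custom rule. The sketch below models only that injection pattern with simplified stand-in types (plans as strings, rules as plain functions); it is illustrative only and is not Spark's real API, which builds `Rule[LogicalPlan]` instances from a `SparkSession`.

```scala
// Illustrative sketch of the rule-injection pattern behind
// SparkSessionExtensions.injectOptimizerRule. Stand-in types only:
// a "plan" is a String and a "rule" is a String => String function,
// whereas real Spark uses Rule[LogicalPlan] builders.
class ExtensionsSketch {
  private var optimizerRules: List[String => String] = Nil

  // Analogous to SparkSessionExtensions.injectOptimizerRule: record a
  // rule to be folded into the session's optimizer batches later.
  def injectOptimizerRule(rule: String => String): Unit =
    optimizerRules = optimizerRules :+ rule

  // The session applies every injected rule to the plan, in order.
  def optimize(plan: String): String =
    optimizerRules.foldLeft(plan)((p, rule) => rule(p))
}

val extensions = new ExtensionsSketch
// A toy "metadata-only" rewrite: replace a table scan with a metadata read.
extensions.injectOptimizerRule(p => p.replace("Scan(t)", "MetadataRead(t)"))
```

In real Spark the registration happens once per session; the names `ExtensionsSketch` and `MetadataRead` here are hypothetical.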
---
 .../main/scala/org/apache/spark/sql/internal/SQLConf.scala    | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 51404a2..8d673c5 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -844,8 +844,10 @@ object SQLConf {
     .doc("When true, enable the metadata-only query optimization that use the table's metadata " +
       "to produce the partition columns instead of table scans. It applies when all the columns " +
       "scanned are partition columns and the query has an aggregate operator that satisfies " +
-      "distinct semantics. By default the optimization is disabled, since it may return " +
-      "incorrect results when the files are empty.")
+      "distinct semantics. By default the optimization is disabled, and deprecated as of Spark " +
+      "3.0 since it may return incorrect results when the files are empty, see also SPARK-26709. " +
+      "It will be removed in future releases. If you must use it, use 'SparkSessionExtensions' " +
+      "instead to inject it as a custom rule.")
     .version("2.1.1")
     .booleanConf
     .createWithDefault(false)
@@ -2587,7 +2589,10 @@ object SQLConf {
       DeprecatedConfig(ARROW_FALLBACK_ENABLED.key, "3.0",
         s"Use '${ARROW_PYSPARK_FALLBACK_ENABLED.key}' instead of it."),
       DeprecatedConfig(SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE.key, "3.0",
-        s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it.")
+        s"Use '${ADVISORY_PARTITION_SIZE_IN_BYTES.key}' instead of it."),
+      DeprecatedConfig(OPTIMIZER_METADATA_ONLY.key, "3.0",
+        "Avoid depending on this optimization to prevent a potential correctness issue. " +
+          "If you must use it, use 'SparkSessionExtensions' instead to inject it as a custom rule.")
     )
 
     Map(configs.map { cfg => cfg.key -> cfg } : _*)
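
The `Map(configs.map { cfg => cfg.key -> cfg } : _*)` line above expands the `DeprecatedConfig` sequence into a key-indexed lookup table; both the set and unset paths consult it, which is why either call in the examples logs the warning. The following self-contained sketch reproduces that lookup with simplified names (`DeprecatedConfig` is a private helper inside Spark's `SQLConf`, so this is an illustration, not the real API):

```scala
// Sketch of the deprecated-config lookup built in the diff above,
// with simplified names; Spark's DeprecatedConfig is a private helper.
case class DeprecatedConfig(key: String, version: String, comment: String)

val configs = Seq(
  DeprecatedConfig("spark.sql.optimizer.metadataOnly", "3.0",
    "Avoid depending on this optimization to prevent a potential correctness issue. " +
      "If you must use it, use 'SparkSessionExtensions' instead to inject it as a custom rule."))

// Same idiom as the diff: splice the (key -> cfg) pairs into a Map.
val deprecated: Map[String, DeprecatedConfig] = Map(configs.map { cfg => cfg.key -> cfg } : _*)

// Both conf.set and conf.unset would run a check like this before mutating state,
// producing the WARN lines shown in the PR description.
def deprecationWarning(key: String): Option[String] =
  deprecated.get(key).map { cfg =>
    s"The SQL config '${cfg.key}' has been deprecated in Spark v${cfg.version} " +
      s"and may be removed in the future. ${cfg.comment}"
  }
```

A key with no entry yields `None`, so non-deprecated configs stay silent.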


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
