[spark] branch branch-3.0 updated: [SPARK-31010][SQL][DOC][FOLLOW-UP] Improve deprecated warning message for untyped scala udf

gurwls223 Fri, 24 Apr 2020 03:13:39 -0700

This is an automated email from the ASF dual-hosted git repository.

gurwls223 pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git



The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new da6f398  [SPARK-31010][SQL][DOC][FOLLOW-UP] Improve deprecated warning message for untyped scala udf
da6f398 is described below

commit da6f398db2838aca6e0dc18866715a14b9b2aded
Author: yi.wu <[email protected]>
AuthorDate: Fri Apr 24 19:10:18 2020 +0900

    [SPARK-31010][SQL][DOC][FOLLOW-UP] Improve deprecated warning message for untyped scala udf
    
    ### What changes were proposed in this pull request?
    
    Give users a friendlier warning message and migration guide for the deprecated Scala udf.
    
    ### Why are the changes needed?
    
    Users cannot tell the typed and untyped Scala udf apart by function signature. Instead, we should tell users directly what to do.
    
    ### Does this PR introduce any user-facing change?
    
    No, it's newly added in Spark 3.0.
    
    ### How was this patch tested?
    
    Pass Jenkins.
    
    Closes #28311 from Ngone51/update_udf_doc.
    
    Authored-by: yi.wu <[email protected]>
    Signed-off-by: HyukjinKwon <[email protected]>
    (cherry picked from commit 463c54419bf663615eb72e24b82e940feb85c68c)
    Signed-off-by: HyukjinKwon <[email protected]>
---
 docs/sql-migration-guide.md                                  | 2 +-
 sql/core/src/main/scala/org/apache/spark/sql/functions.scala | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/sql-migration-guide.md b/docs/sql-migration-guide.md
index 854c9ea..39619f6 100644
--- a/docs/sql-migration-guide.md
+++ b/docs/sql-migration-guide.md
@@ -75,7 +75,7 @@ license: |
 
   - In Spark version 2.4 and below, you can create a map with duplicated keys via built-in functions like `CreateMap`, `StringToMap`, etc. The behavior of map with duplicated keys is undefined, for example, map look up respects the duplicated key appears first, `Dataset.collect` only keeps the duplicated key appears last, `MapKeys` returns duplicated keys, etc. In Spark 3.0, Spark throws `RuntimeException` when duplicated keys are found. You can set `spark.sql.mapKeyDedupPolicy` to `LAST [...]
 
-  - In Spark 3.0, using `org.apache.spark.sql.functions.udf(AnyRef, DataType)` is not allowed by default. Set `spark.sql.legacy.allowUntypedScalaUDF` to true to keep using it. In Spark version 2.4 and below, if `org.apache.spark.sql.functions.udf(AnyRef, DataType)` gets a Scala closure with primitive-type argument, the returned UDF returns null if the input values is null. However, in Spark 3.0, the UDF returns the default value of the Java type if the input value is null. For example, ` [...]
+  - In Spark 3.0, using `org.apache.spark.sql.functions.udf(AnyRef, DataType)` is not allowed by default. Remove the return type parameter to automatically switch to typed Scala udf is recommended, or set `spark.sql.legacy.allowUntypedScalaUDF` to true to keep using it. In Spark version 2.4 and below, if `org.apache.spark.sql.functions.udf(AnyRef, DataType)` gets a Scala closure with primitive-type argument, the returned UDF returns null if the input values is null. However, in Spark 3.0 [...]
 
   - In Spark 3.0, a higher-order function `exists` follows the three-valued boolean logic, that is, if the `predicate` returns any `null`s and no `true` is obtained, then `exists` returns `null` instead of `false`. For example, `exists(array(1, null, 3), x -> x % 2 == 0)` is `null`. The previous behaviorcan be restored by setting `spark.sql.legacy.followThreeValuedLogicInArrayExists` to `false`.
 
diff --git a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
index 782be98..9fd4718 100644
--- a/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
+++ b/sql/core/src/main/scala/org/apache/spark/sql/functions.scala
@@ -4833,8 +4833,8 @@ object functions {
    * @group udf_funcs
    * @since 2.0.0
    */
-  @deprecated("Untyped Scala UDF API is deprecated, please use typed Scala UDF API such as " +
-    "'def udf[RT: TypeTag](f: Function0[RT]): UserDefinedFunction' instead.", "3.0.0")
+  @deprecated("Scala `udf` method with return type parameter is deprecated. " +
+    "Please use Scala `udf` method without return type parameter.", "3.0.0")
   def udf(f: AnyRef, dataType: DataType): UserDefinedFunction = {
     if (!SQLConf.get.getConf(SQLConf.LEGACY_ALLOW_UNTYPED_SCALA_UDF)) {
       val errorMsg = "You're using untyped Scala UDF, which does not have the input type " +
@@ -4842,7 +4842,7 @@ object functions {
         "argument, and the closure will see the default value of the Java type for the null " +
         "argument, e.g. `udf((x: Int) => x, IntegerType)`, the result is 0 for null input. " +
         "To get rid of this error, you could:\n" +
-        "1. use typed Scala UDF APIs, e.g. `udf((x: Int) => x)`\n" +
+        "1. use typed Scala UDF APIs(without return type parameter), e.g. `udf((x: Int) => x)`\n" +
         "2. use Java UDF APIs, e.g. `udf(new UDF1[String, Integer] { " +
         "override def call(s: String): Integer = s.length() }, IntegerType)`, " +
         "if input types are all non primitive\n" +

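The null handling that the error message in the diff describes can be reproduced in plain Scala, without Spark. The sketch below is an illustration only and is not Spark's implementation. `UntypedNullSketch` and `callUntyped` are hypothetical names, and the assumption is that a caller holding only an `AnyRef` has to invoke the closure through an erased `Any => Any` view, with no knowledge of the input types.

```scala
// Plain Scala, no Spark dependency. All names here are hypothetical.
object UntypedNullSketch {

  // Invoke a closure without knowing its input type, through an erased view.
  // This is an assumption about how an untyped caller behaves, not a Spark API.
  def callUntyped(f: AnyRef, arg: Any): Any = {
    val erased = f.asInstanceOf[Any => Any]
    erased(arg)
  }

  // Closure with a primitive-type argument, shaped like the one in
  // `udf((x: Int) => x, IntegerType)` from the error message.
  val primitiveArg: Int => Int = (x: Int) => x

  // Closure with a boxed argument: null reaches the body unchanged.
  val boxedArg: java.lang.Integer => java.lang.Integer = x => x

  def main(args: Array[String]): Unit = {
    // null is unboxed to the default value of Int before the body runs.
    println(callUntyped(primitiveArg, null))
    // null is passed through as-is, so the closure could check for it.
    println(callUntyped(boxedArg, null))
  }
}
```

On the JVM this prints `0` followed by `null`: the primitive-argument closure never sees the null, which matches the "result is 0 for null input" wording in the error message. Dropping the return type parameter, as in `udf((x: Int) => x)`, selects the typed API that the new deprecation message recommends.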

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]