[
https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574841#comment-16574841
]
Sean Owen commented on SPARK-25044:
-----------------------------------
I tried this (it's not hard), but the implementing method's signature in this
case still uses Object, not ints or longs.
Actually, this functionality seems to be used only in the SQL Analyzer, and
only to figure out whether the args are primitive, and then only to decide
whether null values of that argument need special handling. I tried simply
changing the Analyzer to ignore whether the arg is primitive, so that the
null check is never skipped. That makes some tests pass, but not all of them.
I might next investigate whether it's feasible to fix this by not analyzing
primitive-ness of arguments at all. [~smilegator]
{code:java}
- SPARK-11725: correctly handle null inputs for ScalaUDF *** FAILED ***
== FAIL: Plans do not match ===
!Project [if (isnull(a#0)) null else UDF(knownotnull(a#0)) AS #0]   Project [UDF(a#0) AS #0]
+- LocalRelation <empty>, [a#0, b#0, c#0, d#0, e#0]                 +- LocalRelation <empty>, [a#0, b#0, c#0, d#0, e#0]
(PlanTest.scala:119){code}
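The Object-typed signature is easy to observe outside Spark. Here is a minimal sketch in plain Java (not Spark code; class and variable names are made up for illustration): a lambda compiled through LambdaMetafactory implements only the erased interface method, so reflection sees Object parameters and cannot recover the original primitive argument types.
{code:java}
import java.lang.reflect.Method;
import java.util.function.BiFunction;

public class LmfSignatureDemo {
    public static void main(String[] args) throws Exception {
        // Compiled via LambdaMetafactory, analogous to a Scala 2.12 closure
        BiFunction<Integer, Integer, Integer> add = (x, y) -> x + y;
        // The generated class implements only the erased interface method
        Method apply = add.getClass().getMethod("apply", Object.class, Object.class);
        // Both parameter types are java.lang.Object; the original int-ness
        // of the arguments is not recoverable by reflection here
        for (Class<?> p : apply.getParameterTypes()) {
            System.out.println(p.getName()); // prints java.lang.Object twice
        }
    }
}
{code}
This is the same erasure the Analyzer runs into: from the implementing method alone it cannot tell an Int arg from a reference-typed one.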
> Address translation of LMF closure primitive args to Object in Scala 2.12
> -------------------------------------------------------------------------
>
> Key: SPARK-25044
> URL: https://issues.apache.org/jira/browse/SPARK-25044
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core, SQL
> Affects Versions: 2.4.0
> Reporter: Sean Owen
> Priority: Major
>
> A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891
> Fix HandleNullInputsForUDF rule":
> {code:java}
> - SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED ***
> Results do not match for query:
> ...
> == Results ==
> !== Correct Answer - 3 == == Spark Answer - 3 ==
> !struct<> struct<a:bigint,b:int,c:int>
> ![0,10,null] [0,10,0]
> ![1,12,null] [1,12,1]
> ![2,14,null] [2,14,2] (QueryTest.scala:163){code}
> You can kind of see what's going on by reading the test:
> {code:java}
> test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
>   // assume(!ClosureCleanerSuite2.supportsLMFs)
>   // This test won't test what it intends to in 2.12, as lambda metafactory
>   // closures have arg types that are not primitive, but Object
>   val udf1 = udf({ (x: Int, y: Int) => x + y })
>   val df = spark.range(0, 3).toDF("a")
>     .withColumn("b", udf1($"a", udf1($"a", lit(10))))
>     .withColumn("c", udf1($"a", lit(null)))
>   val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed
>   comparePlans(df.logicalPlan, plan)
>   checkAnswer(
>     df,
>     Seq(
>       Row(0, 10, null),
>       Row(1, 12, null),
>       Row(2, 14, null)))
> }{code}
>
> It seems that the closure fed in as a UDF changes behavior, in that
> primitive-type arguments are handled differently. For example, an Int
> argument, when fed 'null', acts like 0.
> I'm sure it's a difference in the LMF closure and how its types are
> understood, but I'm not exactly sure of the cause yet.
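The "acts like 0" behavior quoted above comes from how null is unboxed once it slips through the erased apply(Object, Object) signature. A minimal sketch in plain Java (made-up class name, JDK only) shows the boundary accepts null; Java's own unboxing then throws, whereas Scala 2.12 unboxes via scala.runtime.BoxesRunTime.unboxToInt, which maps null to 0, matching the 0/1/2 results in the failing test.
{code:java}
import java.util.function.BiFunction;

public class NullUnboxDemo {
    public static void main(String[] args) {
        // The erased signature is apply(Object, Object), so null gets through
        BiFunction<Integer, Integer, Integer> add = (x, y) -> x + y;
        try {
            add.apply(null, 10);  // Java unboxing of null throws
        } catch (NullPointerException e) {
            System.out.println("Java: NPE when unboxing null");
        }
        // Scala 2.12 instead unboxes with BoxesRunTime.unboxToInt(null) == 0,
        // so the UDF silently computes with 0 unless the Analyzer inserts an
        // explicit null check around the call
    }
}
{code}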
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]