[jira] [Updated] (SPARK-25690) Analyzer rule "HandleNullInputsForUDF" does not stabilize and can be applied infinitely

Maryann Xue (JIRA) Tue, 09 Oct 2018 12:38:29 -0700


     [ https://issues.apache.org/jira/browse/SPARK-25690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Maryann Xue updated SPARK-25690:
--------------------------------
    Description: 
This was fixed in SPARK-24891 and then broken again by SPARK-25044.

The tests added in SPARK-24891 were not sufficient, and the failures they should have exposed were masked by SPARK-24865. For more details, please refer to SPARK-25650. The code changes and tests in [https://github.com/apache/spark/pull/22060/files#diff-f70523b948b7af21abddfa3ab7e1d7d6R72] can help reproduce the issue.
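A rule "stabilizes" when applying it again to its own output leaves the plan unchanged, i.e. it reaches a fixed point. The toy model below, a minimal sketch assuming only the Scala standard library, shows how a null-guarding rewrite that does not recognize its own output keeps wrapping the same call on every pass. `Expr`, `Udf`, `NullGuard`, `naiveRule`, `guardedRule`, `runToFixedPoint` and `guardDepth` are hypothetical names; this is not Catalyst and not the actual HandleNullInputsForUDF code.

```scala
object FixedPointSketch {
  // Hypothetical toy expression tree (not Spark's Catalyst classes).
  sealed trait Expr
  final case class Col(name: String) extends Expr
  final case class Udf(arg: Expr) extends Expr
  // Stands for "if (arg is null) null else body".
  final case class NullGuard(arg: Expr, body: Expr) extends Expr

  // Non-idempotent: it also rewrites the Udf it already wrapped on an
  // earlier pass, so every application adds one more NullGuard layer.
  def naiveRule(e: Expr): Expr = e match {
    case Udf(arg)             => NullGuard(arg, Udf(arg))
    case NullGuard(arg, body) => NullGuard(arg, naiveRule(body))
    case other                => other
  }

  // Idempotent: an existing NullGuard is left alone, so a second
  // application returns its input unchanged.
  def guardedRule(e: Expr): Expr = e match {
    case g: NullGuard => g
    case Udf(arg)     => NullGuard(arg, Udf(arg))
    case other        => other
  }

  // Applies `rule` until its output equals its input or `maxIterations`
  // passes have run. Returns (final expression, passes run, converged?).
  def runToFixedPoint(e: Expr, rule: Expr => Expr, maxIterations: Int): (Expr, Int, Boolean) = {
    var current = e
    var passes = 0
    var converged = false
    while (passes < maxIterations && !converged) {
      val next = rule(current)
      passes += 1
      if (next == current) converged = true
      else current = next
    }
    (current, passes, converged)
  }

  // Counts the NullGuard layers wrapped around the innermost expression.
  def guardDepth(e: Expr): Int = e match {
    case NullGuard(_, body) => 1 + guardDepth(body)
    case _                  => 0
  }
}
```

Starting from `Udf(Col("a"))`, `guardedRule` converges after two passes (one rewrite, one no-op check), while `naiveRule` is still adding a layer when the iteration limit is hit. Re-applying a rule to its own output and checking that nothing changes is the same idea as the UDFSuite test in this issue re-analyzing an already-analyzed plan and comparing it with `comparePlans`.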

  was:
A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891 Fix HandleNullInputsForUDF rule":
{code:java}
- SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED ***
Results do not match for query:
...
== Results ==
!== Correct Answer - 3 ==   == Spark Answer - 3 ==
!struct<>                   struct<a:bigint,b:int,c:int>
![0,10,null]                [0,10,0]
![1,12,null]                [1,12,1]
![2,14,null]                [2,14,2] (QueryTest.scala:163){code}
Reading the test gives a rough idea of what's going on:
{code:java}
// Runs inside UDFSuite (a QueryTest with a shared SparkSession); assumes the suite
// imports org.apache.spark.sql.Row, org.apache.spark.sql.functions.{lit, udf}
// and testImplicits._ (for the $"..." column syntax).
test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
  // assume(!ClosureCleanerSuite2.supportsLMFs)
  // This test won't test what it intends to in 2.12, as lambda metafactory closures
  // have arg types that are not primitive, but Object
  val udf1 = udf({(x: Int, y: Int) => x + y})
  val df = spark.range(0, 3).toDF("a")
    .withColumn("b", udf1($"a", udf1($"a", lit(10))))
    .withColumn("c", udf1($"a", lit(null)))
  // Re-analyzing the already-analyzed plan must not change it.
  val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed

  comparePlans(df.logicalPlan, plan)
  checkAnswer(
    df,
    Seq(
      Row(0, 10, null),
      Row(1, 12, null),
      Row(2, 14, null)))
}{code}

It seems that the closure passed in as a UDF behaves differently in Scala 2.12, in that primitive-type arguments are handled differently. For example, an Int argument that is fed null acts like 0.

I'm sure it's a difference in the LMF (lambda metafactory) closure and how its argument types are understood, but I'm not yet sure of the exact cause.
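The "Int argument fed null acts like 0" symptom can be reproduced without Spark. A minimal sketch, assuming only the Scala standard library: when a closure with Int parameters is invoked through its erased `(Any, Any) => Any` form, a null argument is unboxed to 0 (Scala's `BoxesRunTime.unboxToInt(null)` returns 0) instead of being rejected. `NullToZeroSketch`, `unguarded` and `guarded` are hypothetical names; `guarded` only models the intent of a null-handling rule (return null when a primitive-typed input is null) and is not Spark's implementation. The sketch does not reproduce the 2.12-specific part of the problem, namely why the rule failed to treat the arguments as primitive.

```scala
object NullToZeroSketch {
  // Same closure shape as udf1 in the test above.
  val plus: (Int, Int) => Int = (x, y) => x + y

  // A caller that only knows the erased signature (as an engine storing
  // UDFs generically might) sees an (Any, Any) => Any.
  val erased: (Any, Any) => Any = plus.asInstanceOf[(Any, Any) => Any]

  // Unguarded call: the null second argument is unboxed to 0,
  // so the result is just `a`.
  def unguarded(a: Int): Any = erased(a, null)

  // Hypothetical guard: short-circuit to null if any input is null,
  // before the closure ever sees it.
  def guarded(a: Any, b: Any): Any =
    if (a == null || b == null) null else erased(a, b)
}
```

Without the guard, `x + 0` yields `a`, which matches the `c` column in the Spark answer above (0, 1, 2); with it, `c` would come out as null, as the expected answer requires.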


> Analyzer rule "HandleNullInputsForUDF" does not stabilize and can be applied infinitely
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-25690
>                 URL: https://issues.apache.org/jira/browse/SPARK-25690
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Spark Core, SQL
>    Affects Versions: 2.4.0
>            Reporter: Maryann Xue
>            Assignee: Sean Owen
>            Priority: Major
>             Fix For: 2.4.0
>
>
> This was fixed in SPARK-24891 and then broken again by SPARK-25044.
> The tests added in SPARK-24891 were not sufficient, and the failures they
> should have exposed were masked by SPARK-24865. For more details, please refer to SPARK-25650.
> The code changes and tests in
> [https://github.com/apache/spark/pull/22060/files#diff-f70523b948b7af21abddfa3ab7e1d7d6R72]
> can help reproduce the issue.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
