[
https://issues.apache.org/jira/browse/SPARK-25044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16573400#comment-16573400
]
Sean Owen commented on SPARK-25044:
-----------------------------------
I dug into this, and found that you can't directly get the class where the
function is declared with getClass.getEnclosingClass. However, that can still
be worked out from the name of the class, which is something like
"EnclosingClass$$Lambda...". From there it's possible to find the
$anonfun$new$xx methods, but, as far as I can tell, it's not possible to
determine which one the function delegates to.
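For illustration, here's a minimal sketch of recovering the enclosing class from the lambda's class name (the "$$Lambda" naming scheme is a JVM implementation detail rather than a documented guarantee, and EnclosingDemo is a made-up example):

```scala
object EnclosingDemo {
  val f: Int => Int = x => x + 1

  // In 2.12 the function is an LMF-generated class; getEnclosingClass doesn't
  // report the declaring class, but the class name embeds it, e.g.
  // "EnclosingDemo$$$Lambda$1/..."
  val lambdaClassName: String = f.getClass.getName

  // Parse the declaring class back out of the name (falling back to the full
  // name if the expected "$$Lambda" marker isn't present):
  val enclosingName: String = {
    val idx = lambdaClassName.indexOf("$$Lambda")
    if (idx >= 0) lambdaClassName.substring(0, idx) else lambdaClassName
  }

  def main(args: Array[String]): Unit = {
    println(lambdaClassName)
    println(enclosingName)
  }
}
```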
I suppose it's worth asking the actual question: is there, in general, a way
to get the parameter and return types of a FunctionN object when some are
primitive? That is, I see that the types are erased and always have been, so
it's not possible to recover that a type was String or something. But would
there be any way to get the types in the examples you give above? It sounds
like "no" in both the specialized and unspecialized cases in Scala 2.12, but
I'm just checking whether there's any easy answer, or a reason it's impossible.
Adding [~cloud_fan] as this centers around a method like
ScalaReflection.getParameterTypes
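To make the problem concrete, here's a stripped-down sketch of what a getParameterTypes-style reflective lookup sees in 2.12 (illustrative only, not Spark's actual code):

```scala
object ApplySignatureDemo {
  val f: (Int, Int) => Int = (x, y) => x + y

  // All `apply` variants visible through Java reflection; for a specialized
  // Function2 this includes the generic apply and an apply$mcIII$sp variant.
  val applyMethods = f.getClass.getMethods.filter(_.getName.startsWith("apply")).toList

  // The generic `apply` is erased to Object parameters, so a reflective lookup
  // in this style can no longer tell that the arguments were Int:
  val genericParams = applyMethods
    .find(m => m.getName == "apply" && m.getParameterCount == 2)
    .get.getParameterTypes.toList

  def main(args: Array[String]): Unit = {
    applyMethods.foreach(println)
    println(genericParams)
  }
}
```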
The issue with SQL here, as far as I can tell, is not so much knowing the
types as knowing when to treat null values specially. Above, the test failure
is basically that the column c = a + null should always come out null, but
acts like it's implemented as c = a + 0. I assume that's because the
implementation doesn't think the values here need to be special-cased for null
handling, because the types appear to be reference types, not int. It then
proceeds internally to actually succeed while treating null as 0 somewhere
along the line.
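A tiny demo of that null-as-0 behavior: when a boxed null crosses the erased Function2 interface, Scala's runtime unboxing maps it to 0 instead of throwing (a self-contained sketch, not Spark code):

```scala
object NullUnboxDemo {
  val add: (Int, Int) => Int = (x, y) => x + y

  // View the function through its erased interface, as generic framework code
  // (like a UDF wrapper) effectively does: arguments are now boxed values.
  val generic = add.asInstanceOf[(Any, Any) => Any]

  // BoxesRunTime.unboxToInt(null) yields 0, so this computes 1 + 0 rather
  // than propagating null or throwing an NPE.
  val result: Any = generic(1, null)

  def main(args: Array[String]): Unit = println(result)
}
```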
Adding [~smilegator] and [~maryannxue] who happened to work on the test I
mention above recently, and who might be able to comment on alternatives.
> Address translation of LMF closure primitive args to Object in Scala 2.12
> -------------------------------------------------------------------------
>
> Key: SPARK-25044
> URL: https://issues.apache.org/jira/browse/SPARK-25044
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core, SQL
> Affects Versions: 2.4.0
> Reporter: Sean Owen
> Priority: Major
>
> A few SQL-related tests fail in Scala 2.12, such as UDFSuite's "SPARK-24891
> Fix HandleNullInputsForUDF rule":
> {code:java}
> - SPARK-24891 Fix HandleNullInputsForUDF rule *** FAILED ***
> Results do not match for query:
> ...
> == Results ==
> !== Correct Answer - 3 == == Spark Answer - 3 ==
> !struct<> struct<a:bigint,b:int,c:int>
> ![0,10,null] [0,10,0]
> ![1,12,null] [1,12,1]
> ![2,14,null] [2,14,2] (QueryTest.scala:163){code}
> You can kind of get what's going on reading the test:
> {code:java}
> test("SPARK-24891 Fix HandleNullInputsForUDF rule") {
>   // assume(!ClosureCleanerSuite2.supportsLMFs)
>   // This test won't test what it intends to in 2.12, as lambda metafactory
>   // closures have arg types that are not primitive, but Object
>   val udf1 = udf({(x: Int, y: Int) => x + y})
>   val df = spark.range(0, 3).toDF("a")
>     .withColumn("b", udf1($"a", udf1($"a", lit(10))))
>     .withColumn("c", udf1($"a", lit(null)))
>   val plan = spark.sessionState.executePlan(df.logicalPlan).analyzed
>   comparePlans(df.logicalPlan, plan)
>   checkAnswer(
>     df,
>     Seq(
>       Row(0, 10, null),
>       Row(1, 12, null),
>       Row(2, 14, null)))
> }{code}
>
> It seems that the closure that is fed in as a UDF changes behavior, such
> that primitive-type arguments are handled differently. For example, an Int
> argument, when fed 'null', acts like 0.
> I'm sure it's a difference in the LMF closure and how its types are
> understood, but not exactly sure of the cause yet.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)