[ 
https://issues.apache.org/jira/browse/SPARK-21837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li resolved SPARK-21837.
-----------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.0

> UserDefinedTypeSuite local UDFs not actually testing what it intends
> --------------------------------------------------------------------
>
>                 Key: SPARK-21837
>                 URL: https://issues.apache.org/jira/browse/SPARK-21837
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL, Tests
>    Affects Versions: 2.2.0
>            Reporter: Sean Owen
>            Assignee: Sean Owen
>            Priority: Minor
>             Fix For: 2.3.0
>
>
> Consider this test in {{UserDefinedTypeSuite}}:
> {code}
>   test("Local UDTs") {
>     val df = Seq((1, new UDT.MyDenseVector(Array(0.1, 1.0)))).toDF("int", "vec")
>     df.collect()(0).getAs[UDT.MyDenseVector](1)
>     df.take(1)(0).getAs[UDT.MyDenseVector](1)
>     df.limit(1).groupBy('int).agg(first('vec)).collect()(0).getAs[UDT.MyDenseVector](0)
>     df.orderBy('int).limit(1).groupBy('int).agg(first('vec)).collect()(0)
>       .getAs[UDT.MyDenseVector](0)
>   }
> {code}
> I claim the last two statements can't be right: they read column 0 of the 
> aggregation result as the vector, but column 0 is the grouping key (int) and 
> the vector is column 1. Yet the test passes! 
> It only started failing when I made seemingly unrelated changes in 
> https://github.com/apache/spark/pull/18645, with:
> {code}
> [info] - Local UDTs *** FAILED *** (144 milliseconds)
> [info]   java.lang.ClassCastException: java.lang.Integer cannot be cast to org.apache.spark.sql.UDT$MyDenseVector
> [info]   at org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:211)
> [info]   at org.apache.spark.sql.UserDefinedTypeSuite$$anonfun$10.apply(UserDefinedTypeSuite.scala:205)
> {code}
> I then modified the test to assert that each expression returns the expected 
> vector, and it began failing with the same error on master. So I am pretty 
> sure the test is not doing what it intends: the results of these expressions 
> just happened not to be fully evaluated or checked, so the bad cast went 
> unnoticed.
> CC [~marmbrus] for the discussion at 
> https://github.com/apache/spark/commit/3ae25f244bd471ef77002c703f2cc7ed6b524f11##commitcomment-23320234
>  and apologies if I'm still really missing something here. I'll open a PR to 
> show you what I mean.
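
The failure mode in the quoted description can be sketched without Spark. This is a minimal illustration, not the actual fix: FakeRow and MyVector are hypothetical stand-ins for a Spark Row and UDT.MyDenseVector, and getAs is written as a cast to an erased type parameter, which is assumed here to mirror the shape of the real method. The point is that the cast is only checked where the result is actually used as the target type, so using or asserting on the value is what exposes a wrong column index.

```scala
import scala.util.Try

object ErasedCastSketch {
  // Hypothetical stand-in for a Spark Row. T is erased at runtime, so the
  // cast inside getAs performs no type check by itself.
  final case class FakeRow(values: Vector[Any]) {
    def getAs[T](i: Int): T = values(i).asInstanceOf[T]
  }

  // Hypothetical stand-in for UDT.MyDenseVector.
  final case class MyVector(data: Vector[Double])

  val expected: MyVector = MyVector(Vector(0.1, 1.0))

  // Shaped like one result row of groupBy('int).agg(first('vec)):
  // grouping key in column 0, aggregated vector in column 1.
  val row: FakeRow = FakeRow(Vector(1, expected))

  // Calling .data forces a checked cast to MyVector at the call site,
  // so reading the wrong column fails instead of passing silently.
  def vectorDataAt(i: Int): Try[Vector[Double]] =
    Try(row.getAs[MyVector](i).data)

  def main(args: Array[String]): Unit = {
    // Column 0 holds the Int key: the cast throws ClassCastException.
    println(vectorDataAt(0).isFailure) // prints "true"
    // Column 1 holds the vector: comparing with the expected data also
    // verifies the content, not just the type.
    println(vectorDataAt(1).toOption.contains(expected.data)) // prints "true"
  }
}
```

Whether a discarded `row.getAs[MyVector](0)` statement throws depends on whether the compiler emits a cast at that call site, which is why the sketch only asserts on the cases where the value is used.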



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
