[ 
https://issues.apache.org/jira/browse/SPARK-24216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Fangshi Li updated SPARK-24216:
-------------------------------
    Description: 
When we create an aggregator object within a function in Scala and pass the
aggregator to a Spark Dataset's aggregation method, Spark initializes
TypedAggregateExpression with its name field set to
aggregator.getClass.getSimpleName. However, getSimpleName is not safe in a
Scala environment. For example, if the aggregator class's fully qualified name
is "com.my.company.MyUtils$myAgg$2$", getSimpleName throws the error
"Malformed class name". This has been reported in scalatest
[https://github.com/scalatest/scalatest/pull/1044] and in the upstream Scala JIRA
[https://issues.scala-lang.org/browse/SI-8110].

To fix this issue, we follow the solution in
[https://github.com/scalatest/scalatest/pull/1044] and add a safer version of
getSimpleName as a util method. TypedAggregateExpression will invoke this
util method rather than getClass.getSimpleName.
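
As a rough illustration of the idea, a safer lookup could try getSimpleName
first and fall back to parsing the binary class name only when the JVM throws.
This is a minimal sketch, not the actual patch: the object name SafeNames, the
helpers fallbackName and safeSimpleName, and the choice to return the last
non-numeric '$'-separated segment are all assumptions made for illustration.

```scala
object SafeNames {
  // Hypothetical helper: derive a readable name from a JVM binary class name
  // without calling Class.getSimpleName.
  def fallbackName(binaryName: String): String = {
    // Drop the package prefix:
    // "com.my.company.MyUtils$myAgg$2$" -> "MyUtils$myAgg$2$"
    val noPackage = binaryName.substring(binaryName.lastIndexOf('.') + 1)
    // Drop trailing '$' characters (Scala appends one for objects):
    // "MyUtils$myAgg$2$" -> "MyUtils$myAgg$2"
    val trimmed = noPackage.reverse.dropWhile(_ == '$').reverse
    // Split on '$' and keep the last segment that is not purely numeric,
    // skipping compiler-generated counters such as "2".
    val segments = trimmed.split('$').filter(_.nonEmpty)
    segments.reverse.find(s => !s.forall(_.isDigit)).getOrElse(trimmed)
  }

  // Hypothetical wrapper: use getSimpleName when it works, and fall back to
  // name parsing when the JVM rejects the class name with an InternalError.
  def safeSimpleName(cls: Class[_]): String =
    try cls.getSimpleName
    catch { case _: InternalError => fallbackName(cls.getName) }
}
```

With a helper like this, the name field could be computed as
SafeNames.safeSimpleName(aggregator.getClass) instead of calling
aggregator.getClass.getSimpleName directly. For ordinary classes the result is
unchanged, since the fallback only runs when getSimpleName fails.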

  was:
When we create an aggregator object within a function in Scala and pass the
aggregator to a Spark Dataset's aggregation method, Spark initializes
TypedAggregateExpression with its name field set to
aggregator.getClass.getSimpleName. However, getSimpleName is not safe in a
Scala environment. For example, if the aggregator class's fully qualified name
is "com.my.company.MyUtils$myAgg$2$", getSimpleName throws the exception
"Malformed class name". This has been reported in scalatest
[https://github.com/scalatest/scalatest/pull/1044] and in the upstream Scala JIRA
[https://issues.scala-lang.org/browse/SI-8110].

To fix this issue, we follow the solution in
[https://github.com/scalatest/scalatest/pull/1044] and add a safer version of
getSimpleName as a util method. TypedAggregateExpression will invoke this
util method rather than getClass.getSimpleName.


> Spark TypedAggregateExpression uses getSimpleName that is not safe in scala
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-24216
>                 URL: https://issues.apache.org/jira/browse/SPARK-24216
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0, 2.3.1
>            Reporter: Fangshi Li
>            Priority: Major
>
> When we create an aggregator object within a function in Scala and pass the
> aggregator to a Spark Dataset's aggregation method, Spark initializes
> TypedAggregateExpression with its name field set to
> aggregator.getClass.getSimpleName. However, getSimpleName is not safe in a
> Scala environment. For example, if the aggregator class's fully qualified
> name is "com.my.company.MyUtils$myAgg$2$", getSimpleName throws the error
> "Malformed class name". This has been reported in scalatest
> [https://github.com/scalatest/scalatest/pull/1044] and in the upstream Scala JIRA
> [https://issues.scala-lang.org/browse/SI-8110].
> To fix this issue, we follow the solution in
> [https://github.com/scalatest/scalatest/pull/1044] and add a safer version of
> getSimpleName as a util method. TypedAggregateExpression will invoke this
> util method rather than getClass.getSimpleName.


