[ https://issues.apache.org/jira/browse/SPARK-24216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Fangshi Li updated SPARK-24216:
-------------------------------
Description:
When a user creates an aggregator object in Scala and passes it to a Spark
Dataset's agg() method, Spark initializes TypedAggregateExpression with the
nodeName field set to aggregator.getClass.getSimpleName. However,
getSimpleName is not safe in a Scala environment: depending on how the user
creates the aggregator object, the call can fail. For example, if the
aggregator class's fully qualified name is "com.my.company.MyUtils$myAgg$2$",
getSimpleName throws java.lang.InternalError "Malformed class name". This has
been reported in scalatest
[scalatest/scalatest#1044|https://github.com/scalatest/scalatest/pull/1044] and
discussed in many upstream Scala JIRAs, such as SI-8110 and SI-5425.
To fix this issue, we follow the solution in
[scalatest/scalatest#1044|https://github.com/scalatest/scalatest/pull/1044]:
add a safer version of getSimpleName as a util method, and have
TypedAggregateExpression invoke this util method rather than
getClass.getSimpleName.
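The ticket does not show the body of the safer util method. A minimal sketch of the idea, assuming the fallback simply derives a readable name from Class.getName whenever getSimpleName throws, could look like the following. The names SafeSimpleName, getSimpleName and fromBinaryName are hypothetical, and the fallback rule is an assumption; this is neither Spark's nor scalatest's actual code.

```scala
// Hypothetical helper (not Spark's or scalatest's actual code): a sketch of a
// "safer getSimpleName" that falls back to string processing of the binary name.
object SafeSimpleName {

  /** Returns cls.getSimpleName, or a name derived from cls.getName if the JVM
    * throws java.lang.InternalError ("Malformed class name"). */
  def getSimpleName(cls: Class[_]): String =
    try cls.getSimpleName
    catch {
      // InternalError is a VirtualMachineError, so scala.util.control.NonFatal
      // would not match it; catch it explicitly instead.
      case _: InternalError => fromBinaryName(cls.getName)
    }

  /** Assumed fallback rule: drop the package prefix, strip trailing '$'
    * characters, then keep the last '$'-separated segment that is not purely
    * numeric (skipping compiler-generated counters such as the "2" in
    * "MyUtils$myAgg$2$"). */
  def fromBinaryName(fullName: String): String = {
    // "com.my.company.MyUtils$myAgg$2$" -> "MyUtils$myAgg$2$"
    val withoutPackage = fullName.substring(fullName.lastIndexOf('.') + 1)
    // "MyUtils$myAgg$2$" -> "MyUtils$myAgg$2"
    val trimmed = withoutPackage.reverse.dropWhile(_ == '$').reverse
    // "MyUtils$myAgg$2" -> Array("MyUtils", "myAgg", "2")
    val segments = trimmed.split('$').filter(_.nonEmpty)
    // Last non-numeric segment, or the unqualified name if none is found.
    segments.reverse.find(seg => !seg.forall(_.isDigit)).getOrElse(withoutPackage)
  }
}
```

Under these assumptions, TypedAggregateExpression would pass aggregator.getClass to the helper instead of calling getClass.getSimpleName directly, and for "com.my.company.MyUtils$myAgg$2$" the fallback yields "myAgg". Whether getSimpleName actually throws for a given class depends on the JVM and on how the compiler named the class, which is why the fallback is kept as a separate, directly callable function.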
was:
When we create an aggregator object within a function in Scala and pass the
aggregator to a Spark Dataset's aggregation method, Spark initializes
TypedAggregateExpression with the name field set to
aggregator.getClass.getSimpleName. However, getSimpleName is not safe in a Scala
environment. For example, if the aggregator class's fully qualified name is
"com.my.company.MyUtils$myAgg$2$", getSimpleName throws the error
"Malformed class name". This has been reported in scalatest
[https://github.com/scalatest/scalatest/pull/1044] and in the upstream Scala JIRA
[https://issues.scala-lang.org/browse/SI-8110].
To fix this issue, we follow the solution in
[https://github.com/scalatest/scalatest/pull/1044] to add a safer version of
getSimpleName as a util method, and TypedAggregateExpression will invoke this
util method rather than getClass.getSimpleName.
> Spark TypedAggregateExpression uses getSimpleName that is not safe in scala
> ---------------------------------------------------------------------------
>
> Key: SPARK-24216
> URL: https://issues.apache.org/jira/browse/SPARK-24216
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 2.3.0, 2.3.1
> Reporter: Fangshi Li
> Priority: Major
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)