[
https://issues.apache.org/jira/browse/SPARK-16474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15369829#comment-15369829
]
Amit Sela commented on SPARK-16474:
-----------------------------------
I thought the bufferEncoder was supposed to take care of that, unless it's
because of the input? Would that be solved by
https://issues.apache.org/jira/browse/SPARK-15769 ?
Anyway, I'll close this, though it is a bit confusing because the API looks
the same as if I were doing groupByKey().agg(), and in that case it works. Is
this because groupByKey provides the conversion?
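
For comparison, here is a minimal sketch of the two typed paths, assuming Spark 2.0.
The names SumInts and GlobalAggregationSketch are made up for illustration. As far
as I understand it, Dataset.agg goes through the untyped groupBy().agg() path, so
the aggregator is handed Rows, whereas Dataset.select and groupByKey().agg() both
accept a TypedColumn and keep the Int input type. I have not verified this against
1.6.2.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.expressions.Aggregator

// Same sum aggregator as in the report, with explicit encoders
// (hypothetical name, not part of Spark).
object SumInts extends Aggregator[Int, Int, Int] {
  def zero: Int = 0
  def reduce(b: Int, a: Int): Int = b + a
  def merge(b1: Int, b2: Int): Int = b1 + b2
  def finish(reduction: Int): Int = reduction
  def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}

object GlobalAggregationSketch {
  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder()
      .appName("GlobalAggregationSketch")
      .master("local[*]")
      .getOrCreate()
    import session.implicits._

    val ds = List(1, 2, 3).toDS()

    // Typed global aggregation: select takes a TypedColumn[Int, Int],
    // so the aggregator receives Int values rather than Rows.
    val total = ds.select(SumInts.toColumn)
    total.show()

    // Typed grouped aggregation: the case that already works in the report.
    // Keys here are odd (1) and even (0).
    val perKey = ds.groupByKey(_ % 2).agg(SumInts.toColumn)
    perKey.show()

    session.stop()
  }
}
```

If that reading is right, the difference is not the bufferEncoder but which agg
overload is being called: the one on Dataset is the DataFrame-style one.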
> Global Aggregation doesn't seem to work at all
> -----------------------------------------------
>
> Key: SPARK-16474
> URL: https://issues.apache.org/jira/browse/SPARK-16474
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 1.6.2, 2.0.0
> Reporter: Amit Sela
>
> Executing a global aggregation (not grouped by key) fails.
> Take the following code for example:
> {code}
> import org.apache.spark.sql.{Encoder, SparkSession}
> import org.apache.spark.sql.expressions.Aggregator
>
> val session = SparkSession.builder()
>   .appName("TestGlobalAggregator")
>   .master("local[*]")
>   .getOrCreate()
> import session.implicits._
> val ds1 = List(1, 2, 3).toDS
> val ds2 = ds1.agg(
>   new Aggregator[Int, Int, Int] {
>     def zero: Int = 0
>     def reduce(b: Int, a: Int): Int = b + a
>     def merge(b1: Int, b2: Int): Int = b1 + b2
>     def finish(reduction: Int): Int = reduction
>     def bufferEncoder: Encoder[Int] = implicitly[Encoder[Int]]
>     def outputEncoder: Encoder[Int] = implicitly[Encoder[Int]]
>   }.toColumn)
> ds2.printSchema
> ds2.show
> {code}
> I would expect the result to be 6, but instead I get the following exception:
> {noformat}
> java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to java.lang.Integer
> at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)
> .........
> {noformat}
> Trying the same code on DataFrames in 1.6.2 results in:
> {noformat}
> Exception in thread "main" org.apache.spark.sql.AnalysisException: unresolved operator 'Aggregate [(anon$1(),mode=Complete,isDistinct=false) AS anon$1()#8];
> at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:38)
> ..........
> {noformat}