Matt Cheah created SPARK-6044:
---------------------------------
         Summary: RDD.aggregate() should not use the closure serializer on the zero value
             Key: SPARK-6044
             URL: https://issues.apache.org/jira/browse/SPARK-6044
         Project: Spark
      Issue Type: Bug
      Components: Spark Core
Affects Versions: 1.3.0
        Reporter: Matt Cheah
         Fix For: 1.4.0

PairRDDFunctions.aggregateByKey() correctly uses
SparkEnv.get.serializer.newInstance() to serialize the zero value. It seems
this logic is not mirrored in RDD.aggregate(), which computes the aggregation
and returns the result directly at the driver, but runs the zero value through
the closure serializer instead. We should change RDD.aggregate() to make the
two code paths consistent; I ran into serialization errors because I was
expecting RDD.aggregate() to Kryo-serialize the zero value.
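
For context, a minimal repro sketch (hypothetical names; it assumes a
zero-value class that Kryo can handle but that does not implement
java.io.Serializable):

{code:scala}
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical zero-value type: fine under Kryo, but Java serialization of it
// fails because it does not extend java.io.Serializable.
class MaxTracker(var max: Long)

object Spark6044Repro {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SPARK-6044-repro")
      .setMaster("local[2]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)

    val rdd = sc.parallelize(1L to 100L)

    // Expected to work under Kryo; on 1.3.0 this is expected to fail with a
    // NotSerializableException instead, because RDD.aggregate() clones the zero
    // value on the driver with the closure serializer (Java serialization)
    // rather than the serializer configured via spark.serializer.
    val result = rdd.aggregate(new MaxTracker(Long.MinValue))(
      (acc, x) => { if (x > acc.max) acc.max = x; acc },
      (a, b) => { if (b.max > a.max) a.max = b.max; a })

    println(s"max = ${result.max}")
    sc.stop()
  }
}
{code}

And a sketch of the direction the summary suggests (not an actual patch):
inside RDD.aggregate(), clone the zero value with the user-configured
serializer, as PairRDDFunctions.aggregateByKey() already does:

{code:scala}
// Sketch only: replace the closure-serializer clone in RDD.aggregate() with
// the configured SparkEnv serializer, mirroring aggregateByKey().
var jobResult = Utils.clone(zeroValue, SparkEnv.get.serializer.newInstance())
{code}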