[
https://issues.apache.org/jira/browse/SPARK-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Josh Rosen resolved SPARK-2737.
-------------------------------
Resolution: Fixed
Fix Version/s: 1.1.0
Target Version/s: (was: 1.1.0)
> ClassCastExceptions when collect()ing JavaRDDs' underlying Scala RDDs
> ---------------------------------------------------------------------
>
> Key: SPARK-2737
> URL: https://issues.apache.org/jira/browse/SPARK-2737
> Project: Spark
> Issue Type: Bug
> Components: Java API
> Affects Versions: 0.8.0, 0.9.0, 1.0.0
> Reporter: Josh Rosen
> Assignee: Josh Rosen
> Fix For: 1.1.0
>
>
> The Java API's use of fake ClassTags doesn't seem to cause any problems for
> Java users, but it can lead to issues when passing JavaRDDs' underlying RDDs
> to Scala code (e.g. in the MLlib Java API wrapper code). If we call
> {{collect()}} on a Scala RDD that carries an incorrect ClassTag, it
> allocates a result array of the wrong runtime type, which leads to
> ClassCastExceptions (for example, see SPARK-2197).
> There are a few possible fixes here. An API-breaking fix would be to
> completely remove the fake ClassTags and require Java API users to pass
> {{java.lang.Class}} instances to all {{parallelize()}} calls and add
> {{returnClass}} fields to all {{Function}} implementations. This would be
> extremely verbose.
> Instead, I propose that we add internal APIs to "repair" a Scala RDD with an
> incorrect ClassTag by wrapping it and overriding its ClassTag. This should
> be okay for cases where the Scala code that calls {{collect()}} knows what
> type of array should be allocated, which is the case in the MLlib wrappers.
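The proposed "repair" can be sketched in the same plain-Java analogue (hypothetical names; not Spark's actual internal API): keep the underlying data untouched and rewrap it with the correct element class, so the next {{collect()}} allocates an array of the right type.

```java
import java.lang.reflect.Array;
import java.util.List;

public class RetagDemo {
    // Plays the role of a Scala RDD carrying a ClassTag.
    public static final class Tagged<T> {
        final List<?> data;
        final Class<T> elemClass;  // stands in for the ClassTag

        public Tagged(List<?> data, Class<T> elemClass) {
            this.data = data;
            this.elemClass = elemClass;
        }

        // Like collect(): allocates the result array from elemClass.
        @SuppressWarnings("unchecked")
        public T[] collect() {
            T[] out = (T[]) Array.newInstance(elemClass, data.size());
            for (int i = 0; i < data.size(); i++) {
                out[i] = (T) data.get(i);
            }
            return out;
        }
    }

    // The "repair": wrap the same underlying data, overriding only the tag.
    public static <T> Tagged<T> retag(Tagged<?> wrong, Class<T> correct) {
        return new Tagged<>(wrong.data, correct);
    }

    public static void main(String[] args) {
        // A value tagged with a fake element class, as the Java API produces.
        Tagged<Object> faked = new Tagged<>(List.of("a", "b"), Object.class);
        // Rewrapped with the correct tag, collect() now succeeds.
        String[] fixed = retag(faked, String.class).collect();
        System.out.println(fixed[0] + fixed[1]);
    }
}
```

As in the proposal, the caller of {{retag}} must genuinely know the element type; the wrapper only fixes the tag, it cannot validate it.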
--
This message was sent by Atlassian JIRA
(v6.2#6252)