[ https://issues.apache.org/jira/browse/SPARK-2737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903542#comment-14903542 ]
Glenn Strycker commented on SPARK-2737:
---------------------------------------

I am getting a similar error in Spark 1.3.0... see a new ticket I created: https://issues.apache.org/jira/browse/SPARK-10762

> ClassCastExceptions when collect()ing JavaRDDs' underlying Scala RDDs
> ---------------------------------------------------------------------
>
>                 Key: SPARK-2737
>                 URL: https://issues.apache.org/jira/browse/SPARK-2737
>             Project: Spark
>          Issue Type: Bug
>          Components: Java API
>    Affects Versions: 0.8.0, 0.9.0, 1.0.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>             Fix For: 1.1.0
>
>
> The Java API's use of fake ClassTags doesn't seem to cause any problems for Java users, but it can lead to issues when passing JavaRDDs' underlying RDDs to Scala code (e.g. in the MLlib Java API wrapper code). If we call {{collect()}} on a Scala RDD with an incorrect ClassTag, an array of the wrong element type is allocated, leading to ClassCastExceptions (for example, see SPARK-2197).
> There are a few possible fixes here. An API-breaking fix would be to remove the fake ClassTags entirely and require Java API users to pass {{java.lang.Class}} instances to all {{parallelize()}} calls and to add {{returnClass}} fields to all {{Function}} implementations. This would be extremely verbose.
> Instead, I propose that we add internal APIs to "repair" a Scala RDD that has an incorrect ClassTag by wrapping it and overriding its ClassTag. This should be okay for cases where the Scala code that calls {{collect()}} knows what type of array should be allocated, which is the case in the MLlib wrappers.
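
As an illustration of the wrapping approach described in the ticket, here is a minimal Scala sketch (not the actual internal API that shipped in 1.1.0; the helper name {{retag}} and its {{mapPartitions}}-based implementation are assumptions for illustration). It first reproduces the bug by building an RDD[String] that carries the Java API's fake AnyRef ClassTag, then "repairs" it with a no-op {{mapPartitions}} that re-supplies the correct tag, so {{collect()}} allocates an array of the right type:

{code:scala}
import scala.reflect.ClassTag

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object ClassTagRepairSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "classtag-demo")

    // Reproduce the bug: parallelize with the fake AnyRef ClassTag that the
    // Java API supplies under the hood. collect() then materializes an
    // Array[AnyRef], and the checkcast to Array[String] at the call site
    // throws a ClassCastException.
    val fakeTag = ClassTag.AnyRef.asInstanceOf[ClassTag[String]]
    val broken: RDD[String] = sc.parallelize(Seq("a", "b"))(fakeTag)
    // val strings: Array[String] = broken.collect()  // would throw here

    // Hypothetical repair helper: a no-op mapPartitions whose result RDD
    // carries the correct implicit ClassTag[T], so a downstream collect()
    // allocates an array of the right element type.
    def retag[T](rdd: RDD[T])(implicit ct: ClassTag[T]): RDD[T] =
      rdd.mapPartitions(iter => iter, preservesPartitioning = true)

    val repaired: Array[String] = retag(broken).collect()  // Array[String]
    repaired.foreach(println)

    sc.stop()
  }
}
{code}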