Josh Rosen created SPARK-2737:
---------------------------------

             Summary: ClassCastExceptions when collect()ing JavaRDDs' 
underlying Scala RDDs
                 Key: SPARK-2737
                 URL: https://issues.apache.org/jira/browse/SPARK-2737
             Project: Spark
          Issue Type: Bug
          Components: Java API
    Affects Versions: 1.0.0, 0.9.0, 0.8.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen


The Java API's use of fake ClassTags doesn't seem to cause any problems for 
Java users, but it can lead to issues when passing JavaRDDs' underlying RDDs to 
Scala code (e.g. in the MLlib Java API wrapper code).  If we call {{collect()}} 
on a Scala RDD with an incorrect ClassTag, this causes ClassCastExceptions when 
we try to allocate an array of the wrong type (for example, see SPARK-2197).

There are a few possible fixes here.  An API-breaking fix would be to 
completely remove the fake ClassTags and require Java API users to pass 
{{java.lang.Class}} instances to all {{parallelize()}} calls and add 
{{returnClass}} fields to all {{Function}} implementations.  This would be 
extremely verbose.

Instead, I propose that we add internal APIs to "repair" a Scala RDD with an 
incorrect ClassTag by wrapping it and overriding its ClassTag.  This should be 
okay for cases where the Scala code that calls {{collect()}} knows what type of 
array should be allocated, which is the case in the MLlib wrappers.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to