[ https://issues.apache.org/jira/browse/SPARK-11497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14988774#comment-14988774 ]
Apache Spark commented on SPARK-11497:
--------------------------------------
User 'dusenberrymw' has created a pull request for this issue:
https://github.com/apache/spark/pull/9458
> PySpark RowMatrix Constructor Has Type Erasure Issue
> ----------------------------------------------------
>
> Key: SPARK-11497
> URL: https://issues.apache.org/jira/browse/SPARK-11497
> Project: Spark
> Issue Type: Bug
> Components: MLlib, PySpark
> Reporter: Mike Dusenberry
>
> Implementing tallSkinnyQR in SPARK-9656 uncovered a bug in the PySpark
> RowMatrix constructor. As discussed on the dev list
> [here|http://apache-spark-developers-list.1001551.n3.nabble.com/K-Means-And-Class-Tags-td10038.html],
> RDDs that come from Java, and by extension from PySpark, appear to be
> subject to type erasure. Although we attempt to construct a
> RowMatrix from an RDD[Vector] in
> [PythonMLLibAPI|https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1115],
> the Vector type is erased, resulting in an RDD[Object]. As a result, calling
> Scala's tallSkinnyQR from PySpark throws a Java ClassCastException because
> an Object cannot be cast to a Spark Vector. As noted in that
> dev list thread, the same issue was encountered with DecisionTrees, and the
> fix was an explicit retag of the RDD with the Vector type. This PR
> will apply that fix to the createRowMatrix helper function in PythonMLLibAPI.
> IndexedRowMatrix and CoordinateMatrix do not appear to have this issue, likely
> because their helper functions in PythonMLLibAPI create the RDDs
> explicitly from DataFrames with pattern matching, which preserves the types.
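
The erasure problem and the retag fix described above can be reproduced without Spark. What follows is only a minimal sketch in plain Java, not the code from the pull request. The names RetagSketch, TaggedData, fromUntypedBridge, and collectFails are hypothetical, double[] stands in for Spark's Vector, and a Class object plays the role that a class tag plays for an RDD.

```java
import java.lang.reflect.Array;
import java.util.Arrays;
import java.util.List;

public class RetagSketch {

    /** Hypothetical stand-in for an RDD: the elements plus a runtime class tag. */
    public static final class TaggedData<T> {
        private final List<?> elements;
        private final Class<T> tag;

        public TaggedData(List<?> elements, Class<T> tag) {
            this.elements = elements;
            this.tag = tag;
        }

        /** Builds an array whose runtime component type is taken from the tag. */
        @SuppressWarnings("unchecked")
        public T[] collect() {
            T[] out = (T[]) Array.newInstance(tag, elements.size());
            for (int i = 0; i < out.length; i++) {
                out[i] = (T) elements.get(i);
            }
            return out;
        }

        /** Same elements, new runtime tag (the analogue of retagging an RDD). */
        public <U> TaggedData<U> retag(Class<U> newTag) {
            return new TaggedData<>(elements, newTag);
        }
    }

    /**
     * Simulates data arriving through an untyped bridge: the static type claims
     * double[] elements, but the only runtime tag available is Object.
     */
    @SuppressWarnings("unchecked")
    public static TaggedData<double[]> fromUntypedBridge(List<?> rows) {
        return (TaggedData<double[]>) (TaggedData<?>) new TaggedData<>(rows, Object.class);
    }

    /** Returns true if collecting into a double[][] throws ClassCastException. */
    public static boolean collectFails(TaggedData<double[]> data) {
        try {
            // The compiler inserts a cast to double[][] here; it fails at
            // runtime when the tag produced an Object[] instead.
            double[][] rows = data.collect();
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<double[]> rows =
            Arrays.asList(new double[] {1.0, 2.0}, new double[] {3.0, 4.0});

        TaggedData<double[]> erased = fromUntypedBridge(rows);
        System.out.println("erased tag fails:  " + collectFails(erased));

        TaggedData<double[]> retagged = erased.retag(double[].class);
        System.out.println("retagged fails:    " + collectFails(retagged));
    }
}
```

In this sketch the elements never change; only the runtime tag does. That mirrors the reasoning in the description: the data is already made of vectors, so restoring the element type on the container is enough, and no per-element conversion is needed.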