[ https://issues.apache.org/jira/browse/SPARK-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356944#comment-15356944 ]
Nick Pentreath commented on SPARK-15944:
----------------------------------------
As I commented on [PR #13924|https://github.com/apache/spark/pull/13924]:
> It happens to work for dense vectors because it effectively calls
> {{np.array(DenseVector)}}, but not for sparse vectors. The workaround is fairly ugly:
> {{mlSV = NewVectors.sparse(mllibSV.size, zip(mllibSV.indices,
> mllibSV.values))}}, or something similar.
I think we need convenience methods for Python too, so I've created SPARK-16328
to track that.
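
For reference, the workaround quoted above could be wrapped roughly as follows. This is only a sketch, not what SPARK-16328 will add: the helper names {{sparse_args}} and {{mllib_sparse_to_ml}} and the {{FakeSparse}} stand-in are made up for illustration. It assumes the old sparse vector exposes {{size}}, {{indices}} and {{values}}, as the quoted one-liner does, and that {{NewVectors}} is {{pyspark.ml.linalg.Vectors}}.

```python
from collections import namedtuple


def sparse_args(sv):
    """Return (size, [(index, value), ...]) for a sparse-vector-like object.

    Only the .size, .indices and .values attributes are used, so this
    part can be exercised without PySpark installed.
    """
    # Materialise the pairs as plain Python ints/floats instead of passing
    # a lazy zip object (or NumPy scalars) to the vector factory.
    pairs = [(int(i), float(v)) for i, v in zip(sv.indices, sv.values)]
    return int(sv.size), pairs


def mllib_sparse_to_ml(mllib_sv):
    """Hypothetical helper: build a spark.ml sparse vector from a spark.mllib one."""
    # Imported lazily; requires PySpark 2.0+ where pyspark.ml.linalg exists.
    from pyspark.ml.linalg import Vectors as NewVectors

    size, pairs = sparse_args(mllib_sv)
    return NewVectors.sparse(size, pairs)


# Stand-in with the same attribute names, used here only for illustration.
FakeSparse = namedtuple("FakeSparse", ["size", "indices", "values"])
size, pairs = sparse_args(FakeSparse(4, [1, 3], [1.0, 5.5]))
```

With PySpark available, {{mllib_sparse_to_ml(mllibSV)}} would be called once per vector, which is still clumsy for whole DataFrame columns; hence the request for proper convenience methods.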
> Make spark.ml package backward compatible with spark.mllib vectors
> ------------------------------------------------------------------
>
> Key: SPARK-15944
> URL: https://issues.apache.org/jira/browse/SPARK-15944
> Project: Spark
> Issue Type: Umbrella
> Components: ML, MLlib
> Affects Versions: 2.0.0
> Reporter: Xiangrui Meng
> Assignee: Xiangrui Meng
> Priority: Critical
>
> During QA, we found that it is not trivial to convert a DataFrame with old
> vector columns to new vector columns. So it would be easier for users to
> migrate their datasets and pipelines if we:
> 1) provide utils to convert DataFrames with vector columns
> 2) automatically detect and convert old vector columns in ML pipelines
> This is an umbrella JIRA to track the progress.