[ 
https://issues.apache.org/jira/browse/SPARK-15944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356944#comment-15356944
 ] 

Nick Pentreath commented on SPARK-15944:
----------------------------------------

As I commented on [PR #13924|https://github.com/apache/spark/pull/13924]:

> It happens to work for dense vectors because it effectively calls
> {{np.array(DenseVector)}}, but it does not work for sparse vectors. The
> workaround is fairly ugly:
> {{mlSV = NewVectors.sparse(mllibSV.size, zip(mllibSV.indices, mllibSV.values))}},
> or something similar.

I think we need convenience methods for Python too - I've created SPARK-16328 
to track that. 
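
For illustration, the workaround quoted above could be wrapped in a small helper along these lines. This is only a sketch, not the convenience method proposed for SPARK-16328: the function names {{sparse_args}} and {{mllib_to_ml_sparse}} are hypothetical, and it assumes the old vector exposes {{size}}, {{indices}} and {{values}} as in the snippet above, and that {{pyspark.ml.linalg.Vectors.sparse}} accepts a size plus a list of (index, value) pairs.

```python
def sparse_args(old_sv):
    """Extract (size, [(index, value), ...]) from an old-style sparse vector.

    Assumes `old_sv` has `size`, `indices` and `values` attributes, as
    pyspark.mllib.linalg.SparseVector does. Indices and values are cast to
    plain Python int/float so the result does not depend on NumPy scalar types.
    """
    pairs = [(int(i), float(v)) for i, v in zip(old_sv.indices, old_sv.values)]
    return int(old_sv.size), pairs


def mllib_to_ml_sparse(old_sv):
    """Build a new-style (spark.ml) sparse vector from an old-style one.

    Hypothetical helper: it simply applies the workaround from the comment.
    """
    # Imported lazily so that sparse_args() can be used without pyspark.
    from pyspark.ml.linalg import Vectors as NewVectors

    size, pairs = sparse_args(old_sv)
    return NewVectors.sparse(size, pairs)
```

With pyspark installed, something like {{mlSV = mllib_to_ml_sparse(mllibSV)}} would then replace the inline {{zip}} expression.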


> Make spark.ml package backward compatible with spark.mllib vectors
> ------------------------------------------------------------------
>
>                 Key: SPARK-15944
>                 URL: https://issues.apache.org/jira/browse/SPARK-15944
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, MLlib
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Critical
>
> During QA, we found that it is not trivial to convert a DataFrame with old 
> vector columns to new vector columns. So it would be easier for users to 
> migrate their datasets and pipelines if we:
> 1) provide utils to convert DataFrames with vector columns
> 2) automatically detect and convert old vector columns in ML pipelines
> This is an umbrella JIRA to track the progress.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
