[jira] [Commented] (SPARK-19653) `Vector` Type Should Be A First-Class Citizen In Spark SQL

Xiao Li (JIRA) Fri, 17 Feb 2017 16:49:37 -0800

    [ https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872819#comment-15872819 ]


Xiao Li commented on SPARK-19653:
---------------------------------

cc [~mengxr] [~josephkb]

> `Vector` Type Should Be A First-Class Citizen In Spark SQL
> ----------------------------------------------------------
>
>                 Key: SPARK-19653
>                 URL: https://issues.apache.org/jira/browse/SPARK-19653
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML, MLlib, SQL
>    Affects Versions: 2.1.0, 2.2.0
>            Reporter: Mike Dusenberry
>
> *Issue*: The {{Vector}} type in Spark MLlib (DataFrame-based API, informally 
> "Spark ML") should be added as a first-class citizen to Spark SQL.
> *Current Status*: Spark MLlib currently provides a [{{Vector}} SQL data type|https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.linalg.SQLDataTypes$]
> that allows DataFrames/Datasets to contain {{Vector}} columns, which MLlib
> algorithms require. Although this lets a DataFrame/Dataset hold vectors, it
> does not give access to the rich set of features provided by Spark SQL. For
> example, none of the SQL functions, such as {{avg}} or {{sum}}, can be applied
> to a {{Vector}} column, and a DataFrame with a {{Vector}} column cannot be
> saved as a CSV file. In each of these cases, an error is returned stating
> that the operation is not supported on the {{Vector}} type.
> *Benefit*: Allow users to make use of all Spark SQL features that can be 
> reasonably applied to a vector.
> *Goal*:  Move the {{Vector}} type from Spark MLlib into Spark SQL as a 
> first-class citizen.
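
The following minimal Python sketch illustrates one plausible meaning for sum and avg on a Vector column. It aggregates element-wise over rows of equal-size dense vectors. The helpers vector_sum and vector_avg are hypothetical and are not part of Spark or MLlib. The sketch also assumes dense vectors with no nulls. Spark's actual semantics, if such support were added, may differ.

```python
from typing import List, Sequence

# Hypothetical helper (not a Spark API): element-wise sum over a
# "column" of dense vectors, all of which must have the same size.
def vector_sum(rows: Sequence[Sequence[float]]) -> List[float]:
    if not rows:
        raise ValueError("aggregate over an empty Vector column is undefined here")
    size = len(rows[0])
    if any(len(r) != size for r in rows):
        raise ValueError("all vectors must have the same size")
    totals = [0.0] * size
    for r in rows:
        for i, x in enumerate(r):
            totals[i] += x
    return totals

# Hypothetical helper (not a Spark API): element-wise mean, i.e. the
# element-wise sum divided by the number of rows.
def vector_avg(rows: Sequence[Sequence[float]]) -> List[float]:
    totals = vector_sum(rows)
    return [t / len(rows) for t in totals]

if __name__ == "__main__":
    column = [[1.0, 2.0], [3.0, 4.0]]
    print(vector_sum(column))  # [4.0, 6.0]
    print(vector_avg(column))  # [2.0, 3.0]
```

An element-wise definition like this keeps the result a vector of the same size. That is one natural way for existing SQL aggregates to be "reasonably applied to a vector", though it is only an assumption here.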



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
