[ https://issues.apache.org/jira/browse/SPARK-19653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872819#comment-15872819 ]
Xiao Li commented on SPARK-19653:
---------------------------------
cc [~mengxr] [~josephkb]
> `Vector` Type Should Be A First-Class Citizen In Spark SQL
> ----------------------------------------------------------
>
> Key: SPARK-19653
> URL: https://issues.apache.org/jira/browse/SPARK-19653
> Project: Spark
> Issue Type: Improvement
> Components: ML, MLlib, SQL
> Affects Versions: 2.1.0, 2.2.0
> Reporter: Mike Dusenberry
>
> *Issue*: The {{Vector}} type in Spark MLlib (DataFrame-based API, informally
> "Spark ML") should be added as a first-class citizen to Spark SQL.
> *Current Status*: Spark MLlib adds a [{{Vector}} SQL datatype |
> https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.ml.linalg.SQLDataTypes$]
> to allow DataFrames/Datasets to use {{Vector}} columns, which is necessary
> for MLlib algorithms. Although this allows a DataFrame/Dataset to contain
> vectors, it does not allow one to make complete use of the rich set of
> features made available by Spark SQL. For example, it is not possible to use
> any of the SQL functions, such as {{avg}}, {{sum}}, etc. on a {{Vector}}
> column, nor is it possible to save a DataFrame with a {{Vector}} column as a
> CSV file. In each of these cases, an error message is returned with a note
> that the operator is not supported on a {{Vector}} type.
> *Benefit*: Allow users to make use of all Spark SQL features that can be
> reasonably applied to a vector.
> *Goal*: Move the {{Vector}} type from Spark MLlib into Spark SQL as a
> first-class citizen.
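
To make "features that can be reasonably applied to a vector" concrete, here is a minimal plain-Python sketch, independent of Spark, of what element-wise {{sum}} and {{avg}} over a column of equal-length dense vectors could compute, plus one possible way to flatten a vector into a single CSV field. All function names are hypothetical illustrations, not part of any Spark API, and the issue itself does not specify these semantics.

```python
from typing import List, Sequence


def vector_sum(rows: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise sum of equal-length dense vectors (hypothetical helper)."""
    if not rows:
        raise ValueError("need at least one vector")
    size = len(rows[0])
    total = [0.0] * size
    for row in rows:
        if len(row) != size:
            raise ValueError("all vectors must have the same length")
        for i, value in enumerate(row):
            total[i] += value
    return total


def vector_avg(rows: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean: the element-wise sum divided by the row count."""
    total = vector_sum(rows)
    return [value / len(rows) for value in total]


def vector_to_csv_field(vec: Sequence[float]) -> str:
    """Render a vector as one string such as "[1.0,2.5]" (assumed format).

    The string contains commas, so a CSV writer must quote the field.
    """
    return "[" + ",".join(repr(float(v)) for v in vec) + "]"


column = [[1.0, 2.0], [3.0, 4.0]]
print(vector_sum(column))
print(vector_avg(column))
print(vector_to_csv_field([1, 2.5]))
```

Sparse vectors and mixed-length columns are deliberately left out of this sketch; a first-class SQL type would have to define its behaviour for both.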