[jira] [Commented] (SPARK-16074) Expose VectorUDT/MatrixUDT in a public API

Xiangrui Meng (JIRA) Mon, 20 Jun 2016 16:17:47 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-16074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340642#comment-15340642
 ]


Xiangrui Meng commented on SPARK-16074:
---------------------------------------

I picked option 2 because we don't have any Java source code in MLlib. The 
overhead for Java users is the extra `()` when accessing the types.
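
To make the call-site difference between the two options in the issue description concrete, here is a minimal plain-Java sketch. Option 1 mirrors Spark SQL's DataTypes.java, which exposes types as `public static final` fields. Option 2 approximates what Java code sees when the constant is a `val` in a top-level Scala `object`: scalac emits a static forwarder method, so Java must call it with `()`. `DataType`, `VectorUDT`, `JavaStyleDataTypes`, `ScalaStyleDataTypes` and `CallSiteDemo` are hypothetical stand-ins written for this illustration, not Spark classes.

```java
// Hypothetical stand-in for Spark SQL's DataType; not the real class.
interface DataType {}

// Plays the role of the private VectorUDT. Callers are meant to reach it
// only through the DataType-typed accessors below.
final class VectorUDT implements DataType {
    static final VectorUDT INSTANCE = new VectorUDT();
    private VectorUDT() {}
}

// Option 1: a Java class in the style of Spark SQL's DataTypes.java,
// exposing the type as a public static final field.
final class JavaStyleDataTypes {
    private JavaStyleDataTypes() {}
    public static final DataType VectorType = VectorUDT.INSTANCE;
}

// Option 2: roughly what Java code sees when the constant is a `val` inside a
// top-level Scala `object`: scalac emits a static forwarder method.
final class ScalaStyleDataTypes {
    private ScalaStyleDataTypes() {}
    public static DataType VectorType() { return VectorUDT.INSTANCE; }
}

class CallSiteDemo {
    public static void main(String[] args) {
        DataType a = JavaStyleDataTypes.VectorType;    // option 1: field access
        DataType b = ScalaStyleDataTypes.VectorType(); // option 2: the extra "()"
        System.out.println(a == b); // prints "true": same instance either way
    }
}
```

Scala callers write `VectorType` without parentheses under either option, so only Java code pays the `()`.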

> Expose VectorUDT/MatrixUDT in a public API
> ------------------------------------------
>
>                 Key: SPARK-16074
>                 URL: https://issues.apache.org/jira/browse/SPARK-16074
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>    Affects Versions: 2.0.0
>            Reporter: Xiangrui Meng
>            Assignee: Xiangrui Meng
>            Priority: Critical
>
> Both VectorUDT and MatrixUDT are private APIs, because UserDefinedType itself 
> is private in Spark. However, in order to let developers implement their own 
> transformers and estimators, we should expose both types in a public API to 
> simplify the implementation of transformSchema, transform, etc. Otherwise, they 
> need to get the data types using reflection.
> Note that this doesn't mean exposing the VectorUDT/MatrixUDT classes. We can 
> just have a method or a static value that returns a VectorUDT/MatrixUDT 
> instance with DataType as the return type. There are two ways to implement 
> this:
> 1. Follow DataTypes.java in SQL, so Java users don't need the extra "()".
> 2. Define DataTypes in Scala.
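
The transformSchema motivation in the description can be sketched as follows: a custom transformer checks that its input column has the vector type by comparing against a public, DataType-typed handle instead of obtaining VectorUDT through reflection. This is a minimal plain-Java sketch under simplifying assumptions. `DataType`, `DoubleType`, `VectorUDT`, `Field`, `SQLDataTypes` and `MyTransformerSchema` are hypothetical stand-ins, not Spark classes, and a schema is modeled as a list of fields. To my knowledge the API that eventually shipped in Spark 2.0 is `org.apache.spark.ml.linalg.SQLDataTypes` with `VectorType` and `MatrixType` values; check the Spark API docs before relying on that.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-ins for Spark SQL's DataType and StructField.
interface DataType {}

final class DoubleType implements DataType {
    static final DataType INSTANCE = new DoubleType();
    private DoubleType() {}
}

// Plays the role of the otherwise private VectorUDT.
final class VectorUDT implements DataType {
    static final DataType INSTANCE = new VectorUDT();
    private VectorUDT() {}
}

final class Field {
    private final String name;
    private final DataType dataType;
    Field(String name, DataType dataType) { this.name = name; this.dataType = dataType; }
    String name() { return name; }
    DataType dataType() { return dataType; }
}

// The public handle the issue asks for: the type is returned as DataType,
// so VectorUDT itself never has to become public.
final class SQLDataTypes {
    private SQLDataTypes() {}
    static DataType VectorType() { return VectorUDT.INSTANCE; }
}

// transformSchema-style check for a hypothetical transformer that reads a
// vector column and appends a double column.
final class MyTransformerSchema {
    static List<Field> transformSchema(List<Field> schema, String inputCol, String outputCol) {
        Field input = schema.stream()
            .filter(f -> f.name().equals(inputCol))
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException(
                "Column " + inputCol + " does not exist."));
        // The stand-in types are singletons, so equals() (reference equality) suffices here.
        if (!input.dataType().equals(SQLDataTypes.VectorType())) {
            throw new IllegalArgumentException("Column " + inputCol + " must be a vector column.");
        }
        List<Field> out = new ArrayList<>(schema);
        out.add(new Field(outputCol, DoubleType.INSTANCE));
        return Collections.unmodifiableList(out);
    }
}
```

A call such as `transformSchema(schema, "features", "score")` returns the input schema plus a `score` field when `features` is a vector column, and throws otherwise.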



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
