[jira] [Assigned] (SPARK-24467) VectorAssemblerEstimator

Apache Spark (JIRA) Sat, 09 Jun 2018 13:26:09 -0700


     [ 
https://issues.apache.org/jira/browse/SPARK-24467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Apache Spark reassigned SPARK-24467:
------------------------------------

    Assignee: Apache Spark

> VectorAssemblerEstimator
> ------------------------
>
>                 Key: SPARK-24467
>                 URL: https://issues.apache.org/jira/browse/SPARK-24467
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.4.0
>            Reporter: Joseph K. Bradley
>            Assignee: Apache Spark
>            Priority: Major
>
> In [SPARK-22346], I believe I made a wrong API decision: I recommended added 
> `VectorSizeHint` instead of making `VectorAssembler` into an Estimator since 
> I thought the latter option would break most workflows.  However, I should 
> have proposed:
> * Add a Param to VectorAssembler for specifying the sizes of Vectors in the 
> inputCols.  This Param can be optional.  If not given, then VectorAssembler 
> will behave as it does now.  If given, then VectorAssembler can use that info 
> instead of figuring out the Vector sizes via metadata or examining Rows in 
> the data (though it could do consistency checks).
> * Add a VectorAssemblerEstimator which gets the Vector lengths from data and 
> produces a VectorAssembler with the vector lengths Param specified.
> This will not break existing workflows.  Migrating to 
> VectorAssemblerEstimator will be easier than adding VectorSizeHint since it 
> will not require users to manually input Vector lengths.
> Note: Even with this Estimator, VectorSizeHint might prove useful for other 
> things in the future which require vector length metadata, so we could 
> consider keeping it rather than deprecating it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Assigned] (SPARK-24467) VectorAssemblerEstimator

Reply via email to