[
https://issues.apache.org/jira/browse/SPARK-15614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15304124#comment-15304124
]
Yanbo Liang edited comment on SPARK-15614 at 5/27/16 2:33 PM:
--------------------------------------------------------------
I vote -1.
* An ML pipeline first uses feature transformers for preprocessing and then
feeds the dataset into an algorithm estimator, so the output of the feature
transformers is the input of the estimator. Usually we assemble the
transformers' output columns with {{VectorAssembler}}, and the output of
{{VectorAssembler}} becomes the "features" column of the estimator. So setting
"features" as the default input column name of a feature transformer does not
make sense.
* The transformers in {{ml.feature}} do not always handle features. For
example, {{StringIndexer}} is often used to process the label rather than a
feature.
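A minimal sketch of the typical flow described in the two points above, assuming the standard {{spark.ml}} Scala API. The column names ("species", "sepalLength", "sepalWidth") and the object name are hypothetical and chosen only for illustration; none of the transformers reads from "features", they only produce it (or "label") for the estimator.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}

object DefaultInputColSketch {
  // StringIndexer applied to the label column, not to a feature.
  // "species" is a made-up raw label column.
  val labelIndexer = new StringIndexer()
    .setInputCol("species")
    .setOutputCol("label")

  // VectorAssembler gathers raw or transformed columns into one vector.
  // Its *output* is "features"; its inputs are arbitrary (made-up) columns.
  val assembler = new VectorAssembler()
    .setInputCols(Array("sepalLength", "sepalWidth"))
    .setOutputCol("features")

  // The estimator reads "features" and "label" by default,
  // so no column names need to be set here.
  val lr = new LogisticRegression()

  val pipeline = new Pipeline()
    .setStages(Array[PipelineStage](labelIndexer, assembler, lr))
}
```

Here "features" only shows up as an output column name on the transformer side, which is why a default *input* column of "features" would rarely match what a transformer actually consumes.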
Here are my two cents. Looking forward to hearing others' opinions.
> ml.feature should support default value of input column
> -------------------------------------------------------
>
> Key: SPARK-15614
> URL: https://issues.apache.org/jira/browse/SPARK-15614
> Project: Spark
> Issue Type: Brainstorming
> Components: ML
> Reporter: zhengruifeng
> Priority: Minor
>
> {{ml.classification}} and {{ml.clustering}} use {{"features"}} as the default
> input column, while {{ml.feature}} uses the {{setInputCol}} method to set the
> input column and has no default value, which is somewhat strange.
> It may be nice to support a default input column "features" in {{ml.feature}}:
> we could make these implementations extend {{HasFeaturesCol}} and make the
> existing {{setInputCol}} method just an alias.
> I can work on this if needed.