[jira] [Commented] (SPARK-17094) provide simplified API for ML pipeline

Nick Pentreath (JIRA) Wed, 07 Sep 2016 12:25:50 -0700

    [ https://issues.apache.org/jira/browse/SPARK-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471563#comment-15471563 ]


Nick Pentreath commented on SPARK-17094:
----------------------------------------

It's true that this constructor doesn't exist. It could instead be written as
{{new Pipeline().setStages(Array(new Tokenizer(), new CountVectorizer(), ...))}}
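
For reference, a fuller version of that {{setStages}} call might look like the sketch below. It is only an illustration under some assumptions: the input column name "text", the intermediate column names "words" and "features", the toy data, and the LDA settings are all made up here and are not taken from the issue. It does show the column wiring that the proposed one-line API would hide.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, Tokenizer}
import org.apache.spark.sql.SparkSession

object PipelineSketch {

  // Builds the three-stage pipeline with the existing API.
  // Column names are assumptions chosen for this example.
  def buildPipeline(): Pipeline = {
    val tokenizer = new Tokenizer()
      .setInputCol("text")
      .setOutputCol("words")

    val vectorizer = new CountVectorizer()
      .setInputCol("words")
      .setOutputCol("features")

    // LDA reads from "features" by default, which matches the vectorizer output.
    val lda = new LDA()
      .setK(2)
      .setMaxIter(10)

    new Pipeline().setStages(Array(tokenizer, vectorizer, lda))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pipeline-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy corpus, one document per row.
    val data = Seq(
      "spark ml pipeline api",
      "simplified pipeline for ml",
      "topic models with lda"
    ).toDF("text")

    val model = buildPipeline().fit(data)

    // "topicDistribution" is the default output column of the fitted LDA model.
    model.transform(data).select("text", "topicDistribution").show(false)

    spark.stop()
  }
}
```

Each stage has to be told which column it reads and which it writes, which is the boilerplate the issue proposes to remove.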

> provide simplified API for ML pipeline
> --------------------------------------
>
>                 Key: SPARK-17094
>                 URL: https://issues.apache.org/jira/browse/SPARK-17094
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: yuhao yang
>
> Many machine learning pipelines have an API for easily assembling transformers.
> One example would be:
> {code}
> val model = new Pipeline("tokenizer", "countvectorizer", "lda").fit(data)
> {code}
> Overall, the feature would:
> 1. Allow people (especially beginners) to create an ML application in one 
> simple line of code.
> 2. Be handy for users, as they would not have to set the input and output 
> columns.
> 3. Thinking further, we may no longer need code to build a Spark ML 
> application, as it could be done by configuration:
> {code}
> "ml.pipeline.input": "hdfs://path.svm"
> "ml.pipeline": "tokenizer", "hashingTF", "lda"
> "ml.tokenizer.toLowercase": "false"
> ...
> {code}
> Driving the pipeline from configuration could be quite efficient for tuning on a cluster.
> Feedback and suggestions are appreciated.



