[
https://issues.apache.org/jira/browse/SPARK-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471563#comment-15471563
]
Nick Pentreath commented on SPARK-17094:
----------------------------------------
It's true that this constructor doesn't exist. With the existing API it could be written as
{{new Pipeline().setStages(Array(new Tokenizer(), new CountVectorizer(), ...))}}.
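
For reference, a fuller sketch of what the existing API requires today, with the column wiring spelled out. The column names ("text", "words", "features") and k = 10 are assumptions chosen for illustration; they do not come from the issue.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, Tokenizer}

// Column names below are illustrative assumptions.
val tokenizer = new Tokenizer()
  .setInputCol("text")
  .setOutputCol("words")

val vectorizer = new CountVectorizer()
  .setInputCol("words")
  .setOutputCol("features")

// LDA reads its input from featuresCol; k = 10 is an arbitrary choice.
val lda = new LDA()
  .setFeaturesCol("features")
  .setK(10)

// Each stage's input column must match the previous stage's output column.
val pipeline = new Pipeline()
  .setStages(Array[PipelineStage](tokenizer, vectorizer, lda))

// val model = pipeline.fit(data)  // `data` would need a string column named "text"
```

The manual column wiring between stages is the part the proposal below aims to hide.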
> provide simplified API for ML pipeline
> --------------------------------------
>
> Key: SPARK-17094
> URL: https://issues.apache.org/jira/browse/SPARK-17094
> Project: Spark
> Issue Type: New Feature
> Components: ML
> Reporter: yuhao yang
>
> Many machine learning pipelines have an API for easily assembling transformers.
> One example would be:
> {code}
> val model = new Pipeline("tokenizer", "countvectorizer", "lda").fit(data)
> {code}
> Overall, the feature would:
> 1. Allow people (especially beginners) to create an ML application in one
> simple line of code.
> 2. Be handy for users, as they don't have to set the input and output
> columns.
> 3. Thinking further, we may no longer need code to build a Spark ML
> application, as it could be done through configuration:
> {code}
> "ml.pipeline.input": "hdfs://path.svm"
> "ml.pipeline": "tokenizer", "hashingTF", "lda"
> "ml.tokenizer.toLowercase": "false"
> ...
> {code}
> This could be quite efficient for tuning on a cluster.
> Feedback and suggestions are appreciated.
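
One way the name-based assembly described in the issue might be layered on top of the existing API is sketched below. `SimplePipeline` is a hypothetical helper and not part of Spark; the explicit `inputCol` argument, the generated intermediate column names, and the set of recognized stage names are all assumptions made for illustration.

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, HashingTF, Tokenizer}

// Hypothetical helper (not a Spark API): builds a Pipeline from stage names
// and chains each stage's output column into the next stage's input column.
object SimplePipeline {
  def apply(inputCol: String, names: String*): Pipeline = {
    var prev = inputCol // column consumed by the next stage
    val stages = names.zipWithIndex.map { case (name, i) =>
      val out = s"${name}_$i" // generated column name, an arbitrary convention
      val stage: PipelineStage = name.toLowerCase match {
        case "tokenizer" =>
          new Tokenizer().setInputCol(prev).setOutputCol(out)
        case "countvectorizer" =>
          new CountVectorizer().setInputCol(prev).setOutputCol(out)
        case "hashingtf" =>
          new HashingTF().setInputCol(prev).setOutputCol(out)
        case "lda" =>
          new LDA().setFeaturesCol(prev).setTopicDistributionCol(out)
        case other =>
          throw new IllegalArgumentException(s"Unknown stage: $other")
      }
      prev = out
      stage
    }
    new Pipeline().setStages(stages.toArray)
  }
}

// Usage: the caller names only the raw input column ("text" is an assumption).
val simple = SimplePipeline("text", "tokenizer", "countvectorizer", "lda")
```

The same lookup could in principle be driven by configuration keys such as those listed in the issue, by reading the stage names from a properties file instead of passing them as arguments.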