[ https://issues.apache.org/jira/browse/SPARK-17094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15471563#comment-15471563 ]
Nick Pentreath commented on SPARK-17094:
----------------------------------------

It's true that constructor doesn't exist. It could be
{{new Pipeline().setStages(Array(new Tokenizer(), new CountVectorizer(), ...}}

> provide simplified API for ML pipeline
> --------------------------------------
>
>                 Key: SPARK-17094
>                 URL: https://issues.apache.org/jira/browse/SPARK-17094
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: yuhao yang
>
> Many machine learning pipelines have an API for easily assembling transformers.
> One example would be:
> {code}
> val model = new Pipeline("tokenizer", "countvectorizer", "lda").fit(data)
> {code}
> Overall, the feature would
> 1. Allow people (especially beginners) to create an ML application in one
> simple line of code.
> 2. Be handy for users, as they don't have to set the input and output
> columns.
> 3. Thinking further, we may no longer need code to build a Spark ML
> application, as it can be done by configuration:
> {code}
> "ml.pipeline.input": "hdfs://path.svm"
> "ml.pipeline": "tokenizer", "hashingTF", "lda"
> "ml.tokenizer.toLowercase": "false"
> ...
> {code}
> which can be quite efficient for tuning on a cluster.
> Appreciate feedback and suggestions.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
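
For comparison with the proposed one-liner, here is a minimal sketch of the same tokenizer, CountVectorizer, LDA chain written against the existing {{setStages}} API. It assumes Spark 2.x with spark-mllib on the classpath. The column names ("text", "words", "features"), the value k = 3, the sample sentences, and the object name are illustrative assumptions, not taken from the issue.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.clustering.LDA
import org.apache.spark.ml.feature.{CountVectorizer, Tokenizer}
import org.apache.spark.sql.SparkSession

object SimplePipelineSketch {
  // Each stage must be told which column it reads and which it writes;
  // this is the wiring the proposed string-based constructor would hide.
  // Column names are assumptions chosen for this example.
  val tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
  val vectorizer = new CountVectorizer().setInputCol("words").setOutputCol("features")
  val lda = new LDA().setK(3).setFeaturesCol("features") // k = 3 is arbitrary

  // The existing way to assemble stages: setStages on a no-arg Pipeline.
  val pipeline = new Pipeline().setStages(Array(tokenizer, vectorizer, lda))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("pipeline-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Tiny made-up corpus, only to make the sketch runnable end to end.
    val data = Seq(
      "spark ml pipeline api",
      "latent dirichlet allocation topics",
      "tokenizer count vectorizer lda"
    ).toDF("text")

    val model = pipeline.fit(data)
    model.transform(data).show(truncate = false)

    spark.stop()
  }
}
```

The three {{setInputCol}}/{{setOutputCol}} calls are the boilerplate that point 2 of the issue description refers to; a name-based constructor would have to pick such column names by convention.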