[jira] [Resolved] (SPARK-24597) Spark ML Pipeline Should support non-linear models => DAGPipeline

Hyukjin Kwon (Jira) Mon, 07 Oct 2019 22:46:13 -0700


     [ https://issues.apache.org/jira/browse/SPARK-24597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Hyukjin Kwon resolved SPARK-24597.
----------------------------------
    Resolution: Incomplete

> Spark ML Pipeline Should support non-linear models => DAGPipeline
> -----------------------------------------------------------------
>
>                 Key: SPARK-24597
>                 URL: https://issues.apache.org/jira/browse/SPARK-24597
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>    Affects Versions: 2.3.1
>            Reporter: Michael Dreibelbis
>            Priority: Minor
>              Labels: bulk-closed
>
> Currently, the Spark ML Pipeline/PipelineModel supports only a single, linear 
> chain of dataset transformations, despite the documentation stating otherwise:
> [reference 
> documentation|https://spark.apache.org/docs/2.3.0/ml-pipeline.html#details] 
>  I propose implementing a DAGPipeline that supports multiple datasets as input.
> The code could look something like this:
>  
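> {code:scala}
> import org.apache.spark.ml.feature.{Binarizer, CountVectorizer, IDF}
> // DAGPipeline, the *Node wrappers, IdentityTransformer and Joiner are the proposed API
> val ds1 = /*dataset 1 creation*/
> val ds2 = /*dataset 2 creation*/
> // nodes take on uid from estimator/transformer
> val i1 = IdentityNode(new IdentityTransformer("i1"))
> val i2 = IdentityNode(new IdentityTransformer("i2"))
> val bi = TransformerNode(new Binarizer("bi"))
> val cv = EstimatorNode(new CountVectorizer("cv"))
> val idf = EstimatorNode(new IDF("idf"))
> val j1 = JoinerNode(new Joiner("j1"))
> // every node referenced by an edge must be registered, including the joiner
> val nodes = Array(i1, i2, bi, cv, idf, j1)
> // edges are (from uid, to uid): i1 -> cv -> idf -> j1 and i2 -> bi -> j1
> val edges = Array(
>   ("i1", "cv"), ("cv", "idf"), ("idf", "j1"),
>   ("i2", "bi"), ("bi", "j1"))
> val p = new DAGPipeline(nodes, edges)
>   .setIdentity("i1", ds1)
>   .setIdentity("i2", ds2)
> // input datasets are bound to the identity nodes, so fit/transform receive an empty DataFrame
> val m = p.fit(spark.emptyDataFrame)
> m.setIdentity("i1", ds1).setIdentity("i2", ds2)
> m.transform(spark.emptyDataFrame)
> {code}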
>  
>  
>          
>           
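
The issue does not say how a DAGPipeline would decide the order in which to fit or transform its nodes. One plausible reading is a topological sort over the (from uid, to uid) edge list. The sketch below is only an illustration of that idea in plain Scala with no Spark dependency; `DagOrder` and `topologicalOrder` are hypothetical names, not part of Spark or of the proposal.

```scala
import scala.collection.mutable

// Hypothetical helper, not a Spark API: orders node uids so that every
// edge (from, to) has `from` placed before `to` (Kahn's algorithm).
object DagOrder {
  def topologicalOrder(nodes: Seq[String], edges: Seq[(String, String)]): Seq[String] = {
    val known = nodes.toSet
    require(known.size == nodes.size, "node uids must be unique")
    edges.foreach { case (from, to) =>
      require(known(from) && known(to), s"edge ($from, $to) references an unknown node")
    }

    // Count incoming edges per node and index outgoing edges by source uid.
    val inDegree = mutable.Map(nodes.map(_ -> 0): _*)
    edges.foreach { case (_, to) => inDegree(to) += 1 }
    val outgoing = edges.groupBy(_._1).map { case (from, es) => from -> es.map(_._2) }

    // Start from nodes with no inputs; release a node once all its inputs are placed.
    val ready = mutable.Queue(nodes.filter(inDegree(_) == 0): _*)
    val order = mutable.ArrayBuffer.empty[String]
    while (ready.nonEmpty) {
      val current = ready.dequeue()
      order += current
      outgoing.getOrElse(current, Seq.empty).foreach { next =>
        inDegree(next) -= 1
        if (inDegree(next) == 0) ready.enqueue(next)
      }
    }

    // Any node left unplaced sits on a cycle, so the graph is not a DAG.
    require(order.size == nodes.size, "graph contains a cycle")
    order.toSeq
  }
}
```

Called with the six uids from the example (`i1`, `i2`, `bi`, `cv`, `idf`, `j1`) and its five edges, this returns an order in which `j1` comes after both `idf` and `bi`. An edge that names an unregistered uid is rejected up front rather than failing later during fitting.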



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

