[ 
https://issues.apache.org/jira/browse/SPARK-16319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358693#comment-15358693
 ] 

Sean Owen commented on SPARK-16319:
-----------------------------------

To verify the DAG property? Otherwise, somewhere down the line you'll find that 
one stage needs a column that hasn't been generated yet. The shape of the graph 
matters in that it must be a DAG, and the input order matters in that it must 
be consistent with the DAG, i.e. the stages must be listed in topological order.
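
As a rough illustration of the ordering point, here is a plain-Python sketch. 
It does not use Spark and is not how Spark implements anything; `Stage` and 
`first_out_of_order` are hypothetical names invented for this example, and the 
column names are made up. It walks the stages in the order given and reports 
the first stage that needs a column nobody has produced yet.

```python
from typing import List, NamedTuple, Optional


class Stage(NamedTuple):
    # Hypothetical stand-in for a pipeline stage: only the columns it reads
    # and the single column it writes.
    name: str
    inputs: List[str]
    output: str


def first_out_of_order(stages: List[Stage],
                       initial_columns: List[str]) -> Optional[str]:
    """Return the name of the first stage whose inputs are not yet available,
    or None if the given order is consistent with the column dependencies."""
    available = set(initial_columns)
    for stage in stages:
        if any(col not in available for col in stage.inputs):
            return stage.name
        # Once a stage has run, its output column exists for later stages.
        available.add(stage.output)
    return None


tokenizer = Stage("tokenizer", ["text"], "words")
hashing_tf = Stage("hashingTF", ["words"], "features")
lr = Stage("lr", ["features", "label"], "prediction")

# Consistent with the dependency graph: every input exists when it is needed.
ok = first_out_of_order([tokenizer, hashing_tf, lr], ["text", "label"])

# Same stages, wrong order: hashingTF needs "words" before tokenizer made it.
bad = first_out_of_order([hashing_tf, tokenizer, lr], ["text", "label"])
```

Here `ok` is None and `bad` is "hashingTF": the set of stages is the same in 
both cases, and only the order decides whether the pipeline can run.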

> Non-linear (DAG) pipelines need better explanation
> --------------------------------------------------
>
>                 Key: SPARK-16319
>                 URL: https://issues.apache.org/jira/browse/SPARK-16319
>             Project: Spark
>          Issue Type: Documentation
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Max Moroz
>            Priority: Minor
>
> There's a 
> [paragraph|http://spark.apache.org/docs/2.0.0-preview/ml-guide.html#details] 
> about non-linear pipelines in the ML docs, but it's not clear how a DAG 
> pipeline differs from a linear pipeline, and in fact, it seems that a "DAG 
> Pipeline" results in behavior identical to that of a regular linear pipeline 
> (the stages are simply applied in the order provided when the pipeline is 
> created). In addition, no checks of input and output columns seem to occur 
> when pipeline.fit() or pipeline.transform() is called.
> It would be better to clarify in the docs and/or remove that paragraph.
> I'd be happy to write it up, but I have no idea what the intention of this 
> concept is at this point.
> [Additional reference on 
> SO|http://stackoverflow.com/questions/37541668/non-linear-dag-ml-pipelines-in-apache-spark]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
