[jira] [Commented] (SPARK-16319) Non-linear (DAG) pipelines need better explanation

Max Moroz (JIRA) Fri, 01 Jul 2016 01:58:51 -0700

    [ 
https://issues.apache.org/jira/browse/SPARK-16319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358676#comment-15358676
 ]


Max Moroz commented on SPARK-16319:
-----------------------------------

Hmm, but if the execution is NOT different, then why would the API even bother
to check inputCol / outputCol? It just executes the stages in the order given
at pipeline creation, completely oblivious to any information it could gather
from the data flow graph. In that case, the text that mentions the DAG and
inputCol / outputCol gives the misleading impression that the shape of this
graph actually affects something.
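
To make the distinction concrete, here is a toy model in plain Python. It is
only a sketch and not Spark's implementation: the Stage class and both helper
functions are hypothetical names invented for illustration, and each stage is
assumed to have exactly one input and one output column. It contrasts running
stages strictly in the order given with deriving the order from the
inputCol / outputCol data flow graph.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


@dataclass(frozen=True)
class Stage:
    """Hypothetical stand-in for a pipeline stage (not a Spark class)."""
    name: str
    input_col: str
    output_col: str


def run_in_given_order(stages, columns):
    """Apply stages in list order, tracking which columns exist.

    Raises KeyError as soon as a stage needs a column that no earlier
    stage (or the input data) has provided.
    """
    available = set(columns)
    for stage in stages:
        if stage.input_col not in available:
            raise KeyError(
                f"{stage.name} needs missing column {stage.input_col!r}"
            )
        available.add(stage.output_col)
    return available


def topological_order(stages):
    """Reorder stages so that every producer runs before its consumers."""
    producer = {s.output_col: s.name for s in stages}
    by_name = {s.name: s for s in stages}
    # Map each stage to the set of stages it depends on.
    graph = {
        s.name: {producer[s.input_col]} if s.input_col in producer else set()
        for s in stages
    }
    return [by_name[n] for n in TopologicalSorter(graph).static_order()]


if __name__ == "__main__":
    # Stages deliberately listed in the "wrong" order.
    stages = [
        Stage("hashingTF", "words", "features"),
        Stage("tokenizer", "text", "words"),
    ]
    try:
        run_in_given_order(stages, {"text"})
    except KeyError as err:
        print("given order failed:", err)
    print([s.name for s in topological_order(stages)])
```

If a pipeline only ever behaves like run_in_given_order, the column names
describe a graph but never influence the execution order, which is the
concern raised above.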

> Non-linear (DAG) pipelines need better explanation
> --------------------------------------------------
>
>                 Key: SPARK-16319
>                 URL: https://issues.apache.org/jira/browse/SPARK-16319
>             Project: Spark
>          Issue Type: Documentation
>          Components: ML
>    Affects Versions: 2.0.0
>            Reporter: Max Moroz
>            Priority: Minor
>
> There's a
> [paragraph|http://spark.apache.org/docs/2.0.0-preview/ml-guide.html#details]
> about non-linear pipelines in the ML docs, but it's not clear how a DAG
> pipeline differs from a linear pipeline. In fact, it seems that a "DAG
> Pipeline" behaves identically to a regular linear pipeline (the stages are
> simply applied in the order provided when the pipeline is created). In
> addition, no checks of input and output columns seem to occur when
> pipeline.fit() or pipeline.transform() is called.
> It would be better to clarify this in the docs and/or remove that paragraph.
> I'd be happy to write it up, but I have no idea what the intention of this
> concept is at this point.
> [Additional reference on 
> SO|http://stackoverflow.com/questions/37541668/non-linear-dag-ml-pipelines-in-apache-spark]



