Max Moroz created SPARK-16319:
---------------------------------
Summary: Pipeline / DAG
Key: SPARK-16319
URL: https://issues.apache.org/jira/browse/SPARK-16319
Project: Spark
Issue Type: Documentation
Components: ML
Affects Versions: 2.0.0
Reporter: Max Moroz
Priority: Minor
There's a
[paragraph|http://spark.apache.org/docs/2.0.0-preview/ml-guide.html#details]
about non-linear pipelines in the ML docs, but it's not clear how a DAG pipeline
differs from a linear pipeline. In fact, a "DAG Pipeline" seems to behave
identically to a regular linear pipeline (the stages are simply applied in the
order provided when the pipeline is created).
In addition, no checks of input and output columns seem to occur when
pipeline.fit() or pipeline.transform() is called.
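To make the observation concrete, here is a toy model in plain Python. It is
only a sketch of the behavior described above, not Spark code: Stage and
run_in_order are hypothetical names invented for illustration, and nothing here
is claimed to match Spark's internals. The stages form a non-linear column graph
(text feeds both words and length, which both feed ratio), yet they are simply
applied in list order with no up-front column check.

```python
# Toy model only: Stage and run_in_order are hypothetical, not part of Spark.
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Stage:
    inputs: List[str]          # names of the columns this stage reads
    output: str                # name of the column this stage writes
    fn: Callable[..., object]  # computes the output value from the inputs

    def transform(self, row: Row) -> Row:
        # No schema check happens beforehand: a missing input column
        # only surfaces here, as a KeyError.
        args = [row[name] for name in self.inputs]
        return {**row, self.output: self.fn(*args)}


def run_in_order(stages: List[Stage], row: Row) -> Row:
    # Apply the stages strictly in the order they were listed.
    for stage in stages:
        row = stage.transform(row)
    return row


# A non-linear column graph: text -> words, text -> length, (words, length) -> ratio
tokenize = Stage(["text"], "words", lambda t: t.split())
length = Stage(["text"], "length", lambda t: len(t))
ratio = Stage(["words", "length"], "ratio", lambda w, n: len(w) / n)

# The list order happens to be a topological order of the graph, so this works.
result = run_in_order([tokenize, length, ratio], {"text": "a bc"})
print(result["ratio"])
```

In this toy model, listing ratio before tokenize fails only when ratio runs,
because nothing validates the columns ahead of time. Whether "DAG Pipeline" in
the docs is meant to imply anything beyond this is exactly what is unclear to me.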
It would be better to either clarify this in the docs or remove that paragraph.
I'd be happy to write it up, but I have no idea what the intention of this
concept is at this point.
[Additional reference on
SO|http://stackoverflow.com/questions/37541668/non-linear-dag-ml-pipelines-in-apache-spark]