spark git commit: [SPARK-19797][DOC] ML pipeline document correction

srowen Fri, 03 Mar 2017 02:56:37 -0800

Repository: spark
Updated Branches:
  refs/heads/master fa50143cd -> 0bac3e4cd



[SPARK-19797][DOC] ML pipeline document correction

## What changes were proposed in this pull request?
The description of the pipeline in this paragraph is incorrect:
https://spark.apache.org/docs/latest/ml-pipeline.html#how-it-works

> If the Pipeline had more **stages**, it would call the 
> LogisticRegressionModel's transform() method on the DataFrame before 
> passing the DataFrame to the next stage.

Reason: a Transformer can also be a stage, but only a subsequent Estimator 
causes a transform() call that passes the data on to the next stage. The 
current description in the document misleads ML pipeline users.
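
To illustrate the distinction, here is a minimal pure-Python sketch of the fitting logic described above. It is not Spark's implementation: `ToyTransformer`, `ToyEstimator`, `fit_pipeline`, and `calls_for` are hypothetical names, and the rule "transform only while a later Estimator still needs the data" is an assumption drawn from the reasoning in this commit message.

```python
# Toy stand-ins for Spark ML stages. All names here are hypothetical.
class ToyTransformer:
    def __init__(self, name, log):
        self.name, self.log = name, log

    def transform(self, data):
        self.log.append(f"{self.name}.transform")
        return data + [self.name]


class ToyEstimator:
    def __init__(self, name, log):
        self.name, self.log = name, log

    def fit(self, data):
        self.log.append(f"{self.name}.fit")
        # Fitting produces a model, which is itself a transformer.
        return ToyTransformer(f"{self.name}Model", self.log)


def fit_pipeline(stages, data):
    """Fit stages in order, transforming only while a later estimator needs the data."""
    last_est = max(
        (i for i, s in enumerate(stages) if hasattr(s, "fit")), default=-1
    )
    fitted = []
    for i, stage in enumerate(stages):
        model = stage.fit(data) if hasattr(stage, "fit") else stage
        if i < last_est:
            data = model.transform(data)
        fitted.append(model)
    return fitted


def calls_for(stage_kinds):
    """Build a pipeline from a string such as 'TTE' and return the call log."""
    log = []
    stages = [
        (ToyEstimator if kind == "E" else ToyTransformer)(f"s{i}", log)
        for i, kind in enumerate(stage_kinds)
    ]
    fit_pipeline(stages, [])
    return log
```

In this sketch, `calls_for("TTE")` and `calls_for("TTET")` record the same calls, because a trailing Transformer does not trigger the fitted model's `transform()`. `calls_for("TTEE")` records an extra `s2Model.transform` before `s3.fit`, which is the case the corrected sentence describes.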

## How was this patch tested?
This is a tiny modification of **docs/ml-pipeline.md**. I built the docs with 
Jekyll and checked the compiled document.

Author: Zhe Sun <ymwda...@gmail.com>

Closes #17137 from ymwdalex/SPARK-19797-ML-pipeline-document-correction.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/0bac3e4c
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/0bac3e4c
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/0bac3e4c

Branch: refs/heads/master
Commit: 0bac3e4cde75678beac02e67b8873fe779e9ad34
Parents: fa50143
Author: Zhe Sun <ymwda...@gmail.com>
Authored: Fri Mar 3 11:55:57 2017 +0100
Committer: Sean Owen <so...@cloudera.com>
Committed: Fri Mar 3 11:55:57 2017 +0100

----------------------------------------------------------------------
 docs/ml-pipeline.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/0bac3e4c/docs/ml-pipeline.md
----------------------------------------------------------------------
diff --git a/docs/ml-pipeline.md b/docs/ml-pipeline.md
index 7cbb146..aa92c0a 100644
--- a/docs/ml-pipeline.md
+++ b/docs/ml-pipeline.md
@@ -132,7 +132,7 @@ The `Pipeline.fit()` method is called on the original `DataFrame`, which has raw
 The `Tokenizer.transform()` method splits the raw text documents into words, adding a new column with words to the `DataFrame`.
 The `HashingTF.transform()` method converts the words column into feature vectors, adding a new column with those vectors to the `DataFrame`.
 Now, since `LogisticRegression` is an `Estimator`, the `Pipeline` first calls `LogisticRegression.fit()` to produce a `LogisticRegressionModel`.
-If the `Pipeline` had more stages, it would call the `LogisticRegressionModel`'s `transform()`
+If the `Pipeline` had more `Estimator`s, it would call the `LogisticRegressionModel`'s `transform()`
 method on the `DataFrame` before passing the `DataFrame` to the next stage.
 
 A `Pipeline` is an `Estimator`.

