Repository: spark
Updated Branches:
  refs/heads/branch-1.6 f05bae4a3 -> 2ddd10486


[SPARK-11964][DOCS][ML] Add in Pipeline Import/Export Documentation

Adds documentation for Pipeline import and export.

Author: anabranch <wac.chamb...@gmail.com>
Author: Bill Chambers <wchamb...@ischool.berkeley.edu>

Closes #10179 from anabranch/master.

(cherry picked from commit aa305dcaf5b4148aba9e669e081d0b9235f50857)
Signed-off-by: Joseph K. Bradley <jos...@databricks.com>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2ddd1048
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2ddd1048
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2ddd1048

Branch: refs/heads/branch-1.6
Commit: 2ddd10486b91619117b0c236c86e4e0f39869cfa
Parents: f05bae4
Author: anabranch <wac.chamb...@gmail.com>
Authored: Fri Dec 11 12:55:56 2015 -0800
Committer: Joseph K. Bradley <jos...@databricks.com>
Committed: Fri Dec 11 12:56:20 2015 -0800

----------------------------------------------------------------------
 docs/ml-guide.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/2ddd1048/docs/ml-guide.md
----------------------------------------------------------------------
diff --git a/docs/ml-guide.md b/docs/ml-guide.md
index 5c96c2b..44a316a 100644
--- a/docs/ml-guide.md
+++ b/docs/ml-guide.md
@@ -192,6 +192,10 @@ Parameters belong to specific instances of `Estimator`s and `Transformer`s.
 For example, if we have two `LogisticRegression` instances `lr1` and `lr2`, then we can build a `ParamMap` with both `maxIter` parameters specified: `ParamMap(lr1.maxIter -> 10, lr2.maxIter -> 20)`.
 This is useful if there are two algorithms with the `maxIter` parameter in a `Pipeline`.
 
+## Saving and Loading Pipelines
+
+It is often worthwhile to save a model or a pipeline to disk for later use. In Spark 1.6, model import/export functionality was added to the Pipeline API. Most basic transformers are supported, as are some of the more basic ML models. Please refer to the algorithm's API documentation to see whether saving and loading are supported.
+
 # Code examples
 
 This section gives code examples illustrating the functionality discussed above.
@@ -455,6 +459,15 @@ val pipeline = new Pipeline()
 // Fit the pipeline to training documents.
 val model = pipeline.fit(training)
 
+// Now we can optionally save the fitted pipeline to disk.
+model.save("/tmp/spark-logistic-regression-model")
+
+// We can also save this unfitted pipeline to disk.
+pipeline.save("/tmp/unfit-lr-model")
+
+// And load the fitted pipeline (a PipelineModel) back in during production.
+val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")
+
 // Prepare test documents, which are unlabeled (id, text) tuples.
 val test = sqlContext.createDataFrame(Seq(
   (4L, "spark i j k"),

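A minimal usage sketch of loading the saved pipeline back and scoring a new document. It assumes the `sqlContext` and the save path from the example above; the test tuple mirrors the guide's existing test data.

    import org.apache.spark.ml.PipelineModel

    // A fitted pipeline is saved as a PipelineModel, so it is loaded with
    // PipelineModel.load rather than Pipeline.load.
    val sameModel = PipelineModel.load("/tmp/spark-logistic-regression-model")

    // An unlabeled (id, text) test document, as in the guide's test set.
    val test = sqlContext.createDataFrame(Seq(
      (4L, "spark i j k")
    )).toDF("id", "text")

    // transform() runs every stage of the fitted pipeline over the input.
    sameModel.transform(test).select("id", "text", "prediction").show()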
