Repository: spark
Updated Branches:
  refs/heads/master 4e0fb010c -> d58fe2883


[SPARK-23154][ML][DOC] Document backwards compatibility guarantees for ML persistence

## What changes were proposed in this pull request?

Added documentation about what MLlib guarantees in terms of loading ML models 
and Pipelines from old Spark versions.  Discussed & confirmed on linked JIRA.

Author: Joseph K. Bradley <jos...@databricks.com>

Closes #20592 from jkbradley/SPARK-23154-backwards-compat-doc.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d58fe288
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d58fe288
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/d58fe288

Branch: refs/heads/master
Commit: d58fe28836639e68e262812d911f167cb071007b
Parents: 4e0fb01
Author: Joseph K. Bradley <jos...@databricks.com>
Authored: Tue Feb 13 11:18:45 2018 -0800
Committer: Joseph K. Bradley <jos...@databricks.com>
Committed: Tue Feb 13 11:18:45 2018 -0800

----------------------------------------------------------------------
 docs/ml-pipeline.md | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/d58fe288/docs/ml-pipeline.md
----------------------------------------------------------------------
diff --git a/docs/ml-pipeline.md b/docs/ml-pipeline.md
index aa92c0a..e22e900 100644
--- a/docs/ml-pipeline.md
+++ b/docs/ml-pipeline.md
@@ -188,9 +188,36 @@ Parameters belong to specific instances of `Estimator`s and `Transformer`s.
 For example, if we have two `LogisticRegression` instances `lr1` and `lr2`, then we can build a `ParamMap` with both `maxIter` parameters specified: `ParamMap(lr1.maxIter -> 10, lr2.maxIter -> 20)`.
 This is useful if there are two algorithms with the `maxIter` parameter in a `Pipeline`.
 
-## Saving and Loading Pipelines
+## ML persistence: Saving and Loading Pipelines
 
-Often times it is worth it to save a model or a pipeline to disk for later use. In Spark 1.6, a model import/export functionality was added to the Pipeline API. Most basic transformers are supported as well as some of the more basic ML models. Please refer to the algorithm's API documentation to see if saving and loading is supported.
+Often times it is worth it to save a model or a pipeline to disk for later use. In Spark 1.6, a model import/export functionality was added to the Pipeline API.
+As of Spark 2.3, the DataFrame-based API in `spark.ml` and `pyspark.ml` has complete coverage.
+
+ML persistence works across Scala, Java and Python.  However, R currently uses a modified format,
+so models saved in R can only be loaded back in R; this should be fixed in the future and is
+tracked in [SPARK-15572](https://issues.apache.org/jira/browse/SPARK-15572).
+
+### Backwards compatibility for ML persistence
+
+In general, MLlib maintains backwards compatibility for ML persistence.  I.e., if you save an ML
+model or Pipeline in one version of Spark, then you should be able to load it back and use it in a
+future version of Spark.  However, there are rare exceptions, described below.
+
+Model persistence: Is a model or Pipeline saved using Apache Spark ML persistence in Spark
+version X loadable by Spark version Y?
+
+* Major versions: No guarantees, but best-effort.
+* Minor and patch versions: Yes; these are backwards compatible.
+* Note about the format: There are no guarantees for a stable persistence format, but model loading itself is designed to be backwards compatible.
+
+Model behavior: Does a model or Pipeline in Spark version X behave identically in Spark version Y?
+
+* Major versions: No guarantees, but best-effort.
+* Minor and patch versions: Identical behavior, except for bug fixes.
+
+For both model persistence and model behavior, any breaking changes across a minor version or patch
+version are reported in the Spark version release notes. If a breakage is not reported in release
+notes, then it should be treated as a bug to be fixed.
 
 # Code examples
 

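The persistence rules added in the diff above (save in version X, load in version Y) can be read as a small decision procedure. The following is a minimal sketch in plain Python and is not part of this commit or of any Spark API: the helper names `parse_version` and `persistence_guarantee` are hypothetical, and treating a load by an older Spark version as "not covered" is an assumption, since the documentation only speaks about loading in a future version.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse 'major.minor.patch' (hypothetical helper); missing parts default to 0."""
    parts = [int(p) for p in version.split(".")]
    while len(parts) < 3:
        parts.append(0)
    return (parts[0], parts[1], parts[2])


def persistence_guarantee(saved_with: str, loaded_with: str) -> str:
    """Classify the documented guarantee for loading a model saved by
    Spark `saved_with` into Spark `loaded_with`.

    Illustrative only: the labels are paraphrases of the documentation
    text, not values returned by any Spark API.
    """
    saved = parse_version(saved_with)
    loading = parse_version(loaded_with)

    # Assumption: the documentation describes loading in a *future* version
    # only, so loading with an older version is treated as not covered.
    if loading < saved:
        return "not covered"

    # Different major versions: no guarantees, but best-effort.
    if loading[0] != saved[0]:
        return "best-effort"

    # Same major version, later minor or patch: backwards compatible.
    return "backwards compatible"


if __name__ == "__main__":
    for x, y in [("2.2.0", "2.3.0"), ("1.6.3", "2.3.0"), ("2.3.0", "2.2.1")]:
        print(f"saved in {x}, loaded in {y}: {persistence_guarantee(x, y)}")
```

Even in the "backwards compatible" case, the documentation still allows rare exceptions, which are expected to be listed in the release notes; the sketch does not model those.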
