spark git commit: [SPARK-22993][ML] Clarify HasCheckpointInterval param doc

felixcheung Tue, 09 Jan 2018 23:33:57 -0800

Repository: spark
Updated Branches:
  refs/heads/branch-2.3 ecc24ec7f -> 2db523959



[SPARK-22993][ML] Clarify HasCheckpointInterval param doc

## What changes were proposed in this pull request?

Add a note to the `HasCheckpointInterval` parameter doc that clarifies that 
this setting is ignored when no checkpoint directory has been set on the spark 
context.

## How was this patch tested?

No tests necessary, just a doc update.

Author: sethah <[email protected]>

Closes #20188 from sethah/als_checkpoint_doc.

(cherry picked from commit 70bcc9d5ae33d6669bb5c97db29087ccead770fb)
Signed-off-by: Felix Cheung <[email protected]>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2db52395
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2db52395
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2db52395

Branch: refs/heads/branch-2.3
Commit: 2db523959658d8cd04f83c176e46f6bcdb745335
Parents: ecc24ec
Author: sethah <[email protected]>
Authored: Tue Jan 9 23:32:47 2018 -0800
Committer: Felix Cheung <[email protected]>
Committed: Tue Jan 9 23:33:06 2018 -0800

----------------------------------------------------------------------
 R/pkg/R/mllib_recommendation.R                                 | 2 ++
 R/pkg/R/mllib_tree.R                                           | 6 ++++++
 .../org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala | 4 +++-
 .../scala/org/apache/spark/ml/param/shared/sharedParams.scala  | 4 ++--
 python/pyspark/ml/param/_shared_params_code_gen.py             | 5 +++--
 python/pyspark/ml/param/shared.py                              | 4 ++--
 6 files changed, 18 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/2db52395/R/pkg/R/mllib_recommendation.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/mllib_recommendation.R b/R/pkg/R/mllib_recommendation.R
index fa79424..5441c4a 100644
--- a/R/pkg/R/mllib_recommendation.R
+++ b/R/pkg/R/mllib_recommendation.R
@@ -48,6 +48,8 @@ setClass("ALSModel", representation(jobj = "jobj"))
 #' @param numUserBlocks number of user blocks used to parallelize computation 
(> 0).
 #' @param numItemBlocks number of item blocks used to parallelize computation 
(> 0).
 #' @param checkpointInterval number of checkpoint intervals (>= 1) or disable 
checkpoint (-1).
+#'                           Note: this setting will be ignored if the 
checkpoint directory is not
+#'                           set.
 #' @param ... additional argument(s) passed to the method.
 #' @return \code{spark.als} returns a fitted ALS model.
 #' @rdname spark.als

http://git-wip-us.apache.org/repos/asf/spark/blob/2db52395/R/pkg/R/mllib_tree.R
----------------------------------------------------------------------
diff --git a/R/pkg/R/mllib_tree.R b/R/pkg/R/mllib_tree.R
index 89a58bf..4e5ddf2 100644
--- a/R/pkg/R/mllib_tree.R
+++ b/R/pkg/R/mllib_tree.R
@@ -161,6 +161,8 @@ print.summary.decisionTree <- function(x) {
 #'                            >= 1.
 #' @param minInfoGain Minimum information gain for a split to be considered at 
a tree node.
 #' @param checkpointInterval Param for set checkpoint interval (>= 1) or 
disable checkpoint (-1).
+#'                           Note: this setting will be ignored if the 
checkpoint directory is not
+#'                           set.
 #' @param maxMemoryInMB Maximum memory in MB allocated to histogram 
aggregation.
 #' @param cacheNodeIds If FALSE, the algorithm will pass trees to executors to 
match instances with
 #'                     nodes. If TRUE, the algorithm will cache node IDs for 
each instance. Caching
@@ -382,6 +384,8 @@ setMethod("write.ml", signature(object = 
"GBTClassificationModel", path = "chara
 #' @param minInstancesPerNode Minimum number of instances each child must have 
after split.
 #' @param minInfoGain Minimum information gain for a split to be considered at 
a tree node.
 #' @param checkpointInterval Param for set checkpoint interval (>= 1) or 
disable checkpoint (-1).
+#'                           Note: this setting will be ignored if the 
checkpoint directory is not
+#'                           set.
 #' @param maxMemoryInMB Maximum memory in MB allocated to histogram 
aggregation.
 #' @param cacheNodeIds If FALSE, the algorithm will pass trees to executors to 
match instances with
 #'                     nodes. If TRUE, the algorithm will cache node IDs for 
each instance. Caching
@@ -595,6 +599,8 @@ setMethod("write.ml", signature(object = 
"RandomForestClassificationModel", path
 #' @param minInstancesPerNode Minimum number of instances each child must have 
after split.
 #' @param minInfoGain Minimum information gain for a split to be considered at 
a tree node.
 #' @param checkpointInterval Param for set checkpoint interval (>= 1) or 
disable checkpoint (-1).
+#'                           Note: this setting will be ignored if the 
checkpoint directory is not
+#'                           set.
 #' @param maxMemoryInMB Maximum memory in MB allocated to histogram 
aggregation.
 #' @param cacheNodeIds If FALSE, the algorithm will pass trees to executors to 
match instances with
 #'                     nodes. If TRUE, the algorithm will cache node IDs for 
each instance. Caching

http://git-wip-us.apache.org/repos/asf/spark/blob/2db52395/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
----------------------------------------------------------------------
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
index a5d57a1..6ad44af 100644
--- 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
+++ 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/SharedParamsCodeGen.scala
@@ -63,7 +63,9 @@ private[shared] object SharedParamsCodeGen {
       ParamDesc[Array[String]]("outputCols", "output column names"),
       ParamDesc[Int]("checkpointInterval", "set checkpoint interval (>= 1) or 
" +
         "disable checkpoint (-1). E.g. 10 means that the cache will get 
checkpointed " +
-        "every 10 iterations", isValid = "(interval: Int) => interval == -1 || 
interval >= 1"),
+        "every 10 iterations. Note: this setting will be ignored if the 
checkpoint directory " +
+        "is not set in the SparkContext",
+        isValid = "(interval: Int) => interval == -1 || interval >= 1"),
       ParamDesc[Boolean]("fitIntercept", "whether to fit an intercept term", 
Some("true")),
       ParamDesc[String]("handleInvalid", "how to handle invalid entries. 
Options are skip (which " +
         "will filter out rows with bad values), or error (which will throw an 
error). More " +

http://git-wip-us.apache.org/repos/asf/spark/blob/2db52395/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
----------------------------------------------------------------------
diff --git 
a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala 
b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
index 13425da..be8b2f2 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/param/shared/sharedParams.scala
@@ -282,10 +282,10 @@ trait HasOutputCols extends Params {
 trait HasCheckpointInterval extends Params {
 
   /**
-   * Param for set checkpoint interval (&gt;= 1) or disable checkpoint (-1). 
E.g. 10 means that the cache will get checkpointed every 10 iterations.
+   * Param for set checkpoint interval (&gt;= 1) or disable checkpoint (-1). 
E.g. 10 means that the cache will get checkpointed every 10 iterations. Note: 
this setting will be ignored if the checkpoint directory is not set in the 
SparkContext.
    * @group param
    */
-  final val checkpointInterval: IntParam = new IntParam(this, 
"checkpointInterval", "set checkpoint interval (>= 1) or disable checkpoint 
(-1). E.g. 10 means that the cache will get checkpointed every 10 iterations", 
(interval: Int) => interval == -1 || interval >= 1)
+  final val checkpointInterval: IntParam = new IntParam(this, 
"checkpointInterval", "set checkpoint interval (>= 1) or disable checkpoint 
(-1). E.g. 10 means that the cache will get checkpointed every 10 iterations. 
Note: this setting will be ignored if the checkpoint directory is not set in 
the SparkContext", (interval: Int) => interval == -1 || interval >= 1)
 
   /** @group getParam */
   final def getCheckpointInterval: Int = $(checkpointInterval)

http://git-wip-us.apache.org/repos/asf/spark/blob/2db52395/python/pyspark/ml/param/_shared_params_code_gen.py
----------------------------------------------------------------------
diff --git a/python/pyspark/ml/param/_shared_params_code_gen.py 
b/python/pyspark/ml/param/_shared_params_code_gen.py
index d55d209..1d0f60a 100644
--- a/python/pyspark/ml/param/_shared_params_code_gen.py
+++ b/python/pyspark/ml/param/_shared_params_code_gen.py
@@ -121,8 +121,9 @@ if __name__ == "__main__":
         ("outputCol", "output column name.", "self.uid + '__output'", 
"TypeConverters.toString"),
         ("numFeatures", "number of features.", None, "TypeConverters.toInt"),
         ("checkpointInterval", "set checkpoint interval (>= 1) or disable 
checkpoint (-1). " +
-         "E.g. 10 means that the cache will get checkpointed every 10 
iterations.", None,
-         "TypeConverters.toInt"),
+         "E.g. 10 means that the cache will get checkpointed every 10 
iterations. Note: " +
+         "this setting will be ignored if the checkpoint directory is not set 
in the SparkContext.",
+         None, "TypeConverters.toInt"),
         ("seed", "random seed.", "hash(type(self).__name__)", 
"TypeConverters.toInt"),
         ("tol", "the convergence tolerance for iterative algorithms (>= 0).", 
None,
          "TypeConverters.toFloat"),

http://git-wip-us.apache.org/repos/asf/spark/blob/2db52395/python/pyspark/ml/param/shared.py
----------------------------------------------------------------------
diff --git a/python/pyspark/ml/param/shared.py 
b/python/pyspark/ml/param/shared.py
index e5c5ddf..813f7a5 100644
--- a/python/pyspark/ml/param/shared.py
+++ b/python/pyspark/ml/param/shared.py
@@ -281,10 +281,10 @@ class HasNumFeatures(Params):
 
 class HasCheckpointInterval(Params):
     """
-    Mixin for param checkpointInterval: set checkpoint interval (>= 1) or 
disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed 
every 10 iterations.
+    Mixin for param checkpointInterval: set checkpoint interval (>= 1) or 
disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed 
every 10 iterations. Note: this setting will be ignored if the checkpoint 
directory is not set in the SparkContext.
     """
 
-    checkpointInterval = Param(Params._dummy(), "checkpointInterval", "set 
checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the 
cache will get checkpointed every 10 iterations.", 
typeConverter=TypeConverters.toInt)
+    checkpointInterval = Param(Params._dummy(), "checkpointInterval", "set 
checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the 
cache will get checkpointed every 10 iterations. Note: this setting will be 
ignored if the checkpoint directory is not set in the SparkContext.", 
typeConverter=TypeConverters.toInt)
 
     def __init__(self):
         super(HasCheckpointInterval, self).__init__()


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

spark git commit: [SPARK-22993][ML] Clarify HasCheckpointInterval param doc

Reply via email to