Repository: spark
Updated Branches:
  refs/heads/master 84b5b16ea -> c108a5d30
[SPARK-19762][ML][FOLLOWUP] Add necessary comments to L2Regularization.

## What changes were proposed in this pull request?

MLlib `LinearRegression`/`LogisticRegression`/`LinearSVC` always standardize the data during training to improve the rate of convergence, regardless of whether _standardization_ is true or false. If _standardization_ is false, we perform reverse standardization by penalizing each component differently, to get effectively the same objective function when the training dataset is not standardized. We should keep these comments in the code so that developers understand how we handle it correctly.

## How was this patch tested?

Existing tests; this patch only adds comments to the code.

Author: Yanbo Liang <yblia...@gmail.com>

Closes #18992 from yanboliang/SPARK-19762.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c108a5d3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c108a5d3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c108a5d3

Branch: refs/heads/master
Commit: c108a5d30e821fef23709681fca7da22bc507129
Parents: 84b5b16
Author: Yanbo Liang <yblia...@gmail.com>
Authored: Tue Aug 22 08:43:18 2017 +0800
Committer: Yanbo Liang <yblia...@gmail.com>
Committed: Tue Aug 22 08:43:18 2017 +0800

----------------------------------------------------------------------
 .../ml/optim/loss/DifferentiableRegularization.scala | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/c108a5d3/mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala b/mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
index 7ac7c22..929374e 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
@@ -39,9 +39,13 @@ private[ml] trait DifferentiableRegularization[T] extends DiffFunction[T] {
  *
  * @param regParam The magnitude of the regularization.
  * @param shouldApply A function (Int => Boolean) indicating whether a given index should have
- *                    regularization applied to it.
+ *                    regularization applied to it. Usually we don't apply regularization to
+ *                    the intercept.
  * @param applyFeaturesStd Option for a function which maps coefficient index (column major) to the
- *                         feature standard deviation. If `None`, no standardization is applied.
+ *                         feature standard deviation. Since we always standardize the data during
+ *                         training, if `standardization` is false, we have to reverse
+ *                         standardization by penalizing each component differently by this param.
+ *                         If `standardization` is true, this should be `None`.
  */
 private[ml] class L2Regularization(
     override val regParam: Double,
@@ -57,6 +61,11 @@ private[ml] class L2Regularization(
         val coef = coefficients(j)
         applyFeaturesStd match {
           case Some(getStd) =>
+            // If `standardization` is false, we still standardize the data
+            // to improve the rate of convergence; as a result, we have to
+            // perform this reverse standardization by penalizing each component
+            // differently to get effectively the same objective function when
+            // the training dataset is not standardized.
             val std = getStd(j)
             if (std != 0.0) {
               val temp = coef / (std * std)
@@ -66,6 +75,7 @@ private[ml] class L2Regularization(
               0.0
             }
           case None =>
+            // If `standardization` is true, compute L2 regularization normally.
             sum += coef * coef
             gradient(j) = coef * regParam
         }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
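To make the reverse-standardization trick concrete, here is a minimal standalone sketch of the same penalty computation. This is not Spark's actual `L2Regularization` class; the names `L2Sketch`, `l2`, and `featuresStd` are illustrative, and it operates on a plain `Array[Double]` instead of an MLlib `Vector`.

```scala
// Standalone sketch of the L2 penalty with optional reverse standardization,
// mirroring the logic in the diff above. Names here are illustrative,
// not Spark API.
object L2Sketch {
  // Returns (loss, gradient) for 0.5 * regParam * sum_j coef_j^2 / std_j^2;
  // with featuresStd = None the penalty is plain L2.
  def l2(
      coefficients: Array[Double],
      regParam: Double,
      shouldApply: Int => Boolean,
      featuresStd: Option[Int => Double]): (Double, Array[Double]) = {
    var sum = 0.0
    val gradient = new Array[Double](coefficients.length)
    coefficients.indices.filter(shouldApply).foreach { j =>
      val coef = coefficients(j)
      featuresStd match {
        case Some(getStd) =>
          // Reverse standardization: divide by std^2 so the penalty is
          // effectively plain L2 on the original, unstandardized data.
          val std = getStd(j)
          if (std != 0.0) {
            val temp = coef / (std * std)
            sum += coef * temp
            gradient(j) = regParam * temp
          }
        case None =>
          // `standardization` is true: compute L2 regularization normally.
          sum += coef * coef
          gradient(j) = coef * regParam
      }
    }
    (0.5 * sum * regParam, gradient)
  }

  def main(args: Array[String]): Unit = {
    // Three coefficients; index 2 plays the role of an unpenalized intercept.
    val w = Array(2.0, 4.0, 9.0)
    val (plainLoss, plainGrad) = l2(w, 0.5, _ < 2, None)
    println(plainLoss)                // 0.5 * 0.5 * (4 + 16) = 5.0
    println(plainGrad.mkString(", ")) // 1.0, 2.0, 0.0
    // With every feature's std = 2.0, each penalty term is divided by 4.
    val (scaledLoss, _) = l2(w, 0.5, _ < 2, Some(_ => 2.0))
    println(scaledLoss)               // 5.0 / 4 = 1.25
  }
}
```

Note how the `Some(getStd)` branch divides both the loss term and the gradient by `std * std`, which is exactly the "penalizing each component differently" described in the commit message.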