Repository: spark
Updated Branches:
  refs/heads/master 84b5b16ea -> c108a5d30
[SPARK-19762][ML][FOLLOWUP] Add necessary comments to L2Regularization.

## What changes were proposed in this pull request?

MLlib `LinearRegression`/`LogisticRegression`/`LinearSVC` always standardize the data during training to improve the rate of convergence, regardless of whether _standardization_ is true or false. If _standardization_ is false, we perform reverse standardization by penalizing each component differently, to get effectively the same objective function when the training dataset is not standardized. We should keep these comments in the code so that developers understand how we handle it correctly.

## How was this patch tested?

Existing tests; this patch only adds comments to the code.

Author: Yanbo Liang <yblia...@gmail.com>

Closes #18992 from yanboliang/SPARK-19762.

Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c108a5d3
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c108a5d3
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c108a5d3

Branch: refs/heads/master
Commit: c108a5d30e821fef23709681fca7da22bc507129
Parents: 84b5b16
Author: Yanbo Liang <yblia...@gmail.com>
Authored: Tue Aug 22 08:43:18 2017 +0800
Committer: Yanbo Liang <yblia...@gmail.com>
Committed: Tue Aug 22 08:43:18 2017 +0800

----------------------------------------------------------------------
 .../ml/optim/loss/DifferentiableRegularization.scala | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/spark/blob/c108a5d3/mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
----------------------------------------------------------------------
diff --git a/mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala b/mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
index 7ac7c22..929374e 100644
--- a/mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
+++ b/mllib/src/main/scala/org/apache/spark/ml/optim/loss/DifferentiableRegularization.scala
@@ -39,9 +39,13 @@ private[ml] trait DifferentiableRegularization[T] extends DiffFunction[T] {
  *
  * @param regParam The magnitude of the regularization.
  * @param shouldApply A function (Int => Boolean) indicating whether a given index should have
- *                    regularization applied to it.
+ *                    regularization applied to it. Usually we don't apply regularization to
+ *                    the intercept.
  * @param applyFeaturesStd Option for a function which maps coefficient index (column major) to the
- *                         feature standard deviation. If `None`, no standardization is applied.
+ *                         feature standard deviation. Since we always standardize the data during
+ *                         training, if `standardization` is false, we have to reverse
+ *                         standardization by penalizing each component differently by this param.
+ *                         If `standardization` is true, this should be `None`.
  */
 private[ml] class L2Regularization(
     override val regParam: Double,
@@ -57,6 +61,11 @@ private[ml] class L2Regularization(
         val coef = coefficients(j)
         applyFeaturesStd match {
           case Some(getStd) =>
+            // If `standardization` is false, we still standardize the data
+            // to improve the rate of convergence; as a result, we have to
+            // perform this reverse standardization by penalizing each component
+            // differently to get effectively the same objective function when
+            // the training dataset is not standardized.
             val std = getStd(j)
             if (std != 0.0) {
               val temp = coef / (std * std)
@@ -66,6 +75,7 @@ private[ml] class L2Regularization(
               0.0
             }
           case None =>
+            // If `standardization` is true, compute L2 regularization normally.
             sum += coef * coef
             gradient(j) = coef * regParam
         }

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
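To make the reverse-standardization trick concrete, here is a minimal standalone sketch of the same penalty computation. This is not Spark's actual `L2Regularization` class; the names `L2Sketch`, `l2`, and `featuresStd` are illustrative, and it operates on a plain `Array[Double]` instead of an MLlib `Vector`.

```scala
// Standalone sketch of the L2 penalty with optional reverse standardization,
// mirroring the logic in the diff above. Names here are illustrative,
// not Spark API.
object L2Sketch {
  // Returns (loss, gradient) for 0.5 * regParam * sum_j coef_j^2 / std_j^2;
  // with featuresStd = None the penalty is plain L2.
  def l2(
      coefficients: Array[Double],
      regParam: Double,
      shouldApply: Int => Boolean,
      featuresStd: Option[Int => Double]): (Double, Array[Double]) = {
    var sum = 0.0
    val gradient = new Array[Double](coefficients.length)
    coefficients.indices.filter(shouldApply).foreach { j =>
      val coef = coefficients(j)
      featuresStd match {
        case Some(getStd) =>
          // Reverse standardization: divide by std^2 so the penalty is
          // effectively plain L2 on the original, unstandardized data.
          val std = getStd(j)
          if (std != 0.0) {
            val temp = coef / (std * std)
            sum += coef * temp
            gradient(j) = regParam * temp
          }
        case None =>
          // `standardization` is true: compute L2 regularization normally.
          sum += coef * coef
          gradient(j) = coef * regParam
      }
    }
    (0.5 * sum * regParam, gradient)
  }

  def main(args: Array[String]): Unit = {
    // Three coefficients; index 2 plays the role of an unpenalized intercept.
    val w = Array(2.0, 4.0, 9.0)
    val (plainLoss, plainGrad) = l2(w, 0.5, _ < 2, None)
    println(plainLoss)                // 0.5 * 0.5 * (4 + 16) = 5.0
    println(plainGrad.mkString(", ")) // 1.0, 2.0, 0.0
    // With every feature's std = 2.0, each penalty term is divided by 4.
    val (scaledLoss, _) = l2(w, 0.5, _ < 2, Some(_ => 2.0))
    println(scaledLoss)               // 5.0 / 4 = 1.25
  }
}
```

Note how the `Some(getStd)` branch divides both the loss term and the gradient by `std * std`, which is exactly the "penalizing each component differently" described in the commit message.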