Repository: spark
Updated Branches:
  refs/heads/master ae226283e -> 639df046a


[SPARK-16831][PYTHON] Fixed bug in CrossValidator.avgMetrics

## What changes were proposed in this pull request?

`CrossValidator.avgMetrics` was summed, not averaged, across folds, so each reported value was inflated by a factor of `nFolds` relative to the true cross-validation average. This patch divides each fold's metric by `nFolds` as it is accumulated and adds a doctest covering `avgMetrics`.
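
For illustration only (not part of the patch), a minimal standalone sketch with made-up fold scores showing why accumulating `metric / nFolds` yields the mean that the old summation missed:

```python
# Illustrative sketch, not Spark code: hypothetical per-fold evaluator scores.
fold_metrics = [0.80, 0.84, 0.82]
nFolds = len(fold_metrics)

summed = 0.0     # old behaviour: plain sum across folds
averaged = 0.0   # fixed behaviour: running average across folds
for metric in fold_metrics:
    summed += metric
    averaged += metric / nFolds

print(round(summed, 2))    # 2.46 -- inflated by a factor of nFolds
print(round(averaged, 2))  # 0.82 -- the true cross-validation average
```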

Author: =^_^= <maxmo...@gmail.com>

Closes #14456 from pkch/pkch-patch-1.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/639df046
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/639df046
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/639df046

Branch: refs/heads/master
Commit: 639df046a250873c26446a037cb832ab28cb5272
Parents: ae22628
Author: =^_^= <maxmo...@gmail.com>
Authored: Wed Aug 3 04:18:28 2016 -0700
Committer: Sean Owen <so...@cloudera.com>
Committed: Wed Aug 3 04:18:28 2016 -0700

----------------------------------------------------------------------
 python/pyspark/ml/tuning.py | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/639df046/python/pyspark/ml/tuning.py
----------------------------------------------------------------------
diff --git a/python/pyspark/ml/tuning.py b/python/pyspark/ml/tuning.py
index 7f967e5..2dcc99c 100644
--- a/python/pyspark/ml/tuning.py
+++ b/python/pyspark/ml/tuning.py
@@ -166,6 +166,8 @@ class CrossValidator(Estimator, ValidatorParams):
     >>> evaluator = BinaryClassificationEvaluator()
     >>> cv = CrossValidator(estimator=lr, estimatorParamMaps=grid, evaluator=evaluator)
     >>> cvModel = cv.fit(dataset)
+    >>> cvModel.avgMetrics[0]
+    0.5
     >>> evaluator.evaluate(cvModel.transform(dataset))
     0.8333...
 
@@ -234,7 +236,7 @@ class CrossValidator(Estimator, ValidatorParams):
                 model = est.fit(train, epm[j])
                 # TODO: duplicate evaluator to take extra params from input
                 metric = eva.evaluate(model.transform(validation, epm[j]))
-                metrics[j] += metric
+                metrics[j] += metric/nFolds
 
         if eva.isLargerBetter():
             bestIndex = np.argmax(metrics)

