[ https://issues.apache.org/jira/browse/SPARK-27867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
zhengruifeng resolved SPARK-27867.
----------------------------------
    Resolution: Not A Problem

> RegressionEvaluator cache lastest RegressionMetrics to avoid duplicated computation
> -----------------------------------------------------------------------------------
>
>                 Key: SPARK-27867
>                 URL: https://issues.apache.org/jira/browse/SPARK-27867
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 3.0.0
>            Reporter: zhengruifeng
>            Priority: Major
>
> In most cases, given a model, we need to obtain multiple metrics for it.
> For example, for a regression model we may need R2, MAE and MSE.
> However, the current design of `Evaluator` does not support computing multiple metrics at once.
> In practice, we usually use RegressionEvaluator like this:
> {code:java}
> val evaluator = new RegressionEvaluator()
> val r2 = evaluator.setMetricName("r2").evaluate(df)
> val mae = evaluator.setMetricName("mae").evaluate(df)
> val mse = evaluator.setMetricName("mse").evaluate(df){code}
>
> However, the current implementation of RegressionEvaluator needs one pass over the whole input dataset to compute each metric, so the example above needs 3 passes.
> This can be optimized, since `RegressionMetrics` can compute all metrics at once.
> If we cache the latest inputs, and the next evaluate call keeps the same inputs (everything except metricName), then we can obtain the metric directly from the internal intermediate summary.
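
For illustration only, here is a minimal sketch of reading several metrics from one `RegressionMetrics` instance instead of calling `RegressionEvaluator.evaluate` once per metric. It is not the change proposed in the issue. The object name `AllRegressionMetrics`, the sample rows, and the column names `prediction` and `label` are assumptions made for the example.

```scala
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// Hypothetical driver object; the name and the sample data are illustrative.
object AllRegressionMetrics {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("AllRegressionMetrics")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Assumed layout: one row per example, with "prediction" and "label" columns.
    val df = Seq((2.5, 3.0), (0.0, -0.5), (2.0, 2.0), (8.0, 7.0))
      .toDF("prediction", "label")

    // RegressionMetrics takes an RDD of (prediction, observation) pairs.
    val predictionAndLabels = df
      .select(col("prediction").cast(DoubleType), col("label").cast(DoubleType))
      .rdd
      .map(row => (row.getDouble(0), row.getDouble(1)))

    // All three metrics are read from the same RegressionMetrics instance.
    val metrics = new RegressionMetrics(predictionAndLabels)
    println(s"r2  = ${metrics.r2}")
    println(s"mae = ${metrics.meanAbsoluteError}")
    println(s"mse = ${metrics.meanSquaredError}")

    spark.stop()
  }
}
```

This sketch bypasses the `Evaluator` abstraction, so it suits ad hoc reporting rather than components such as `CrossValidator` that expect an `Evaluator`.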