Teng Peng created SPARK-22433:
---------------------------------
Summary: Linear regression R^2 train/test terminology related
Key: SPARK-22433
URL: https://issues.apache.org/jira/browse/SPARK-22433
Project: Spark
Issue Type: Improvement
Components: ML
Affects Versions: 2.2.0
Reporter: Teng Peng
Priority: Minor
Traditional statistics and ML differ in their goals, frameworks, and
terminology. However, in the linear regression related components this
distinction is not clear, as reflected in the following:
1. RegressionMetrics + RegressionEvaluator:
* R2 shouldn't be there.
* A better name would be "regressionPredictionMetric".
2. LinearRegressionSuite:
* Shouldn't test R2 and residuals on test data.
* There is no train set and test set in this setting.
3. Terminology: there is no "linear regression with L1 regularization". Linear
regression is linear. Once a penalty term is added, it is no longer linear. Just
call it "LASSO" or "ElasticNet".
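
To illustrate items 1 and 2, here is a minimal sketch (plain Python, not Spark
code) of why R2 behaves differently on training data and on held-out data. The
data points and the helper names fit_ols and r_squared are hypothetical, and
the test points are deliberately taken from a different relationship than the
training points. For a least-squares fit with an intercept, R2 on the training
data lies between 0 and 1; the same formula applied to other data has no such
guarantee.

```python
# Sketch only: hypothetical data and helper names, not taken from Spark.

def fit_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def r_squared(xs, ys, intercept, slope):
    """Compute 1 - SS_res / SS_tot, with SS_tot taken around the mean of ys."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Training data: roughly linear.
train_x, train_y = [0.0, 1.0, 2.0, 3.0], [0.1, 0.9, 2.2, 2.8]
# Held-out data: intentionally does not follow the training trend.
test_x, test_y = [4.0, 5.0, 6.0], [1.0, 1.2, 0.9]

intercept, slope = fit_ols(train_x, train_y)
r2_train = r_squared(train_x, train_y, intercept, slope)  # lies in [0, 1]
r2_test = r_squared(test_x, test_y, intercept, slope)     # negative here

print("R2 on training data:", r2_train)
print("R2 on held-out data:", r2_test)
```

On the held-out points the quantity is no longer a "fraction of variance
explained", which is one way to read the objection to reporting R2 there.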
There are more. I am working on correcting them.
They are not breaking anything, but it is unsatisfying to see this basic
distinction blurred.