[ https://issues.apache.org/jira/browse/SPARK-22433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Teng Peng updated SPARK-22433:
------------------------------
Description:
Traditional statistics and machine learning have different goals, frameworks,
and terminology. The linear regression related components do not keep this
distinction clear, as reflected in the following:
1. regressionMetric + regressionEvaluator:
* R2 shouldn't be there.
* A better name would be "regressionPredictionMetric".
2. LinearRegressionSuite:
* It shouldn't test R2 and residuals on test data.
* There is no train set and test set in this setting.
3. Terminology: there is no "linear regression with L1 regularization". Linear
regression is linear. Once a penalty term is added, it is no longer linear. Just
call it "LASSO" or "ElasticNet".
There are more. I am working on correcting them.
None of these break anything, but it is unsatisfying to see such a basic
distinction blurred.
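
To make point 2 concrete, here is a minimal sketch in plain Python (not Spark
code) of why R2 behaves differently on held-out data. The helper names and the
toy numbers are made up for illustration. On the data used for the fit, an
ordinary least squares model with an intercept gives an R2 between 0 and 1. On
other data, the same formula has no such guarantee and can go negative.

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x (hypothetical helper)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(xs, ys, intercept, slope):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of the ys given."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Toy data, invented for this sketch: the "test" points do not follow
# the trend seen in the "train" points.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.1, 1.9, 3.2, 3.8, 5.0]
test_x = [6.0, 7.0, 8.0]
test_y = [3.0, 2.5, 3.5]

intercept, slope = fit_simple_ols(train_x, train_y)
r2_train = r_squared(train_x, train_y, intercept, slope)  # in-sample
r2_test = r_squared(test_x, test_y, intercept, slope)     # held-out

print("R2 on the fitting data:", r2_train)
print("R2 on the held-out data:", r2_test)
```

With these numbers the in-sample value stays inside [0, 1] while the held-out
value is negative, so the two quantities should not be read the same way.
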
was:
Traditional statistics is traditional statistics. Their goal, framework, and
terminologies are not the same as ML. However, in linear regression related
components, this distinction is not clear, which is reflected:
1. regressionMetric + regressionEvaluator :
* R2 shouldn't be there.
* A better name "regressionPredictionMetric".
2. LinearRessionSuite:
* Shouldn't test R2 and residuals on test data.
* There is no train set and test set in this setting.
3. Terminology: there is no "linear regression with L1 regularization". Linear
regression is linear. Adding a penalty term, then it is no longer linear. Just
call it "LASSO", "ElasticNet".
There are more. I am working on correcting them.
They are not breaking anything, but it does not make one feel good to see the
basic distinction is blurred.
> Linear regression R^2 train/test terminology related
> -----------------------------------------------------
>
> Key: SPARK-22433
> URL: https://issues.apache.org/jira/browse/SPARK-22433
> Project: Spark
> Issue Type: Improvement
> Components: ML
> Affects Versions: 2.2.0
> Reporter: Teng Peng
> Priority: Minor