[jira] [Updated] (SPARK-22433) Linear regression R^2 train/test terminology related

Teng Peng (JIRA) Thu, 02 Nov 2017 19:29:40 -0700

     [ 
https://issues.apache.org/jira/browse/SPARK-22433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Teng Peng updated SPARK-22433:
------------------------------
    Description: 
Traditional statistics and ML are distinct fields: their goals, frameworks, 
and terminology are not the same. However, in the linear-regression-related 
components this distinction is not clear, as reflected in the following:
1. regressionMetric + regressionEvaluator: 
* R2 shouldn't be there. 
* A better name would be "regressionPredictionMetric".

2. LinearRegressionSuite: 
* It shouldn't test R2 and residuals on test data. 
* In the traditional statistics setting there is no train set and test set.

3. Terminology: there is no "linear regression with L1 regularization". Linear 
regression is linear. Once a penalty term is added, it is no longer linear. 
Just call it "LASSO" or "ElasticNet".

There are more such cases. I am working on correcting them.

None of them breaks anything, but it is uncomfortable to see this basic 
distinction blurred.
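
To illustrate the concern in point 2, here is a minimal sketch in plain 
Python (not Spark code). The data and the helper names fit_ols and r_squared 
are made up for illustration. For a least-squares fit with an intercept, R2 
computed on the fitting data lies between 0 and 1 and reads as the fraction 
of variance explained. Computed on other data with the same formula, it 
carries no such guarantee and can be negative.

```python
from statistics import mean


def fit_ols(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def r_squared(xs, ys, a, b):
    """1 - SS_res / SS_tot, with SS_tot taken around the mean of ys."""
    y_bar = mean(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: the "test" points do not follow the fitted trend.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.1, 1.9, 3.2, 3.8, 5.0]
test_x = [6.0, 7.0, 8.0]
test_y = [3.0, 2.5, 3.5]

a, b = fit_ols(train_x, train_y)
r2_train = r_squared(train_x, train_y, a, b)  # in [0, 1] on the fitting data
r2_test = r_squared(test_x, test_y, a, b)     # negative for this data

print(f"R2 on fitting data: {r2_train:.4f}")
print(f"R2 on other data:   {r2_test:.4f}")
```

On the fitting data the value can be read as explained variance. On the other 
data the same formula only measures prediction error relative to predicting 
the mean, which is the distinction the description draws.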

  was:
Traditional statistics is traditional statistics. Their goal, framework, and 
terminologies are not the same as ML. However, in linear regression related 
components, this distinction is not clear, which is reflected:
1. regressionMetric + regressionEvaluator : 
* R2 shouldn't be there. 
* A better name "regressionPredictionMetric".

2. LinearRessionSuite: 
* Shouldn't test R2 and residuals on test data. 
* There is no train set and test set in this setting.

3. Terminology: there is no "linear regression with L1 regularization". Linear 
regression is linear. Adding a penalty term, then it is no longer linear. Just 
call it "LASSO", "ElasticNet".

There are more. I am working on correcting them.

They are not breaking anything, but it does not make one feel good to see the 
basic distinction is blurred.


> Linear regression R^2 train/test terminology related 
> -----------------------------------------------------
>
>                 Key: SPARK-22433
>                 URL: https://issues.apache.org/jira/browse/SPARK-22433
>             Project: Spark
>          Issue Type: Improvement
>          Components: ML
>    Affects Versions: 2.2.0
>            Reporter: Teng Peng
>            Priority: Minor


