[ https://issues.apache.org/jira/browse/SOLR-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joel Bernstein updated SOLR-10991:
----------------------------------
Description:
Influential observations are *outliers* that have a large effect on the slope
of a regression line. It is very useful to be able to automatically remove
influential observations prior to running a simple regression.
Syntax:
{code}
regress(colA, colB, 10)
{code}
The function above regresses colA and colB after removing the top 10
influential observations from the data set.
The approach taken will be to remove each observation one at a time and
re-run the regression on the data set minus that observation. After each run, the
difference in model fit will be recorded. After all of the regression runs are complete,
the N observations with the largest difference in fit will be removed from the
data set, and the final regression will be run without those observations.
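The leave-one-out procedure above can be sketched in plain Java as follows. This is a minimal illustration, not Solr's implementation of `regress`: the class and method names (`InfluenceSketch`, `fit`, `regressWithoutInfluential`) are hypothetical, and because the issue does not say how "difference in model fit" is measured, the sketch assumes it is the absolute change in slope when a single observation is dropped (Cook's distance or DFFITS would be common alternatives).

{code}
import java.util.Arrays;

public class InfluenceSketch {

    /** Ordinary least-squares fit of y = intercept + slope * x, ignoring rows where skip[i] is true. Returns {intercept, slope}. */
    public static double[] fit(double[] x, double[] y, boolean[] skip) {
        int n = 0;
        double sumX = 0, sumY = 0;
        for (int i = 0; i < x.length; i++) {
            if (skip[i]) continue;
            sumX += x[i];
            sumY += y[i];
            n++;
        }
        double meanX = sumX / n, meanY = sumY / n;
        double sxy = 0, sxx = 0;
        for (int i = 0; i < x.length; i++) {
            if (skip[i]) continue;
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx; // assumes at least two distinct x values remain
        return new double[] {meanY - slope * meanX, slope};
    }

    /**
     * Hypothetical helper: drops each observation in turn, records how far the slope moves
     * (the assumed "difference in fit"), removes the topN most influential observations,
     * and refits on what is left.
     */
    public static double[] regressWithoutInfluential(double[] x, double[] y, int topN) {
        int len = x.length;
        double fullSlope = fit(x, y, new boolean[len])[1];
        double[] diff = new double[len];
        Integer[] order = new Integer[len];
        for (int i = 0; i < len; i++) {
            boolean[] skip = new boolean[len];
            skip[i] = true; // leave observation i out
            diff[i] = Math.abs(fit(x, y, skip)[1] - fullSlope);
            order[i] = i;
        }
        // Sort observation indexes by influence, largest first.
        Arrays.sort(order, (a, b) -> Double.compare(diff[b], diff[a]));
        boolean[] removed = new boolean[len];
        for (int k = 0; k < topN && k < len; k++) {
            removed[order[k]] = true;
        }
        return fit(x, y, removed);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6};
        double[] y = {2, 4, 6, 8, 10, 40}; // the last point is an outlier
        double[] coef = regressWithoutInfluential(x, y, 1);
        System.out.println("intercept=" + coef[0] + " slope=" + coef[1]);
    }
}
{code}

On this sample data the full fit has a slope of 6.0; dropping the outlier alone moves the slope furthest (to 2.0), so it is removed and the final fit prints `intercept=0.0 slope=2.0`. Note the cost: one extra regression per observation, so the sketch is O(n^2) overall.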
was:
Influential observations are *outliers* that have a large effect on the slope
of a regression line. It is very useful to be able to automatically remove
influential observations prior to running a simple regression.
Syntax:
{code}
regress(colA, colB, 10)
{code}
The function above regresses colA and colB after removing the top 10
influential observations from the data set.
> Support removing top N influential observations in the regress Stream
> Evaluator
> -------------------------------------------------------------------------------
>
> Key: SOLR-10991
> URL: https://issues.apache.org/jira/browse/SOLR-10991
> Project: Solr
> Issue Type: New Feature
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Fix For: master (7.0)
>
>
> Influential observations are *outliers* that have a large effect on the slope
> of a regression line. It is very useful to be able to automatically remove
> influential observations prior to running a simple regression.
> Syntax:
> {code}
> regress(colA, colB, 10)
> {code}
> The function above regresses colA and colB after removing the top 10
> influential observations from the data set.
> The approach taken will be to remove each observation one at a time and
> re-run the regression on the data set minus that observation. After each run,
> the difference in model fit will be recorded. After all of the regression runs
> are complete, the N observations with the largest difference in fit will be
> removed from the data set, and the final regression will be run without those
> observations.