[ 
https://issues.apache.org/jira/browse/SOLR-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-10991:
----------------------------------
    Description: 
Influential observations are *outliers* that have a large effect on the slope 
of a regression line. It is very useful to be able to automatically remove 
influential observations prior to running a simple regression.

Syntax:
{code}
regress(colA, colB, 10)
{code}

The function above regresses colA and colB after removing the top 10 
influential observations from the data set.

The approach taken will be to remove each observation one at a time and 
re-run the regression on the data set minus that observation. After each run, 
the difference in model fit will be recorded. After completing the regression 
runs, the N observations with the highest difference in fit will be removed 
from the data set. The final regression will be run without those observations.
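
The leave-one-out procedure described above can be sketched roughly as follows. 
This is only an illustration in Python, not the Solr implementation: the 
function names are hypothetical, and "difference in model fit" is assumed here 
to mean the absolute change in R-squared relative to the full-data regression, 
which the ticket does not specify.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple regression, R-squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


def regress_without_influential(xs, ys, n_remove):
    """Hypothetical helper: drop the n_remove most influential points, then refit.

    Assumption: influence is measured as |R^2(without point) - R^2(all points)|.
    """
    _, _, full_r2 = fit_line(xs, ys)

    # Re-run the regression once per observation, leaving that observation out.
    diffs = []
    for i in range(len(xs)):
        loo_x = xs[:i] + xs[i + 1:]
        loo_y = ys[:i] + ys[i + 1:]
        _, _, loo_r2 = fit_line(loo_x, loo_y)
        diffs.append((abs(loo_r2 - full_r2), i))

    # Pick the observations whose removal changed the fit the most.
    diffs.sort(reverse=True)
    removed = sorted(i for _, i in diffs[:n_remove])
    kept = [i for i in range(len(xs)) if i not in removed]

    # Final regression without the removed observations.
    final_fit = fit_line([xs[i] for i in kept], [ys[i] for i in kept])
    return final_fit, removed
```

For example, with xs = [1, 2, 3, 4, 5, 6] and ys = [2, 4, 6, 8, 10, 40], 
calling regress_without_influential(xs, ys, 1) removes the last point and the 
final fit recovers the line y = 2x. Note that this sketch runs one regression 
per observation, so the cost grows with the size of the data set.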






  was:
Influential observations are *outliers* that have a large effect on the slope 
of a regression line. It is very useful to be able to automatically remove 
influential observations prior to running a simple regression.

Syntax:
{code}
regress(colA, colB, 10)
{code}

The function above regresses colA and colB after removing the top 10 
influential observations from the data set.







> Support removing top N influential observations in the regress Stream 
> Evaluator
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-10991
>                 URL: https://issues.apache.org/jira/browse/SOLR-10991
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>             Fix For: master (7.0)
>
>
> Influential observations are *outliers* that have a large effect on the slope 
> of a regression line. It is very useful to be able to automatically remove 
> influential observations prior to running a simple regression.
> Syntax:
> {code}
> regress(colA, colB, 10)
> {code}
> The function above regresses colA and colB after removing the top 10 
> influential observations from the data set.
> The approach taken will be to remove each observation one at a time and 
> re-run the regression on the data set minus that observation. After each run, 
> the difference in model fit will be recorded. After completing the regression 
> runs, the N observations with the highest difference in fit will be removed 
> from the data set. The final regression will be run without those 
> observations.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

