[ 
https://issues.apache.org/jira/browse/SOLR-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cassandra Targett updated SOLR-10991:
-------------------------------------
    Component/s: streaming expressions

> Support removing top N influential observations in the regress Stream 
> Evaluator
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-10991
>                 URL: https://issues.apache.org/jira/browse/SOLR-10991
>             Project: Solr
>          Issue Type: New Feature
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: streaming expressions
>            Reporter: Joel Bernstein
>            Assignee: Joel Bernstein
>             Fix For: 7.0
>
>
> Influential observations are *outliers* that have a large effect on the slope 
> of a regression line. It is very useful to be able to automatically remove 
> influential observations prior to running a simple regression.
> Syntax:
> {code}
> regress(colA, colB, 10)
> {code}
> The function above regresses colA and colB after removing the top 10 
> influential observations from the data set.
> The approach taken will be to remove each observation one at a time and 
> re-run the regression on the data set minus that observation. After each run 
> the difference in model fit will be recorded. After completing the regression 
> runs, the N observations with the largest difference in fit will be removed 
> from the data set. The final regression will be run without those 
> observations.
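
The leave-one-out procedure described above can be sketched roughly as follows. This is only an illustration in Python, not the Solr implementation. The names fit_line and regress_without_influential are hypothetical, and it assumes that "difference in model fit" means the absolute change in R-squared relative to the fit on the full data set; the issue does not say which fit measure is used.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, r_squared). Assumes the x values are not all equal.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple regression, R-squared is the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


def regress_without_influential(
    xs: Sequence[float], ys: Sequence[float], n_remove: int
) -> Tuple[float, float, float, List[int]]:
    """Drop the n_remove most influential observations, then regress.

    Influence is measured here (an assumption) as |R^2 without the
    observation - R^2 on the full data set|.
    Returns (slope, intercept, r_squared, removed_indices).
    """
    xs, ys = list(xs), list(ys)
    _, _, base_r2 = fit_line(xs, ys)

    # Re-run the regression once per observation, leaving that observation out.
    diffs = []
    for i in range(len(xs)):
        loo_x = xs[:i] + xs[i + 1:]
        loo_y = ys[:i] + ys[i + 1:]
        _, _, loo_r2 = fit_line(loo_x, loo_y)
        diffs.append((abs(loo_r2 - base_r2), i))

    # Keep the indices of the n_remove observations with the largest difference.
    diffs.sort(key=lambda d: d[0], reverse=True)
    removed = sorted(i for _, i in diffs[:n_remove])
    removed_set = set(removed)

    # Final regression on the remaining observations.
    kept_x = [x for i, x in enumerate(xs) if i not in removed_set]
    kept_y = [y for i, y in enumerate(ys) if i not in removed_set]
    slope, intercept, r_squared = fit_line(kept_x, kept_y)
    return slope, intercept, r_squared, removed
```

Under these assumptions, regress_without_influential(colA, colB, 10) would play the role of regress(colA, colB, 10). Note that this sketch scores every observation against the full data set in a single pass, as the description suggests, rather than re-scoring after each removal.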



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

