[
https://issues.apache.org/jira/browse/SOLR-10991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cassandra Targett updated SOLR-10991:
-------------------------------------
Component/s: streaming expressions
> Support removing top N influential observations in the regress Stream
> Evaluator
> -------------------------------------------------------------------------------
>
> Key: SOLR-10991
> URL: https://issues.apache.org/jira/browse/SOLR-10991
> Project: Solr
> Issue Type: New Feature
> Security Level: Public(Default Security Level. Issues are Public)
> Components: streaming expressions
> Reporter: Joel Bernstein
> Assignee: Joel Bernstein
> Fix For: 7.0
>
>
> Influential observations are *outliers* that have a large effect on the slope
> of a regression line. It is useful to be able to remove them automatically
> before running a simple regression.
> Syntax:
> {code}
> regress(colA, colB, 10)
> {code}
> The function above regresses colA and colB after removing the top 10
> influential observations from the data set.
> The approach taken will be to remove each observation one at a time and
> re-run the regression on the data set minus that observation. After each run
> the difference in model fit will be recorded. After completing the regression
> runs, the N observations with the highest difference in fit will be removed
> from the data set. The final regression will be run without those
> observations.
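
The leave-one-out procedure described above can be sketched roughly as follows.
This is a minimal Python illustration, not the Solr implementation. The issue
does not say how "difference in model fit" is measured, so the sketch assumes
the absolute change in R-squared relative to the full-data regression. The
names fit_line and regress_without_influential are hypothetical.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares for y = slope * x + intercept.

    Returns (slope, intercept, r_squared). Assumes xs has at least two
    distinct values, otherwise the slope is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For simple regression, R^2 equals the squared correlation coefficient.
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared


def regress_without_influential(
    xs: Sequence[float], ys: Sequence[float], n_remove: int
) -> Tuple[Tuple[float, float, float], List[int]]:
    """Drop the n_remove most influential points, then fit the final line.

    Assumption: influence = |R^2 without the point - R^2 with all points|.
    Returns ((slope, intercept, r_squared), sorted indices that were removed).
    """
    xs, ys = list(xs), list(ys)
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if n_remove < 0 or len(xs) - n_remove < 2:
        raise ValueError("at least two observations must remain")

    baseline_r2 = fit_line(xs, ys)[2]

    # One regression per observation, each on the data set minus that observation.
    influence = []
    for i in range(len(xs)):
        r2 = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])[2]
        influence.append((abs(r2 - baseline_r2), i))

    # Largest difference in fit first; the sort is stable, so ties keep index order.
    influence.sort(key=lambda pair: pair[0], reverse=True)
    removed = sorted(i for _, i in influence[:n_remove])
    removed_set = set(removed)

    kept_x = [x for i, x in enumerate(xs) if i not in removed_set]
    kept_y = [y for i, y in enumerate(ys) if i not in removed_set]
    return fit_line(kept_x, kept_y), removed
```

In this sketch, regress_without_influential(col_a, col_b, 10) plays the role of
regress(colA, colB, 10). It runs one extra regression per observation, so the
cost grows quadratically with the size of the data set.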