Hi,

I am a new user of R.
This is a conceptual question about screening out outliers from a dataset
in regression.

I have read that Cook's distance can be used for this: to remove
influential observations, any observation with a Cook's distance greater
than 4/n (n = number of observations) is treated as an outlier and dropped.
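
For concreteness, here is a minimal sketch of how I understand this rule to
be applied in R. The built-in mtcars data and the model formula are only
stand-ins for illustration, not my actual data or model:

```r
# Illustrative only: mtcars and this formula are stand-ins, not my real data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Cook's distance for every observation used in the fit
d <- cooks.distance(fit)

# Rule of thumb: flag observations whose distance exceeds 4/n
n <- nrow(mtcars)
flagged <- which(d > 4 / n)

# Show the flagged observations and their distances
print(d[flagged])
```

The 4/n cutoff is only the rule of thumb quoted above; I am not sure it is
the right threshold for every problem.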

I also came across Grubbs' test for identifying outliers in univariate
distributions (assumed normal), but I was not able to find regression
contexts where Grubbs' test is used (maybe I didn't search enough).

Is it a good idea to compute Cook's distance to identify candidate outliers,
perform Grubbs' test on each of them, and then delete them?

Right now I am using only Cook's distance in my problem, but I am uncertain
about it: when I repeat the procedure on each new dataset (after removing the
influential observations), the plots still keep showing outliers.
One reason may be that I have only 50 observations and around 10 input
variables in the multiple regression equation.
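
To show the kind of loop I mean, here is a rough sketch on simulated data
(not my real dataset). The seed, the pure-noise data, and the cap of five
rounds are arbitrary assumptions made only for this illustration:

```r
# Simulated stand-in: 50 rows, 10 predictors, Gaussian noise only.
set.seed(1)
n <- 50
p <- 10
X <- matrix(rnorm(n * p), nrow = n)
dat <- data.frame(y = rnorm(n), X)  # predictor columns are named X1..X10

# Repeat: fit, flag by the 4/n rule, drop flagged rows.
# flag_counts records how many rows were flagged in each round.
flag_counts <- integer(0)
for (i in 1:5) {
  fit <- lm(y ~ ., data = dat)
  d <- cooks.distance(fit)
  flagged <- which(d > 4 / nrow(dat))
  flag_counts <- c(flag_counts, length(flagged))
  if (length(flagged) == 0) break
  dat <- dat[-flagged, , drop = FALSE]
}

print(flag_counts)  # rows flagged per round
print(nrow(dat))    # rows remaining after the loop
```

Note that the cutoff 4/n is recomputed from the shrinking dataset in each
round, which is how I have been doing it.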

Am I going wrong in my fundamentals by using this approach?

Thanks and regards,

Karthik Srinivasan
M.Mgt - Business Analytics
Indian Institute of Science, Bangalore

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
