Hi, I am a new user of R. This is a conceptual question about screening out outliers from the dataset in regression.
I read that Cook's distance can be used to identify influential observations, with the rule of thumb that any observation whose Cook's distance exceeds 4/n (n = number of observations) is treated as an outlier and removed. I also came across Grubbs' test for identifying outliers in univariate distributions (assumed normal), but I was not able to find contexts in regression where Grubbs' test is used (maybe I didn't search enough).

Is it a good idea to compute Cook's distance to identify candidate outliers, perform Grubbs' test on each of them, and then delete them?

Right now I am using only Cook's distance in my problem, but I am uncertain, because repeating the procedure on each new dataset (after removing the influential observations) still keeps showing outliers in the plots. One reason may be that I have only 50 data tuples and around 10 input variables in the multiple regression equation. Am I going wrong in my fundamentals with this approach?

Thanks and regards,
Karthik Srinivasan
M.Mgt - Business Analytics
Indian Institute of Science, Bangalore
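
For reference, the screening step described above might look roughly like the following in R. This is only a minimal sketch on simulated data (50 rows, 10 predictors); the names `dat`, `fit`, `flagged`, and so on are made up for illustration and are not from the actual problem. It uses only `lm()` and `cooks.distance()` from base R.

```r
# Simulated stand-in data: 50 observations, 10 predictors (illustrative only)
set.seed(1)
n <- 50
p <- 10
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(X) <- paste0("x", seq_len(p))
dat <- data.frame(y = as.vector(X %*% rep(1, p)) + rnorm(n), X)

# Multiple regression on all predictors
fit <- lm(y ~ ., data = dat)

# Cook's distance for every observation, and the 4/n rule of thumb
cd <- cooks.distance(fit)
threshold <- 4 / n
flagged <- which(cd > threshold)

# Refit without the flagged rows (guarding against an empty selection)
dat2 <- if (length(flagged) > 0) dat[-flagged, ] else dat
fit2 <- lm(y ~ ., data = dat2)

# Second pass: distances and threshold are both recomputed on the reduced data
cd2 <- cooks.distance(fit2)
flagged2 <- which(cd2 > 4 / nrow(dat2))
```

Because both the distances and the 4/n cutoff are recomputed from the refitted model on each pass, a second pass can flag observations that the first pass did not.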