Dear all, I have a question regarding outliers and the number of data points.
For instance, I want to use regression to calculate the slope over 3 years, i.e. 36 data points, one for each month. I use the following method:

1. Calculate the median value.
2. Find the standard deviation.
3. Set the threshold = median value + std dev * constant (e.g. constant = 10).
4. Outliers are the data points that are greater than the threshold.
5. Replace each outlier with the mean of its neighboring data points.
6. Run the regression.

However, I also want to find the slope for each year using the same method. As I may not have all 12 data points for each calendar year (e.g. Feb 01 - Jan 04: 36 data points in total, but only 11 for the first year and 1 for the last year), I found that the above method didn't work very well for detecting outliers. I'm thinking about making the constant smaller when there are fewer data points. Any ideas?

Thanks,
SChiu
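For reference, the six steps above can be sketched in Python roughly as follows. This is only a minimal sketch under stated assumptions: the function names replace_outliers and fit_slope are made up for illustration, "neighbor" is taken to mean the nearest non-outlier point on each side, the sample standard deviation is used, and the regression is ordinary least squares against the month index. The constant is left as a parameter, since how to choose it is exactly the open question.

```python
from statistics import mean, median, stdev


def replace_outliers(values, constant):
    """Steps 1-5: flag points above median + constant * std dev and
    replace each one with the mean of its neighbors."""
    # Steps 1-3. stdev() is the sample standard deviation and needs at
    # least two points, so a year with a single point cannot be processed.
    threshold = median(values) + constant * stdev(values)
    # Step 4: only points ABOVE the threshold count as outliers.
    flags = [v > threshold for v in values]
    cleaned = list(values)
    for i, flagged in enumerate(flags):
        if not flagged:
            continue
        # Step 5. Assumption: use the nearest non-outlier point on each
        # side; at either end of the series only one side exists.
        left = next((values[j] for j in range(i - 1, -1, -1)
                     if not flags[j]), None)
        right = next((values[j] for j in range(i + 1, len(values))
                      if not flags[j]), None)
        neighbors = [v for v in (left, right) if v is not None]
        if neighbors:
            cleaned[i] = mean(neighbors)
    return cleaned, flags


def fit_slope(values, constant):
    """Step 6: least-squares slope of the cleaned series against the
    point index 0, 1, 2, ... (one unit per month)."""
    cleaned, _ = replace_outliers(values, constant)
    x = range(len(cleaned))
    x_bar, y_bar = mean(x), mean(cleaned)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, cleaned))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Made-up data for illustration: 12 monthly points on a straight line,
# with one spike at index 5.
monthly = [2 * i + 1 for i in range(12)]
monthly[5] = 1000
print(fit_slope(monthly, constant=2.0))
```

The same two functions can be applied to each calendar year's subset of points, passing a different constant per subset if that turns out to be the way to go.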
