[R] Outliers and robust methods

2006-06-27 Thread Kerpel, John
Hi folks! I need to build a binary dependent variable model, but my independent variables have some serious outliers (they are not data errors.) I was thinking of using the robustbase package because I noticed one of the functions accepts a binary depvar. Can I evaluate this model like any

[R] Outliers

2004-07-04 Thread Richard A. O'Keefe
Last week there was a thread on outlier detection. I came across an article which has a very interesting paragraph. The article is Missing Values, Outliers, Robust Statistics, Non-parametric Methods by Shaun Burke, RHM Technology Ltd, High Wycombe, Buckinghamshire, UK. It was the fourth

Re: [R] Outliers

2004-07-04 Thread Marc Schwartz
On Sun, 2004-07-04 at 19:41, Richard A. O'Keefe wrote: Last week there was a thread on outlier detection. I came across an article which has a very interesting paragraph. The article is Missing Values, Outliers, Robust Statistics, Non-parametric Methods by Shaun Burke, RHM

RE: [R] outliers using Random Forest

2004-04-19 Thread Edgar Acuna
Dear Andy, Thanks for your quick answer. I increased the number of trees and the outlyingness measure got more stable. But still I do not know if I am working with the raw measure or with the normalized measure mentioned in the Breiman's Wald lecture. The normalized measure nout is

RE: [R] outliers using Random Forest

2004-04-19 Thread Liaw, Andy
From: Edgar Acuna [mailto:[EMAIL PROTECTED] Dear Andy, Thanks for your quick answer. I increased the number of trees and the outlyingness measure got more stable. But still I do not know if I am working with the raw measure or with the normalized measure mentioned in the Breiman's Wald

[R] outliers using Random Forest

2004-04-18 Thread Edgar Acuna
Hello, Does anybody know if the outscale option of randomForest yields the standardized version of the outlier measure for each case, or are the results only the raw values? Also I have noticed that this measure presents very high variability. I mean if I repeat the experiment I am getting very

RE: [R] outliers using Random Forest

2004-04-18 Thread Liaw, Andy
The thing to do is probably: 1. Use fairly large number of trees (e.g., 1000). 2. Run a few times and average the results. The reason for the instability is sort of two fold: 1. The random forest algorithm itself is based on randomization. That's why it's probably a good idea to have 500-1000

[R] outliers

2004-04-08 Thread mike . campana
Dear all I would like to represent the outliers in the plot. These few outliers are much larger than the limit of 50 in the ylim-argument. plot(daten$month~daten$no,ylim=c(0,50)) I know that it is possible to introduce the information about the presence of outliers without changing the range

Re: [R] outliers

2004-04-08 Thread Jason Turner
[EMAIL PROTECTED] wrote: Dear all I would like to represent the outliers in the plot. These few outliers are much larger than the limit of 50 in the ylim-argument. plot(daten$month~daten$no,ylim=c(0,50)) I know that it is possible to introduce the information about the presence of outliers

[R] outliers/interval data extraction

2003-02-20 Thread Rado Bonk
Dear R-users, I have two outliers related questions. I. I have a vector consisting of 69 values. mean = 0.00086 SD = 0.02152 The shape of EDA graphics (boxplots, density plots) is heavily distorted due to outliers. How to define the interval for outliers exception? Is 2SD - mean + 2SD interval

Re: [R] outliers/interval data extraction

2003-02-20 Thread Ben Bolker
II. How to extract only those values from vector which fulfill the condition of interval (higher than A, and lower than B)? x[x > A & x < B]
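The interval condition in the question (higher than A and lower than B) can be sketched in R with a logical index. This is a minimal illustration only: the data vector and the bounds `A` and `B` below are made-up values, not taken from the thread.

```r
# Hypothetical data and bounds, chosen only for illustration.
x <- c(-0.5, -0.01, 0.0, 0.02, 0.3)
A <- -0.05
B <- 0.05

# Element-wise comparison gives a logical vector; `&` combines the two
# conditions, and indexing with it keeps only values strictly between A and B.
inside <- x[x > A & x < B]

# The complementary set, i.e. the values falling outside the interval.
outside <- x[!(x > A & x < B)]

print(inside)
print(outside)
```

Note that `&` (vectorised) is needed here rather than `&&`, which is meant for single logical values.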

Re: [R] outliers/interval data extraction

2003-02-20 Thread Christian Hennig
Hi, the boxplot is based on the quartiles which are much less outlier sensitive than mean and SD and should therefore not be heavily distorted by outliers. What you mean is presumably that you see the area of the main bulk of the data only as a very small box on the screen because of your
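The point about quartiles being less outlier sensitive than mean and SD can be checked with a small sketch. The two vectors below are invented for illustration; they are not the poster's 69 values.

```r
# Hypothetical data: nine ordinary values, then the same values plus one
# gross outlier.
clean <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)
dirty <- c(clean, 1000)

# Location and spread, computed both the classical way (mean, SD) and the
# quartile-based way (median, interquartile range).
summary_stats <- function(v) {
  c(mean = mean(v), sd = sd(v), median = median(v), iqr = IQR(v))
}

stats <- rbind(clean = summary_stats(clean), dirty = summary_stats(dirty))
print(stats)
```

In this toy case the single outlier moves the mean and SD by orders of magnitude, while the median and IQR shift only slightly.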

Re: [R] outliers/interval data extraction

2003-02-20 Thread Jason Turner
On Thu, Feb 20, 2003 at 06:37:48PM -0500, Rado Bonk wrote: Dear R-users, I have two outliers related questions. I. I have a vector consisting of 69 values. mean = 0.00086 SD = 0.02152 The shape of EDA graphics (boxplots, density plots) is heavily distorted due to outliers. How to

Re: [R] outliers/interval data extraction

2003-02-20 Thread Jason Turner
On Thu, Feb 20, 2003 at 06:54:21PM +0100, Christian Hennig wrote: ... However, a simple straightforward method for outlier identification is median +/- 5.2*mad as suggested by Hampel, Technometrics 27 (1985) 95-107. ... x <- data vector medx <- median(x) madx <- mad(x) outliers <-
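The median +/- 5.2*mad rule quoted above might be sketched as follows. This is not the poster's code, which is cut off in the snippet; the data vector is invented, and the names `k`, `is_outlier` and `cleaned` are assumptions introduced for illustration. One caveat: R's `mad()` multiplies the raw median absolute deviation by 1.4826 by default, and the snippet does not say whether the factor 5.2 is meant for the raw or the scaled value, so the cutoff here should be treated as adjustable.

```r
# Hypothetical data: 20 evenly spaced values near zero plus two gross outliers.
x <- c(seq(-0.02, 0.02, length.out = 20), 0.5, -0.4)

medx <- median(x)
madx <- mad(x)  # default constant = 1.4826; use mad(x, constant = 1) for raw MAD

# Assumed cutoff multiplier, taken from the rule quoted in the thread.
k <- 5.2

# Flag values lying further than k * MAD from the median.
is_outlier <- abs(x - medx) > k * madx
outliers <- x[is_outlier]
cleaned <- x[!is_outlier]

print(outliers)
print(length(cleaned))
```

Because both the centre and the scale are median-based, the two extreme values do not inflate the cutoff the way they would with mean +/- 2 SD.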