[R] How to programme R to randomly replace some X values with Outliers

2010-10-03 Thread Hock Ann Lim
Dear experts,
I am a beginner of R.
I'm looking for experts to guide me how to do programming in R in order to 
randomly replace 5 observations in X explanatory variable with outliers drawn 
from U(15,20) in sample size n=100. The replacement subject to y  15. 

The ultimate goal of my study is to compare the std of y with and without the 
presence of outliers based on average of 1000 simulation.

Info :
X~U(0,10)
Y=6+2X+norm(0,1)

Thank you.

Hock Ann


  
[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to programme R to randomly replace some X values with Outliers

2010-10-03 Thread Joshua Wiley
Dear Hock Ann,

I am not sure of all your requirements, but this should at least get
you started.  I show it by hand and also wrapped up in a function.  In
the function I made two density plots that I thought might be
interesting to you, you can just delete those two lines of code if you
do not want them.  The code follows below.

Cheers,

Josh

## X is pseudo-random numbers from the uniform distribution
## ~U(0, 10)
X - runif(n = 100, min = 0, max = 10)
## We can check that X is about what we would expect
## The mean should be
(1/2) * (10 + 0)
mean(x = X)
## The variance should be
(1/12) * ((10 - 0)^2)
var(x = X)

## I am assuming norm(0, 1) to be representing the standard normal distribution
## Create a vectory of these numbers to be used in the formula for Y
## this is important since you will be create Y twice
## (with X and then X replaced with some outliers)
## You do not want R to regenerate the random normal values and change
things that way
Z - rnorm(n = 100, mean = 0, sd = 1)

## Create Y from your formula, where Z = norm(0, 1)
## Y = 6 + 2X + norm(0, 1)
Y - 6 + 2 * X + Z

## Now I am using sample() to randomly select some values
## between 1 and the length of X, these will be the positions
## of the elements of X to be replaced
toreplace - sample(x = seq_along(X), size = 5, replace = FALSE)

## Now replace the X values
X[toreplace] - runif(n = 5, min = 15, max = 20)

## Create Ynew based off updated X
Ynew - 6 + 2 * X + Z

## Calculate the standard deviations of Y and Ynew
## and store in a named vector called results
results - c(SD_Y = sd(Y), SD_Ynew = sd(Ynew))

## print the results vector to screen to look at it
results

## Now if you wanted to do this many times
## and potentially change a few values easily
## we can put it in a function
## n is the number in each sample
## a and b are the min and max of the uniform distribution for X
## a.outlier and b.outlier are the same but for the outliers
## nreplace is how many values of X you want to replace
## reps is how many times you want to run it
## I have written the values to default to what you said in your emamil
## but obviously it would be easy to change any one of them

mysampler - function(n = 100, a = 0, b = 10,
  a.outlier = 15, b.outlier = 20,
  nreplace = 5, reps = 1000) {
  if(any(c(n, nreplace, reps)  1)) {
stop(n, nreplace, and reps must all be at least 1)
  }
  results - matrix(0, nrow = reps, ncol = 2,
dimnames = list(NULL, c(SD_Y, SD_Ynew)))
  for(i in 1:reps) {
X - runif(n = n, min = a, max = b)
Z - rnorm(n = n, mean = 0, sd = 1)
Y - 6 + 2 * X + Z
toreplace - sample(x = seq_along(X), size = nreplace, replace = FALSE)
X[toreplace] - runif(n = nreplace, min = a.outlier, max = b.outlier)
Ynew - 6 + 2 * X + Z
results[i, ] - c(sd(Y), sd(Ynew))
  }
  dev.new()
  par(mfrow = c(2, 1))
  plot(density(results[,SD_Y]), xlim = range(results))
  plot(density(results[,SD_Ynew]), xlim = range(results))
  return(results)
}

## You might find the following documentation helpful
?runif # generate random values from uniform
?rnorm # from normal
?for # to do your simulation


On Sat, Oct 2, 2010 at 11:12 PM, Hock Ann Lim lim...@yahoo.com wrote:
 Dear experts,
 I am a beginner of R.
 I'm looking for experts to guide me how to do programming in R in order to
 randomly replace 5 observations in X explanatory variable with outliers drawn
 from U(15,20) in sample size n=100. The replacement subject to y  15.

 The ultimate goal of my study is to compare the std of y with and without the
 presence of outliers based on average of 1000 simulation.

 Info :
 X~U(0,10)
 Y=6+2X+norm(0,1)

 Thank you.

 Hock Ann



        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.





-- 
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to programme R to randomly replace some X values with Outliers

2010-10-03 Thread Michael Bedward
N - 100
Nrep - 5

X - runif(N, 0, 10)
Y - 6 + 2*X + rnorm(N, 0, 1)
X[ sample(which(Y  15), Nrep) ] - runif(Nrep, 15, 20)

Hope this helps,
Michael


On 3 October 2010 16:12, Hock Ann Lim lim...@yahoo.com wrote:
 Dear experts,
 I am a beginner of R.
 I'm looking for experts to guide me how to do programming in R in order to
 randomly replace 5 observations in X explanatory variable with outliers drawn
 from U(15,20) in sample size n=100. The replacement subject to y  15.

 The ultimate goal of my study is to compare the std of y with and without the
 presence of outliers based on average of 1000 simulation.

 Info :
 X~U(0,10)
 Y=6+2X+norm(0,1)

 Thank you.

 Hock Ann



        [[alternative HTML version deleted]]


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.



__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.