On one of my web pages I describe an algorithm for generating data with a specified correlation between X and Y. The important part of the algorithm is:

  • Use the normal random number function available in almost all software to generate two random variables (X and Y).
  • Standardize these variables to mean = 0, sd = 1.
  • Calculate a = r/sqrt(1 - r^2), where r is the desired correlation.
  • Calculate Z = a*X + Y.
  • Adjust the means and variances of X and Z to what you want them to be by simple linear transformations (e.g., Xnew = Xold*NewSD + NewMean).
  • Now the correlation between X and Z will be r.
  • The mean of Z will be 0.00, and its standard deviation will be sqrt(a^2 + 1).
  • If you don't standardize the variables I would assume that the resulting r will come from a population where rho = r, but I haven't worked this out. If anyone knows for sure, I'd appreciate hearing.
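
The steps above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not a reference implementation: the function names (standardize, pearson_r, correlated_pair, rescale) are hypothetical, and only the standard library is used. The reasoning is that if X and Y each have mean 0 and sd 1 and are uncorrelated, then cov(X, Z) = a and var(Z) = a^2 + 1, so corr(X, Z) = a/sqrt(a^2 + 1) = r. In any one sample the two generated variables will usually show a small chance correlation with each other, so the correlation obtained here should be close to r rather than exactly r.

```python
import math
import random


def standardize(values):
    """Rescale values to sample mean 0 and sample sd 1 (n - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r(xs, ys):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlated_pair(n, r, seed=None):
    """Return (X, Z) built as in the list above: Z = a*X + Y."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    rng = random.Random(seed)
    # Two independent normal samples, each standardized to mean 0, sd 1.
    x = standardize([rng.gauss(0, 1) for _ in range(n)])
    y = standardize([rng.gauss(0, 1) for _ in range(n)])
    a = r / math.sqrt(1 - r ** 2)
    z = [a * xi + yi for xi, yi in zip(x, y)]
    return x, z


def rescale(values, new_sd, new_mean):
    """Linear transformation; assumes values already have mean 0 and sd 1."""
    return [v * new_sd + new_mean for v in values]


x, z = correlated_pair(n=10_000, r=0.5, seed=1)
print("sample correlation:", round(pearson_r(x, z), 3))

# Z has sd near sqrt(a^2 + 1), so it is standardized again before rescaling.
x_new = rescale(x, new_sd=15, new_mean=100)
z_new = rescale(standardize(z), new_sd=10, new_mean=50)
print("after rescaling:   ", round(pearson_r(x_new, z_new), 3))
```

Because the final adjustment is a linear transformation with a positive multiplier, it leaves the correlation unchanged, which is why the two printed values agree.
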
I have recently been asked for the source of that algorithm. It has been around for a long time, and I am certainly not the first to recommend it, but I do not know its source. Can anyone help?

Also, does anyone have an opinion about the last item in that list?

Thanks,
Dave Howell

**************************************************************************

David C. Howell
Professor Emeritus
University of Vermont

New address:
David C. Howell                                 Phone: (970) 871-4556
P.O. Box 770059         
Steamboat Springs, CO  80477            email: [EMAIL PROTECTED]


http://www.uvm.edu/~dhowell/StatPages/StatHomePage.html
