I'm trying to program a simple Monte Carlo experiment that after generating
sample data within each MC rep then runs a GLM within it, accumulating the
coefficients from each rep in a matrix. As the GLM command needs a
dataframe I convert all my matrix held simulated data into this form and
then try and run a for loop (see end of included code below). It does not
like the $i notation. Can anyone please come up with a workaround? I had a
quick look in the glm code to see if I could figure out how to create a
modelmatrix myself and call the glm directly on that - it defeated me (as
I'm not the greatest of programmers). If anyone could offer advice on that
too I'd be grateful - would save the glm command having to convert the
dataframe on each call - which on a large number of MC reps might be worth
while.
Any help much appreciated (and apologies for all the code comments - my
memories terrible so need to include).
Best,
Steve
using DataFrames, GLM, Distributions
reps=100 # number of MC sims
n=50 # sample size in each sim
X0 = rep(1,n) # the constant
nLevelsX1 = 5 # distinct values in continuous X1 - make it a factor of n
X1=gl(nLevelsX1,int(n/nLevelsX1)) # gl() is in DataFrames
X2 = gl(2,1,n) - 1 # the treatmnt dummy
X = [X0 X1 X2] # design matrix
T=1.0 # the treatment effect
B = [10, 1, T] # Beta vector
mu = float(X*B) # varying mu parameter in lognormal distribution
sigma = rep(1.0,n) # constant sigma parameter in lognormal distribution
Obs = Array(Float64, reps,n) # holds data for analysis - each row new
dataset
# loop below creates the Y (dependent) data - one row for each MC sim
for i = 1:n
Obs[:,i]= rand(LogNormal(mu[i],sigma[i]),reps)
end
# create dataframe where first two cols hold the unvarying X
# and the remaining cols hold each Y sim
df= convert(DataFrame, [X[:,2:3] Obs'])
rename!(df.colindex, [(symbol("x$i")=>symbol("Y$(i-2)")) for i in 3:(n+2)])
names(df) # x1 x2 Y1 Y2 ...Yreps
GLMresults = Array(Float64, reps,3) # will hold all coeff
#next loop fails - please help - won't accept $i
for i = 1:reps
GLMresults[i,:] = coef(glm(Y$i ~ x1 + x2,df,Normal(),LogLink()))
end