Dear List,

I'm running simulations using the glmnet package and need an 'automated' method for model selection at each iteration. The cv.glmnet function in the same package is handy for that purpose. However, in my simulations I have p >> N, and in some cases the model selected by cv.glmnet shrinks essentially all coefficients to zero. In that case the prediction for every instance equals the average of the response variable. A reproducible example is shown below.
Is there a reasonable way to prevent this from happening in a simulation setting with glmnet? That is, I'd like the selected model to give me some useful predictions. I've tried alternative loss measures (the type.measure argument), but none is satisfactory in all cases.

This question is not necessarily R-related (sorry for that): when comparing glmnet with other models in terms of predictive accuracy, is it fair to make the comparison including those cases in which the 'best' cv.glmnet can do in an automated setting is pred = avg(response)?

library(glmnet)
set.seed(1010)
n <- 100; p <- 3000
nzc <- trunc(p/10)                 # number of nonzero coefficients
x <- matrix(rnorm(n*p), n, p)
beta <- rnorm(nzc)
fx <- x[, seq(nzc)] %*% beta       # linear predictor
eps <- rnorm(n)*5
y <- drop(fx + eps)                # Gaussian response (not used below)
px <- exp(fx)
px <- px/(1 + px)
ly <- rbinom(n = length(px), prob = px, size = 1)   # binary response
fit.net <- cv.glmnet(x, ly, family = "binomial",
                     alpha = 1,                     # lasso penalty
                     type.measure = "deviance",
                     standardize = FALSE, intercept = FALSE,
                     nfolds = 10, keep = FALSE)
plot(fit.net)
log(fit.net$lambda.1se)
pred <- predict(fit.net, x, type = "response", s = "lambda.1se")
all(coef(fit.net) == 0)   # TRUE: every coefficient shrunk to zero
all(pred == 0.5)          # TRUE: the null model predicts 0.5 for everyone

Thanks in advance for your thoughts.

Regards,
Lars.
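P.S. One guard I've been considering (a sketch only, not validated in my full simulation; the helper name choose_lambda is mine): when the lambda.1se model is empty, fall back to lambda.min, which by construction has cross-validated loss at least as good and is usually less aggressively penalized. The cv.glmnet object exposes the per-lambda nonzero counts in its nzero component, so the check is cheap.

```r
# Pick a lambda from a cv.glmnet fit, preferring lambda.1se but falling
# back to lambda.min when the 1se model has no nonzero coefficients.
# `cvfit` is assumed to be a cv.glmnet result (components lambda, nzero,
# lambda.1se, lambda.min).
choose_lambda <- function(cvfit) {
  i <- which.min(abs(cvfit$lambda - cvfit$lambda.1se))  # index of lambda.1se
  if (cvfit$nzero[i] > 0) cvfit$lambda.1se else cvfit$lambda.min
}

# usage with the fit above:
# s    <- choose_lambda(fit.net)
# pred <- predict(fit.net, x, type = "response", s = s)
```

Of course, if even lambda.min selects the null model, this falls back to a useless fit as well, which is partly why I'm asking whether such cases should be counted in an accuracy comparison at all.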