Dear list,

Sorry for this cross-post from StackOverflow, but I see that SO was maybe the wrong forum for this question. Too package specific and
Ok, what I am trying to do is to predict from an L1 penalized regression. This fails due to a data set dimension problem that I cannot figure out. The procedure I'm using is the following:

require(penalized)
require(dplyr)  # provides %>% and sample_frac()
# neg contains negative data
# pos contains positive data

Now, the procedure below aims to construct comparable (balanced in terms of positive and negative cases) training and validation data sets.

# 50% negative training set
negSamp <- neg %>% sample_frac(0.5) %>% as.data.frame()
# Negative validation set
negCompl <- neg[setdiff(row.names(neg),row.names(negSamp)),]
# 50% positive training set
posSamp <- pos %>% sample_frac(0.5) %>% as.data.frame()
# Positive validation set
posCompl <- pos[setdiff(row.names(pos),row.names(posSamp)),]
# Combine sets
validat <- rbind(negSamp,posSamp)
training <- rbind(negCompl,posCompl)

Ok, so here we now have two comparable sets.

[1] FALSE TRUE
> dim(training)
[1] 1061 381
> dim(validat)
[1] 1060 381
> identical(names(training),names(validat))
[1] TRUE

I fit the model to the training set without a problem (and I've tried using a range of lambda1 values here). But predicting for the validation data set fails, with a rather odd error message.

> fit <- penalized(VoiceTremor,training[-1],data=training,lambda1=40,standardize=TRUE)
# nonzero coefficients: 13
> fit2 <- predict(fit, penalized=validat[-1], data=validat)
Error in .local(object, ...) : row counts of "penalized", "unpenalized" and/or "data" do not match

Just to make sure that this is not due to some NAs in the data set:

> identical(validat,na.omit(validat))
[1] TRUE

Oddly enough, I can generate some new data that is comparable to the proper data set:

> data.frame(VoiceTremor="NVT",matrix(rnorm(380000),nrow=1000,ncol=380) ) -> neg
> data.frame(VoiceTremor="VT",matrix(rnorm(380000),nrow=1000,ncol=380) ) -> pos
> dim(pos)
[1] 1000 381
> dim(neg)
[1] 1000 381

and run the procedure above, and then the prediction step works! How come?
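For comparison, the balanced split described above can also be sketched in base R using row indices instead of row names. This is only an illustration under stated assumptions: the data are small synthetic stand-ins for neg and pos, split_half is a hypothetical helper name, and the seed is arbitrary. Whether sample_frac() keeps the original row names may depend on the dplyr version and the class of the input, so an index-based split sidesteps that question. It is not claimed to be the cause of the predict() error.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Synthetic stand-ins for the real data (much smaller than in the post)
neg <- data.frame(VoiceTremor = "NVT", matrix(rnorm(200 * 5), nrow = 200, ncol = 5))
pos <- data.frame(VoiceTremor = "VT",  matrix(rnorm(200 * 5), nrow = 200, ncol = 5))

# Hypothetical helper: split a data frame into a random half and its complement,
# working on row indices so that row names play no role in the split
split_half <- function(df) {
  idx <- sample(nrow(df), size = floor(nrow(df) / 2))
  list(samp = df[idx, , drop = FALSE], compl = df[-idx, , drop = FALSE])
}

negSplit <- split_half(neg)
posSplit <- split_half(pos)

# Combine sets, with the same naming as in the procedure above
validat  <- rbind(negSplit$samp,  posSplit$samp)
training <- rbind(negSplit$compl, posSplit$compl)

# Sanity checks: every row is used exactly once, and both sets have the
# same columns with the same classes
stopifnot(nrow(training) + nrow(validat) == nrow(neg) + nrow(pos))
stopifnot(length(intersect(row.names(negSplit$samp), row.names(negSplit$compl))) == 0)
stopifnot(identical(names(training), names(validat)))
stopifnot(identical(sapply(training, class), sapply(validat, class)))
```

Running the same checks on the real training and validat data frames would show whether the two sets differ in anything other than their rows, for example in the class of a column.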
What could be wrong with my second (not training) data set?

Fredrik

--
"Life is like a trumpet - if you don't put anything into it, you don't get anything out of it."

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.