Hi there, I posted this message before, but my previous post may have been confusing, so here is a clearer version:
I'd like to do bootstrap sampling for clustered data. Then I will run some complicated models (say, mixed effects models) on the bootstrapped sample. Here id is the cluster. Note that different clusters have different numbers of observations, e.g., id 2 has 2 observations and id 3 has 3 observations.

    id = c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
    y  = c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
    x  = c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
    xx = data.frame(id, x, y)

    boot.cluster <- function(x, id){
      # sample whole clusters with replacement
      boot.id <- sample(unique(id), replace = TRUE)
      # pull all rows of each sampled cluster, in the order drawn
      out <- lapply(boot.id, function(i) x[id %in% i, ])
      return( do.call("rbind", out) )
    }

    boot.xx = boot.cluster(xx, xx$id)

Here is the generated boot.xx dataset:

    id x   y
     3 0 0.4
     3 0 1.0
     3 0 0.9
     1 0 0.5
     1 0 0.6
     5 1 2.2
     5 1 3.0
     2 1 0.4
     2 1 0.3
     1 0 0.5
     1 0 0.6

You can see that some clusters (ids) appear multiple times (e.g., id 1 appears in two places, 4 rows in total). Since the bootstrap samples with replacement, the same cluster can be drawn more than once. A mixed effects model cannot be fit to this data as is, because every sampled cluster should be treated as a distinct cluster in the new data. Instead, I want to reorganize the data as below (id is relabeled in the order the clusters appear in boot.xx above). This is the step I need help with:

    id x   y
     1 0 0.4
     1 0 1.0
     1 0 0.9
     2 0 0.5
     2 0 0.6
     3 1 2.2
     3 1 3.0
     4 1 0.4
     4 1 0.3
     5 0 0.5
     5 0 0.6

Can someone help me with it? Thanks!

Lei Liu
Professor of Biostatistics
Washington University in St. Louis
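One possible way to get the relabeled data is to assign the new id at sampling time, while each draw is still a separate list element, rather than trying to recover it from boot.xx afterwards (relabeling by runs of equal ids would merge two consecutive draws of the same cluster). The sketch below is only an illustration under some assumptions: the function name boot.cluster2 and the column orig.id are made up here, and it assumes the cluster column in the data frame is named id.

```r
# Example data from the post
id <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y  <- c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x  <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
xx <- data.frame(id, x, y)

# Cluster bootstrap that gives every draw a fresh cluster id.
# boot.cluster2 and orig.id are hypothetical names, not from the post;
# the data frame is assumed to have its cluster column named "id".
boot.cluster2 <- function(x, id) {
  # sample whole clusters with replacement
  boot.id <- sample(unique(id), replace = TRUE)
  out <- lapply(seq_along(boot.id), function(k) {
    rows <- x[id == boot.id[k], , drop = FALSE]
    rows$orig.id <- boot.id[k]  # remember which cluster was drawn
    rows$id <- k                # the k-th draw becomes cluster k
    rows
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL         # drop the duplicated original row names
  res
}

set.seed(1)  # only to make the example reproducible
boot.xx <- boot.cluster2(xx, xx$id)
```

With this, a cluster drawn twice shows up under two different values of id, so a model that groups on id treats the copies as separate clusters, while orig.id still records where each copy came from.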