[R] bootstrap sample for clustered data

Liu, Lei  |  Sun, 16 Sep 2018 21:00:46 -0700

Hi there,

I posted this message before, but my previous post may have been confusing, 
so here is a clearer version:


I'd like to draw a bootstrap sample from clustered data and then fit some 
complicated models (say, mixed effects models) to the bootstrap sample. Here 
id is the cluster. Note that different clusters have different numbers of 
observations, e.g., id 2 has 2 observations and id 3 has 3 observations.

id=c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y=c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x=c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1 )

xx=data.frame(id, x, y)

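boot.cluster <- function(x, id){
  # draw as many cluster ids as there are clusters, with replacement
  boot.id <- sample(unique(id), replace = TRUE)
  # collect all rows belonging to each drawn cluster
  out <- lapply(boot.id, function(i) x[id %in% i, ])
  # stack the drawn clusters into one data frame
  return( do.call("rbind", out) )
}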

boot.xx=boot.cluster(xx, xx$id)

Here is the generated boot.xx dataset (one random draw):

   id x y
   3 0 0.4
   3 0 1.0
   3 0 0.9
   1 0 0.5
   1 0 0.6
   5 1 2.2
   5 1 3.0
   2 1 0.4
   2 1 0.3
   1 0 0.5
   1 0 0.6
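Because clusters have 2 or 3 rows, a cluster bootstrap sample always contains 5 
clusters but not a fixed number of rows: anywhere from 10 to 15 (the draw above 
has 11, while xx has 12). A quick check, as a sketch that repeats the function 
above with an arbitrary seed (the seed value and the 1000 repetitions are my own 
choices, not part of the original code):

```r
# data from the post
id <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y  <- c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x  <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
xx <- data.frame(id, x, y)

boot.cluster <- function(x, id) {
  boot.id <- sample(unique(id), replace = TRUE)
  out <- lapply(boot.id, function(i) x[id %in% i, ])
  do.call("rbind", out)
}

set.seed(2018)   # arbitrary seed, for reproducibility only
n.rows <- replicate(1000, nrow(boot.cluster(xx, xx$id)))
table(n.rows)    # row counts per draw; always between 10 and 15
```

So any code run on the bootstrap sample should not assume it has the same 
number of rows as the original data.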

You can see that some clusters (ids) appear more than once (e.g., id 1 appears 
in two places, 4 rows in total). Since the bootstrap samples with replacement, 
the same cluster can be drawn multiple times. Thus, we cannot fit a mixed 
effects model to this data as is: the model would treat both copies of id 1 as 
a single cluster, whereas each draw should count as a different cluster in the 
new data. Instead, I would like to relabel the data as below (ids renumbered 
1, 2, 3, ... in the order the clusters appear in boot.xx above). This is the 
step I need help with:

   id x y
   1 0 0.4
   1 0 1.0
   1 0 0.9
   2 0 0.5
   2 0 0.6
   3 1 2.2
   3 1 3.0
   4 1 0.4
   4 1 0.3
   5 0 0.5
   5 0 0.6
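One possible way to relabel after the fact, as a minimal sketch assuming boot.xx 
is laid out as printed above (the rows of each drawn cluster are contiguous and 
in draw order): treat each run of identical ids as one drawn cluster and number 
the runs 1, 2, 3, ... with base R's rle(). Caveat: if the same cluster happens 
to be drawn twice in a row, its two copies form a single run and would wrongly 
share one label, so this is only safe when no cluster is drawn consecutively.

```r
# boot.xx typed in from the draw printed above
boot.xx <- data.frame(
  id = c(3, 3, 3, 1, 1, 5, 5, 2, 2, 1, 1),
  x  = c(0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0),
  y  = c(0.4, 1.0, 0.9, 0.5, 0.6, 2.2, 3.0, 0.4, 0.3, 0.5, 0.6)
)

length(unique(boot.xx$id))   # 4 distinct ids, although 5 clusters were drawn

# each run of identical ids is treated as one drawn cluster
runs <- rle(boot.xx$id)
boot.xx$id <- rep(seq_along(runs$lengths), runs$lengths)
boot.xx$id   # 1 1 1 2 2 3 3 4 4 5 5, matching the desired table
```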

Can someone help me with it? Thanks!
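A more robust option, sketched here as a modified boot.cluster (my own variant, 
not a tested recipe), is to assign the new labels at sampling time: the k-th 
drawn cluster simply gets id k, so repeated draws of the same cluster always 
become distinct clusters, even when drawn back to back. The extra column 
orig.id is a hypothetical addition that keeps the source cluster for reference 
and can be dropped.

```r
# data from the post
id <- c(1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5)
y  <- c(.5, .6, .4, .3, .4, 1, .9, 1, .5, 2, 2.2, 3)
x  <- c(0, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1)
xx <- data.frame(id, x, y)

boot.cluster <- function(x, id) {
  # draw cluster ids with replacement
  boot.id <- sample(unique(id), replace = TRUE)
  # the k-th draw gets the new label k, so repeated draws of the
  # same cluster end up as different clusters
  out <- lapply(seq_along(boot.id), function(k) {
    d <- x[id == boot.id[k], , drop = FALSE]
    d$orig.id <- boot.id[k]   # hypothetical extra column: source cluster
    d$id <- k                 # new, unique cluster label
    d
  })
  res <- do.call(rbind, out)
  rownames(res) <- NULL       # drop the row names inherited from xx
  res
}

set.seed(2018)   # arbitrary seed, for reproducibility only
boot.xx <- boot.cluster(xx, xx$id)
boot.xx
```

The relabeled boot.xx can then be passed to a mixed model, e.g. 
lmer(y ~ x + (1 | id), data = boot.xx) from the lme4 package (assuming it is 
installed), with the model refit on each bootstrap sample.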

Lei Liu
Professor of Biostatistics
Washington University in St. Louis



______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
