dear Bioc-devels, >From a conversation on Twitter [1], I've been wondering if there has been any work to implement block bootstrap for GRanges.
I guess that's my main question, and then the rest is details and ideas in case it hasn't already been done. The block bootstrap helps to create perhaps more realistic null data, in way that deals with clumping of features. You take many random, potentially overlapping blocks from the original data (ranges) to fill up a new set of data (ranges). There's a 2011 paper from Peter Bickel about this topic [2], which had some python software associated with it. Aside from the statistical considerations, one would want to do each bootstrap quickly so you can make many bootstrap samples, and so having vectorized code would be an obvious desirable thing. I was thinking about how the Views infrastructure is kind of what one would want, but it operates on Rle's if I remember correctly, and the desired output here would be ranges. Below is kind of getting what I want, as a hack I'd then convert the Rle's back to ranges and concatenate them if I wanted 5 x 40bp = 200 bp of bootstrap data. > ir <- IRanges(c(51,61,71,111,121,131,200),width=5) # make some clumpy data > Views(coverage(ir), start=round(runif(5,1,150)), width=40) # e.g. 5 random 40 > bp blocks Views on a 204-length Rle subject views: start end width [1] 97 136 40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 ...] [2] 50 89 40 [0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 ...] [3] 18 57 40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 ...] [4] 8 47 40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 ...] [5] 97 136 40 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 1 1 ...] Another option would be to convert to time-series data, then use an existing package like 'boot' then convert back to ranges. [1] https://twitter.com/mikelove/status/1027296115346092032 [2] https://arxiv.org/abs/1101.0947 _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel