Thanks Jianhong for reporting this. Changes implemented in IRanges 1.19.27:

- RleList() constructor now has default 'compress=TRUE'.
- seqselect,Vector-method lapply() loop was replaced with direct subset.
New timings:

## generic subset function
fun0 <- function(x) x[500:1]

## GRangesList with RleList as metadata col
grll <- GRanges(seqnames="chr1", IRanges(start=1:500, width=2),
                someInfo=rep(RleList("*"), 500))
grr <- split(grll, 1:500)

> microbenchmark(fun0(grr), times=10)
Unit: milliseconds
      expr      min       lq   median      uq      max neval
 fun0(grr) 28.88062 29.31157 30.58494 31.4393 32.26367    10

Median is now 0.031 seconds compared to the previous 1.635 seconds.
> system.time(grr<- grr[500:1])
   user  system elapsed
  1.622   0.013   1.635
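
For anyone who wants to see why the direct subset helps, here is a minimal base-R sketch of the seqselect change. It is not the IRanges code: plain vectors stand in for the Vector and the ranges, and the names select_by_loop, select_direct, starts and widths are made up for illustration.

```r
# Base-R sketch only; the real method operates on Vector/IRanges objects.
x <- seq(10, 200, by = 10)   # stand-in for the vector being subset
starts <- c(3L, 10L, 1L)     # hypothetical range starts
widths <- c(2L, 4L, 3L)      # hypothetical range widths

# Loop approach: extract one window per range, then concatenate the pieces.
select_by_loop <- function(x, starts, widths) {
  do.call(c, lapply(seq_along(starts), function(i) {
    x[seq.int(starts[i], length.out = widths[i])]
  }))
}

# Direct approach: expand all ranges into one index vector, subset once.
select_direct <- function(x, starts, widths) {
  idx <- rep.int(starts, widths) + sequence(widths) - 1L
  x[idx]
}

loop_result <- select_by_loop(x, starts, widths)
direct_result <- select_direct(x, starts, widths)

# Both approaches select the same elements in the same order.
stopifnot(identical(loop_result, direct_result))

# The index vector holds one entry per selected position.
index_length <- sum(widths)
```

The index vector has sum(widths) entries, so the direct form only costs more when the ranges are much wider than the stored data, which is the downside raised further down the thread.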

Valerie

On 08/23/2013 11:17 AM, Michael Lawrence wrote:
On Fri, Aug 23, 2013 at 8:41 AM, Valerie Obenchain <voben...@fhcrc.org> wrote:

Hi Michael,

Martin and I have been discussing this. In addition to the fix you suggest, what do you think of changing the default to compress=TRUE for the RleList constructor? Rle is the only one of the AtomicLists with default FALSE. Was there a reason for this when it was first implemented?

I'm guessing Patrick did that because we always used Rles for coverage, and RleList for per-chromosome coverage. Also, there might be some overhead in that Rle runs in the unlistData can cross list elements.

About my fix, the only downside would be if the range widths were much larger than the size of the vector, e.g., a highly compressed Rle, selected with chromosome-size ranges. Then the as.integer(ir) is big compared to the data. Otherwise, it's way faster.

Val

On 08/22/2013 07:34 PM, Maintainer wrote:

Hi,

SimpleLists are slow in this situation, basically because the underlying seqselect is slow, due to this loop:

x <- do.call(c, lapply(seq_len(length(ir)), function(i)
       window(x, start = start(ir)[i], width = width(ir)[i])))

Am I missing something or could this become a simple x[as.integer(ir)]?

In the meantime, using CompressedLists is the way to go. So for an RleList, you need to pass compress=TRUE to the constructor.

On Wed, Aug 21, 2013 at 8:30 AM, Ou, Jianhong <jianhong...@umassmed.edu> wrote:

Hi,

When I use a big set of GRangesList objects, I find it becomes very slow when the metadata contain an AtomicList, e.g.

> grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500, width=2), someInfo=rep(RleList("*"), 500))
> grr <- split(grll, 1:500)
> grl <- as.list(grr)
> system.time(grl<- grl[500:1])
   user  system elapsed
      0       0       0
> system.time(grr<- grr[500:1])
   user  system elapsed
  1.622   0.013   1.635
> grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500, width=2))
> grr <- split(grll, 1:500)
> grl <- as.list(grr)
> system.time(grl<- grl[500:1])
   user  system elapsed
      0       0       0
> system.time(grr<- grr[500:1])
   user  system elapsed
  0.029   0.001   0.030
> sessionInfo()
R Under development (unstable) (2013-07-23 r63392)
Platform: x86_64-apple-darwin12.4.0 (64-bit)

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] GenomicRanges_1.13.36 XVector_0.1.0         IRanges_1.19.24       BiocGenerics_0.7.3

loaded via a namespace (and not attached):
[1] stats4_3.1.0 tools_3.1.0

Is there any method to improve this?

Yours sincerely,

Jianhong Ou

LRB 670A
Program in Gene Function and Expression
364 Plantation Street
Worcester, MA 01605

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel