Thanks, Jianhong, for reporting this.

Changes implemented in IRanges 1.19.27:
- The RleList() constructor now defaults to compress=TRUE.
- The lapply() loop in seqselect,Vector-method was replaced with a direct subset.
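The seqselect change can be illustrated without IRanges. In this base-R sketch, a set of ranges is modeled as two integer vectors, start and width (an assumption standing in for an IRanges object). The hypothetical helper ranges_to_index() plays the role of as.integer(ir). The sketch compares a per-range lapply() loop in the style of the old code with a single direct subset through the expanded index.

```r
## Hypothetical helper: expand ranges given as (start, width) into one
## integer index, roughly what as.integer() does for an IRanges object
ranges_to_index <- function(start, width) {
  # Number of output elements that precede each range
  offset <- cumsum(c(0L, head(width, -1L)))
  # Repeat (start - offset) once per position, then add 0, 1, 2, ...
  rep(start - offset, width) + seq_len(sum(width)) - 1L
}

## Old style: subset once per range, then concatenate the pieces
select_loop <- function(x, start, width) {
  do.call(c, lapply(seq_along(start), function(i)
    x[seq.int(start[i], length.out = width[i])]))
}

## New style: a single direct subset
select_direct <- function(x, start, width) {
  x[ranges_to_index(start, width)]
}

x <- letters
start <- c(20L, 3L, 10L)
width <- c(2L, 4L, 1L)
identical(select_loop(x, start, width), select_direct(x, start, width))  # TRUE
```

Both versions return the same elements. The loop builds and concatenates one piece per range, while the direct version allocates one index and subsets once, avoiding the per-range overhead discussed in this thread.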

New timings:

## packages used below (GenomicRanges also attaches IRanges)
library(GenomicRanges)
library(microbenchmark)

## generic subset function
fun0 <- function(x) x[500:1]

## GRangesList with an RleList as a metadata column
grll <- GRanges(seqnames="chr1",
                IRanges(start=1:500, width=2),
                someInfo=rep(RleList("*"), 500))
grr <- split(grll, 1:500)
> microbenchmark(fun0(grr), times=10)
Unit: milliseconds
      expr      min       lq   median      uq      max neval
 fun0(grr) 28.88062 29.31157 30.58494 31.4393 32.26367    10

The median is now 0.031 seconds, compared with 1.635 seconds elapsed in the original single run:

> system.time(grr<- grr[500:1])
   user  system elapsed
  1.622   0.013   1.635


On 08/23/2013 11:17 AM, Michael Lawrence wrote:

    Hi Michael,

    Martin and I have been discussing this. In addition to the fix you
    suggest, what do you think of changing the default to
    compress=TRUE for the RleList constructor? RleList is the only
    AtomicList constructor that defaults to FALSE. Was there a reason
    for this when it was first implemented?

I'm guessing Patrick did that because we always used Rles for coverage,
and RleLists for per-chromosome coverage. Also, there might be some
overhead because runs in the unlistData of a compressed RleList can
cross list-element boundaries.
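Here is a rough base-R analogy for runs that cross list elements. It uses rle() as a stand-in for Rle and unlist() as a stand-in for a compressed list's unlistData. Both substitutions are assumptions made for illustration and do not reflect the IRanges internals.

```r
## Three list elements; the first four positions are all "*"
elts <- list(c("*", "*"), "*", c("*", "+"))

## Run-length encode the concatenated data, as a compressed list would
shared_runs <- rle(unlist(elts, use.names = FALSE))
shared_runs$lengths  # 4 1: the first run spans elements 1, 2 and 3

## End position of each element in the concatenated vector
ends <- cumsum(lengths(elts))  # 2 3 5

## Element boundaries that fall inside a run rather than at a run end
setdiff(ends, cumsum(shared_runs$lengths))  # 2 3
```

Because the boundaries at positions 2 and 3 fall inside a run, extracting or reordering individual elements means cutting that run apart. This is the kind of extra work the overhead remark refers to.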

About my fix, the only downside would be if the range widths were much
larger than the stored size of the vector, e.g., a highly compressed Rle
selected with chromosome-sized ranges. Then as.integer(ir) is big
compared to the data. Otherwise, it's way faster.
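The size trade-off can be made concrete in base R. In this sketch, rle() stands in for a highly compressed Rle, and seq.int() stands in for as.integer(ir) applied to a single chromosome-sized range. Both substitutions are assumptions, and the sizes are illustrative rather than measured on IRanges.

```r
## Stand-in for a highly compressed Rle: 1e6 positions, only 3 runs
chrom_len <- 1000000L
coverage_vec <- rep(c(0L, 5L, 0L), times = c(400000L, 200000L, 400000L))
runs <- rle(coverage_vec)
length(runs$lengths)  # 3 runs hold all the information

## Expanding one chromosome-sized range into positions, as
## as.integer(ir) would, yields one integer per position
idx <- seq.int(1L, length.out = chrom_len)
length(idx)  # 1000000, independent of the number of runs
```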


        SimpleLists are slow in this situation, basically because
        seqselect is slow, due to this loop:

                      x <- do.call(c, lapply(seq_len(length(ir)), function(i)
                          window(x, start = start(ir)[i], width = width(ir)[i])))

        Am I missing something, or could this become a simple x[as.integer(ir)]?

        In the meantime, using CompressedLists is the way to go. So for an
        RleList, you need to pass compress=TRUE to the constructor.
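The reason CompressedLists avoid the problem can be sketched in base R. The model below is an assumption that mirrors the idea rather than the IRanges classes. A compressed list is stored as one concatenated vector plus the end position of each element, so reordering elements (as grr[500:1] does) becomes a single subset of the concatenated vector instead of one subset per element. All function names here are hypothetical.

```r
## Hypothetical model (not the IRanges classes): a compressed list stores
## all elements in one concatenated vector plus each element's end position
as_compressed <- function(lst) {
  list(unlistData = unlist(lst, use.names = FALSE),
       ends = cumsum(lengths(lst)))
}

## Select elements i with a single vectorized subset of unlistData
subset_compressed <- function(cl, i) {
  starts <- c(1L, head(cl$ends, -1L) + 1L)
  widths <- cl$ends - starts + 1L
  # Expand the selected (start, width) pairs into one index vector
  offset <- cumsum(c(0L, head(widths[i], -1L)))
  idx <- rep(starts[i] - offset, widths[i]) + seq_len(sum(widths[i])) - 1L
  list(unlistData = cl$unlistData[idx], ends = cumsum(widths[i]))
}

## Convert back to an ordinary list (keeps empty elements)
as_plain <- function(cl) {
  n <- length(cl$ends)
  groups <- factor(rep(seq_len(n), diff(c(0L, cl$ends))), levels = seq_len(n))
  unname(split(cl$unlistData, groups))
}

## Reverse a small list, as grr[500:1] reverses a GRangesList
lst <- list(1:3, 4L, 5:6, integer(0))
cl <- as_compressed(lst)
reversed <- as_plain(subset_compressed(cl, rev(seq_along(lst))))
identical(reversed, rev(lst))  # TRUE
```

The per-element work is limited to integer arithmetic on starts and widths. The data itself is touched once, regardless of how many elements are selected.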

             When I use a large GRangesList, I find that subsetting becomes
             very slow when the metadata columns contain an AtomicList, e.g.:

              > grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500, width=2), someInfo=rep(RleList("*"), 500))
              > grr <- split(grll, 1:500)
              > grl <- as.list(grr)
              > system.time(grl<- grl[500:1])
                 user  system elapsed
                    0       0       0
              > system.time(grr<- grr[500:1])
                 user  system elapsed
                1.622   0.013   1.635
              > sessionInfo()
             R Under development (unstable) (2013-07-23 r63392)
             Platform: x86_64-apple-darwin12.4.0 (64-bit)


             attached base packages:
             [1] parallel  stats     graphics  grDevices utils     datasets  methods   base

             other attached packages:
             [1] GenomicRanges_1.13.36 XVector_0.1.0         IRanges_1.19.24

             loaded via a namespace (and not attached):
             [1] stats4_3.1.0 tools_3.1.0

             Is there any way to improve this?

             Yours sincerely,

             Jianhong Ou

             LRB 670A
             Program in Gene Function and Expression
             364 Plantation Street Worcester,
             MA 01605

