Re: [Bioc-devel] [devteam-bioc] Very slow when operate GRangesList

Valerie Obenchain Tue, 27 Aug 2013 13:50:41 -0700

Thanks, Jianhong, for reporting this.

Changes implemented in IRanges 1.19.27:
- The RleList() constructor now has default 'compress=TRUE'.
- The lapply() loop in the seqselect,Vector method was replaced with a direct subset.
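The difference between the old loop and the direct subset can be sketched in base R. This is a hypothetical illustration, not the IRanges source. The names select_loop() and select_direct() are made up. Plain start/width vectors stand in for an IRanges object, and `[` stands in for window().

```r
## Hypothetical base-R sketch (not the IRanges source): select several
## ranges from a vector, first with a per-range loop, then with one subset.
x <- letters
starts <- c(3L, 10L, 20L)
widths <- c(2L, 3L, 1L)

## Loop version: extract one window per range, then concatenate the pieces
## (mirrors the old do.call(c, lapply(...)) pattern).
select_loop <- function(x, starts, widths)
    do.call(c, lapply(seq_along(starts), function(i)
        x[seq.int(starts[i], length.out = widths[i])]))

## Direct version: expand all ranges to integer positions once, then subset
## once (the analogue of x[as.integer(ir)]).
select_direct <- function(x, starts, widths) {
    pos <- rep(starts, widths) + sequence(widths) - 1L
    x[pos]
}

stopifnot(identical(select_loop(x, starts, widths),
                    select_direct(x, starts, widths)))
select_direct(x, starts, widths)
## [1] "c" "d" "j" "k" "l" "t"
```

The direct version replaces one R-level call per range with a few vectorized operations. This is where the speedup comes from when there are many short ranges.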


New timings:

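## GenomicRanges also attaches IRanges (RleList); microbenchmark for timing
library(GenomicRanges)
library(microbenchmark)

## generic subset function
fun0 <- function(x) x[500:1]

## GRangesList with RleList as metadata col
grll <- GRanges(seqnames="chr1",
                IRanges(start=1:500, width=2),
                someInfo=rep(RleList("*"), 500))
grr <- split(grll, 1:500)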
> microbenchmark(fun0(grr), times=10)
Unit: milliseconds
      expr      min       lq   median      uq      max neval
 fun0(grr) 28.88062 29.31157 30.58494 31.4393 32.26367    10

Median is now 0.031 seconds, compared with 1.635 seconds elapsed before the fix:

> system.time(grr<- grr[500:1])
   user  system elapsed
  1.622   0.013   1.635




Valerie


On 08/23/2013 11:17 AM, Michael Lawrence wrote:




On Fri, Aug 23, 2013 at 8:41 AM, Valerie Obenchain <voben...@fhcrc.org> wrote:

    Hi Michael,

    Martin and I have been discussing this. In addition to the fix you
    suggest, what do you think of changing the default to
    compress=TRUE for the RleList constructor? RleList is the only one
    of the AtomicList constructors with default FALSE. Was there a reason
    for this when it was first implemented?


I'm guessing Patrick did that because we always used Rles for coverage,
and RleList for per-chromosome coverage. Also, there might be some
overhead because Rle runs in the unlistData can cross list-element
boundaries.
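Here is a rough base-R model of that layout. A CompressedList keeps all elements concatenated in one 'unlistData' vector plus a partitioning. When those data are run-length encoded, a single run can span several list elements. The vectors and get_elt() are hypothetical. Decoding the whole vector is a deliberately naive way to extract one element.

```r
## Hypothetical base-R model (not IRanges internals): a list of vectors
## stored as one concatenated run-length encoding plus element end positions.
elts <- list(c(1, 1, 1), c(1, 1, 2), c(2, 2))

## Concatenate every element, then run-length encode the result once.
unlisted <- rle(unlist(elts))
ends <- cumsum(vapply(elts, length, integer(1)))  # end of each element
starts <- c(1L, head(ends, -1L) + 1L)              # start of each element

unlisted$lengths  # [1] 5 3   both runs cross an element boundary
ends              # [1] 3 6 8

## Getting one element back means decoding runs that straddle boundaries;
## this naive helper decodes everything, then slices out the element.
get_elt <- function(i) inverse.rle(unlisted)[starts[i]:ends[i]]
get_elt(2)        # [1] 1 1 2
```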

About my fix, the only downside would be if the range widths were much
larger than the storage size of the vector, e.g., a highly compressed Rle
selected with chromosome-sized ranges. Then as.integer(ir) is big
compared to the data. Otherwise, it's much faster.
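The caveat can be put in numbers. Assume a chromosome of 1,000,000 positions stored as a single run. Base R's rle class stands in for Rle here, and the values are hypothetical. Expanding one full-length range creates one integer per position, while the data themselves hold a single run.

```r
## Hypothetical numbers: one chromosome of 1e6 positions, all "*",
## stored as a single run (base R's rle class stands in for Rle).
chrom_len <- 1000000L
runs <- structure(list(lengths = chrom_len, values = "*"), class = "rle")

## Expanding one chromosome-sized range to positions, the analogue of
## as.integer(ir), creates one integer per position.
pos <- seq_len(chrom_len)

length(runs$lengths)  # [1] 1   (runs actually stored)
length(pos)           # one index per position, chrom_len in total

## The subset still works; it just needs an index as long as the chromosome.
selected <- inverse.rle(runs)[pos]
```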


    Val




    On 08/22/2013 07:34 PM, Maintainer wrote:

        Hi,

        SimpleLists are slow in this situation, basically because the
        underlying seqselect is slow, due to this loop:

              x <- do.call(c, lapply(seq_len(length(ir)), function(i)
                  window(x, start = start(ir)[i], width = width(ir)[i])))

        Am I missing something, or could this become a simple
        x[as.integer(ir)]?

        In the meantime, using CompressedLists is the way to go. So for an
        RleList, you need to pass compress=TRUE to the constructor.


        On Wed, Aug 21, 2013 at 8:30 AM, Ou, Jianhong
        <jianhong...@umassmed.edu> wrote:

             Hi,

             When I use a large GRangesList, I find it becomes very slow
             when the metadata columns contain an AtomicList, e.g.:

              > grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500, width=2), someInfo=rep(RleList("*"), 500))
              > grr <- split(grll, 1:500)
              > grl <- as.list(grr)
              > system.time(grl<- grl[500:1])
                 user  system elapsed
                    0       0       0
              > system.time(grr<- grr[500:1])
                 user  system elapsed
                1.622   0.013   1.635
              > grll <- GRanges(seqnames="chr1", ranges=IRanges(start=1:500, width=2))
              > grr <- split(grll, 1:500)
              > grl <- as.list(grr)
              > system.time(grl<- grl[500:1])
                 user  system elapsed
                    0       0       0
              > system.time(grr<- grr[500:1])
                 user  system elapsed
                0.029   0.001   0.030
              > sessionInfo()
             R Under development (unstable) (2013-07-23 r63392)
             Platform: x86_64-apple-darwin12.4.0 (64-bit)

             locale:
             [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

             attached base packages:
             [1] parallel  stats     graphics  grDevices utils     datasets
               methods   base

             other attached packages:
             [1] GenomicRanges_1.13.36 XVector_0.1.0         IRanges_1.19.24
                BiocGenerics_0.7.3

             loaded via a namespace (and not attached):
             [1] stats4_3.1.0 tools_3.1.0

             Is there any way to improve this?

             Yours sincerely,

             Jianhong Ou

             LRB 670A
             Program in Gene Function and Expression
             364 Plantation Street Worcester,
             MA 01605


             _________________________________________________
        Bioc-devel@r-project.org mailing list
        https://stat.ethz.ch/mailman/listinfo/bioc-devel




        


