Hi Vince,

This issue was reported here a couple of weeks ago:

  https://github.com/Bioconductor/GenomicRanges/issues/11

Internally $<- uses something like:

  do.call(DataFrame, list(DF1, DF2))

to combine the metadata columns. However in some situations the
do.call(DataFrame, list(...)) form is **very** inefficient compared
to the more direct DataFrame(...) form:

  library(S4Vectors)
  DF1 <- DataFrame(a=Rle(11:1999, 1011:2999), b=5)
  DF2 <- DataFrame(c=Rle(12:2000, 1011:2999))

  system.time(DF12 <- do.call(DataFrame, list(DF1, DF2)))
  #    user  system elapsed
  #   4.476   0.000   4.476

  system.time(DF12b <- DataFrame(DF1, DF2))
  #    user  system elapsed
  #   0.002   0.000   0.001

  identical(DF12, DF12b)
  # [1] TRUE

@Michael: Any idea what's going on?

Thanks,
H.

On 10/03/2018 07:01 AM, Vincent Carey wrote:
> The following comes up in use of FDb.InfiniumMethylation.hg19::getPlatform
>
> debug: mcols(GR)$channel <- Rle(as.factor(mcols(GR)$channel450))
>
> Browse[3]> system.time(uu <- Rle(as.factor(mcols(GR)$channel450)))
>    user  system elapsed
>   0.020   0.003   0.022
>
> Browse[3]> system.time(mcols(GR)$channel <- Rle(as.factor(mcols(GR)$channel450)))
>    user  system elapsed
>  47.263   0.067  47.373
>
> Browse[3]> GR$channel[1]
> factor-Rle of length 1 with 1 run
>   Lengths:    1
>   Values : Both
> Levels(3): Both Grn Red
>
> Browse[3]> system.time(GR$channel <- Rle(as.factor(mcols(GR)$channel450)))
>    user  system elapsed
>   0.058   0.006   0.065
>
> Presumably the mcols()$<- copies/rewrites a lot of data needlessly?
-- 
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fredhutch.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel