Hi Bioc-devel,

As some of you are aware, rtracklayer::import() has long supported importing BigWig files. Those files can be shared on servers and accessed remotely, thanks to all the effort many of you have put into building and maintaining rtracklayer.

From my side, derfinder::loadCoverage() relies on rtracklayer::import.bw(), and recount::expressed_regions() + recount::coverage_matrix() use derfinder::loadCoverage(). recountWorkflow showcases those recount functions on larger datasets. brainflowprobes by Amanda Price, Nina Rajpurohit and others also ends up relying on rtracklayer::import.bw() through these functions.

At https://github.com/lawremi/rtracklayer/issues/83 I initially reported some issues once our recount2/3 data host changed, but Brian Schilder had previously reported that one could no longer read remote files https://github.com/lawremi/rtracklayer/issues/73. https://github.com/lawremi/rtracklayer/issues/63 and/or https://github.com/lawremi/rtracklayer/issues/65 might have been related. Yesterday I updated https://github.com/lawremi/rtracklayer/issues/83#issuecomment-2121313270 with a comment containing a small reproducible example, which shows that the workaround of downloading the data first and then using rtracklayer::import() on the local file does work.

However, this workaround does involve a lot of, hmm, wasteful data transfer. In the recount vignette, at some point I access just chrY of a bigWig file that is about 1300 MB. In the recountWorkflow vignette I do something similar for a 7 GB bigWig file. Previously, accessing just chrY on these files was a small data transfer.

In recountWorkflow version 1.29.2 https://github.com/LieberInstitute/recountWorkflow, I've included pre-computed results (~2 MB) to avoid downloading tons of data, though the vignette code shows how to fully reproduce the results if you don't mind downloading those large files. I also implemented some workarounds in recount, though I haven't yet gone the full route of including pre-computed results. I have yet to try implementing a workaround for brainflowprobes.

My understanding is that the root causes of rtracklayer's issues are elsewhere, and that changes in rtracklayer's dependencies have likely created these problems.
These problems are not always in the control of the rtracklayer authors to resolve, and they also create an unexpected burden on them.

If one considers alternatives to rtracklayer, I see that there's a new package https://github.com/PoisonAlien/trackplot/tree/master that uses bwtool (a system dependency), an older alternative https://github.com/andrelmartins/bigWig that hasn't had updates in 4 years, and a CRAN package (https://cran.r-project.org/web/packages/wig/readme/README.html) that recommends using rtracklayer for larger files. I guess that I could also try using megadepth https://research.libd.org/megadepth/, though derfinder::loadCoverage uses rtracklayer::import(as = "RleList") for efficiency https://github.com/lcolladotor/derfinder/blob/f9cd986e0c1b9ea6551d0d8d2077d4501216a661/R/loadCoverage.R#L401 and lots of functions in that package were built for that structure (RleList objects). I likely missed other alternatives.

My current line of thought is to keep implementing workarounds using local data (sometimes with pre-computed results) for recount, recountWorkflow, and brainflowprobes (derfinder only has tests with local bigWig files) without really altering the internals of those packages. That is, I would assume that remote BigWig file access via rtracklayer is suspended indefinitely. If it is supported again at some point, those packages will work with remote BigWig files again as if nothing ever happened.

But I wanted to check whether this is what others who use BigWig files are thinking of doing.

Thanks!

Best,
Leo

Leonardo Collado Torres, Ph.D.
Investigator, LIEBER INSTITUTE for BRAIN DEVELOPMENT
Assistant Professor, Department of Biostatistics
Johns Hopkins Bloomberg School of Public Health
855 N. Wolfe St., Room 382
Baltimore, MD 21205
lcolladotor.github.io
lcollado...@gmail.com

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel