When you say you're bumping up against the limits of RAM, how much do you have? It might be nice to have a GRanges with pass-by-reference semantics, but in general I find my GRanges objects to be fairly small (at most a couple of GB). My "problems" are with the associated data.
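For what it's worth, the chromosome-by-chromosome approach Steve describes in the quoted thread can be approximated with plain serialization to disk. Below is a minimal sketch in base R, assuming a plain data.frame stands in for the ranges plus their associated data. The names `cache_by_chrom`, `load_chrom`, and the cache directory are hypothetical, and this is not how GenomicRanges or BSgenome handle things internally.

```r
# Hypothetical example data: one row per range, with a "score" column
# standing in for the associated data that dominates memory use.
set.seed(1)
ranges <- data.frame(
  chrom = rep(c("chr1", "chr2", "chr3"), times = c(5, 3, 4)),
  start = seq(1, by = 100, length.out = 12),
  width = 50,
  score = rnorm(12),
  stringsAsFactors = FALSE
)

# Assumed cache location; any writable directory would do.
cache_dir <- file.path(tempdir(), "per_chrom_cache")
dir.create(cache_dir, showWarnings = FALSE)

# Write one serialized file per chromosome and return the chromosome names.
cache_by_chrom <- function(df, dir) {
  pieces <- split(df, df$chrom)
  for (chrom in names(pieces)) {
    saveRDS(pieces[[chrom]], file.path(dir, paste0(chrom, ".rds")))
  }
  invisible(names(pieces))
}

# Read back only the chromosome that is needed.
load_chrom <- function(chrom, dir) {
  readRDS(file.path(dir, paste0(chrom, ".rds")))
}

chroms <- cache_by_chrom(ranges, cache_dir)

# Process one chromosome at a time, so only one piece is loaded per step.
mean_scores <- vapply(chroms, function(chrom) {
  piece <- load_chrom(chrom, cache_dir)
  mean(piece$score)
}, numeric(1))
```

This only trades memory for disk I/O and extra bookkeeping; it does not give the copy-on-modify behaviour Chuck describes for BSgenome.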
Kasper

On Mon, Oct 11, 2010 at 11:59 AM, Charles C. Berry <[email protected]> wrote:
> On Mon, 11 Oct 2010, Steve Lianoglou wrote:
>
>> Hi Chuck,
>>
>> On Mon, Oct 11, 2010 at 11:24 AM, Charles C. Berry <[email protected]>
>> wrote:
>>>
>>> We are liking the idioms that go with GenomicRanges and RangedData
>>> Objects (follow, precede, findOverlaps, etc), but we are bumping up
>>> against memory demands of loading very large objects.
>>>
>>> Is there now or will there soon be a cached version of these that will
>>> lessen our memory requirements?
>>>
>>> If not, is there a cookbook as to how to create and save cached versions
>>> of these objects.
>>>
>>> Or maybe a place to look in the bioConductor codebase to get some ideas
>>> of how to go about constructing cached versions of these classes?
>>
>> I'm not sure what you mean by caching -- do you want them serialized
>> to disk and you read off parts when you need them, or?
>
> That's basically the idea. I looked at how BSGenome handles FASTA, and it
> allows you to read in one chromosome, make apparent copies that do not
> physically copy the object unless it is modified, and then clean up
> afterwards without much of the work under the hood.
>
>>
>> Also -- I typically split my data and processing to work on a
>> chromosome by chromosome basis -- even though the GenomicRanges
>> infrastructure allows you to keep ranges spanning multiple chromosomes
>> in one object. Although it's a bit more book keeping code on my part,
>> I find that doing so helps to keep my RAM requirements down a bit.
>> Perhaps that obvious/marginal suggestion might help for the time
>> being?
>
> Thanks. We have bits and pieces of a pipeline that do that. But we are about
> to refactor that pipeline, so the hope is to make something that is fairly
> clean, will endure, and handle the large objects that new sequencing
> technologies are likely to throw at us.
>
> Chuck
>>
>> -steve
>>
>> --
>> Steve Lianoglou
>> Graduate Student: Computational Systems Biology
>>  | Memorial Sloan-Kettering Cancer Center
>>  | Weill Medical College of Cornell University
>> Contact Info: http://cbio.mskcc.org/~lianos/contact
>>
>
> Charles C. Berry                            (858) 534-2098
>                                             Dept of Family/Preventive Medicine
> E mailto:[email protected]                   UC San Diego
> http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
