On Mon, 11 Oct 2010, Michael Lawrence wrote:

I have a pass-by-reference GRanges that I have been playing around with for
interactive graphics. I'll stick the package into svn soon.


Thanks. We'll look forward to trying that out.

Also I'd point out that rtracklayer has bigWig support, which lets you
access smaller portions of your data fairly efficiently.


Will check this out, too.

Michael

On Mon, Oct 11, 2010 at 10:51 AM, Kasper Daniel Hansen <[email protected]> wrote:

When you say you're bumping up against the limits of RAM, how
much do you have? It might be nice to have a GRanges that has
pass-by-reference semantics, but in general I find my GRanges to be
smaller (at most a couple of GB).  My "problems" are with the
associated data.


Likewise. The GRanges aren't prohibitive, but every bit helps. Also, if we want to hand this software to others, it would be nice not to require them to mimic our hardware.

Chuck


Kasper

On Mon, Oct 11, 2010 at 11:59 AM, Charles C. Berry <[email protected]> wrote:
On Mon, 11 Oct 2010, Steve Lianoglou wrote:

Hi Chuck,

On Mon, Oct 11, 2010 at 11:24 AM, Charles C. Berry <[email protected]> wrote:

We are liking the idioms that go with GenomicRanges and RangedData objects
(follow, precede, findOverlaps, etc.), but we are bumping up against the
memory demands of loading very large objects.

Is there now, or will there soon be, a cached version of these that will
lessen our memory requirements?

If not, is there a cookbook on how to create and save cached versions of
these objects?

Or maybe a place to look in the Bioconductor codebase to get some ideas of
how to go about constructing cached versions of these classes?

I'm not sure what you mean by caching -- do you want them serialized
to disk and you read off parts when you need them, or?

That's basically the idea. I looked at how BSgenome handles FASTA, and it
allows you to read in one chromosome, make apparent copies that do not
physically copy the object unless it is modified, and then clean up
afterwards, with much of the work done under the hood.
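
As a rough illustration of the behaviour described above (load one chromosome on demand, hand out references rather than physical copies, release it afterwards), here is a minimal sketch. It is written in plain Python with the standard library only, not against the BSgenome or GenomicRanges API; the class `LazyChromosomeStore`, its methods, and the one-pickle-file-per-chromosome layout are all hypothetical names chosen for the example.

```python
import os
import pickle
import tempfile


class LazyChromosomeStore:
    """Hypothetical on-disk store: one pickle file per chromosome."""

    def __init__(self, paths):
        self._paths = dict(paths)   # chromosome name -> file path
        self._cache = {}            # chromosomes currently held in memory
        self.loads = 0              # number of times a file was actually read

    def get(self, chrom):
        # Read from disk only on first access; later calls reuse the object.
        if chrom not in self._cache:
            with open(self._paths[chrom], "rb") as fh:
                self._cache[chrom] = pickle.load(fh)
            self.loads += 1
        return self._cache[chrom]

    def unload(self, chrom):
        # Drop the cached object so its memory can be reclaimed.
        self._cache.pop(chrom, None)

    def loaded(self):
        return sorted(self._cache)


def demo_cache():
    with tempfile.TemporaryDirectory() as tmp:
        paths = {}
        # Tuples are immutable, so any "modification" has to build a new
        # object, which loosely mimics copy-on-modify semantics.
        for chrom, data in {"chr1": (1, 2, 3), "chr2": (4, 5)}.items():
            paths[chrom] = os.path.join(tmp, chrom + ".pkl")
            with open(paths[chrom], "wb") as fh:
                pickle.dump(data, fh)
        store = LazyChromosomeStore(paths)
        first = store.get("chr1")
        second = store.get("chr1")      # no second read, no copy
        shared = first is second
        store.unload("chr1")
        return shared, store.loads, store.loaded()
```

Calling `demo_cache()` reads `chr1` from disk once, hands back the same object twice, and leaves nothing cached after `unload`. `chr2` is never read at all.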



Also -- I typically split my data and processing to work on a
chromosome by chromosome basis -- even though the GenomicRanges
infrastructure allows you to keep ranges spanning multiple chromosomes
in one object. Although it's a bit more bookkeeping code on my part,
I find that doing so helps to keep my RAM requirements down a bit.
Perhaps that obvious/marginal suggestion might help for the time
being?
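
The chromosome-by-chromosome bookkeeping suggested above could look roughly like the following. This is a sketch only, in plain Python rather than R, and it assumes ranges are simple `(chrom, start, end)` tuples with closed intervals; the function names and the one-file-per-chromosome layout are made up for the example and are not part of GenomicRanges.

```python
import os
import pickle
import tempfile
from collections import defaultdict


def split_by_chromosome(ranges, out_dir):
    """Write one pickle file per chromosome and return {chrom: path}."""
    by_chrom = defaultdict(list)
    for chrom, start, end in ranges:
        by_chrom[chrom].append((start, end))
    paths = {}
    for chrom, intervals in by_chrom.items():
        path = os.path.join(out_dir, chrom + ".pkl")
        with open(path, "wb") as fh:
            pickle.dump(sorted(intervals), fh)
        paths[chrom] = path
    return paths


def total_width_per_chromosome(paths):
    """Process one chromosome at a time so only one chunk is in memory."""
    widths = {}
    for chrom, path in paths.items():
        with open(path, "rb") as fh:
            intervals = pickle.load(fh)
        # Closed intervals: width is end - start + 1.
        widths[chrom] = sum(end - start + 1 for start, end in intervals)
        del intervals  # release this chromosome before loading the next
    return widths


def demo_split():
    ranges = [("chr1", 1, 10), ("chr2", 5, 9), ("chr1", 20, 29)]
    with tempfile.TemporaryDirectory() as tmp:
        paths = split_by_chromosome(ranges, tmp)
        return total_width_per_chromosome(paths)
```

The trade-off is the one mentioned above: more bookkeeping (a path per chromosome, a loop around every analysis step) in exchange for peak memory bounded by the largest single chromosome.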

Thanks. We have bits and pieces of a pipeline that do that. But we are about
to refactor that pipeline, so the hope is to make something that is fairly
clean, will endure, and can handle the large objects that new sequencing
technologies are likely to throw at us.

Chuck

-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Charles C. Berry                            (858) 534-2098
                                            Dept of Family/Preventive Medicine
E mailto:[email protected]               UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


