Re: [Bioc-devel] SummarizedExperiment with alternate back end

Ryan Fri, 18 Sep 2015 18:11:51 -0700

In the dev version, SummarizedExperiment has been split into RangedSummarizedExperiment (equivalent to the current SummarizedExperiment, with rowRanges) and SummarizedExperiment (kind of like eSet, no rowRanges). Given that eSet objects also support multiple assayData elements, I believe the new SummarizedExperiment is pretty close to being eSet with different method names. In fact, I wonder if eSet could/should be reimplemented as a subclass of the new SummarizedExperiment class.
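
To make the split concrete, here is a minimal sketch. It assumes a current release of the SummarizedExperiment package, where the constructor returns a RangedSummarizedExperiment only when rowRanges is supplied; the class names in the 2015 devel branch may have differed. The toy matrix, feature names, and ranges are made up for illustration.

```r
library(SummarizedExperiment)

# Toy assay: 4 features x 3 samples (values are arbitrary)
m <- matrix(1:12, nrow = 4,
            dimnames = list(paste0("f", 1:4), paste0("s", 1:3)))

# No rowRanges supplied: a plain SummarizedExperiment (the eSet-like case)
se <- SummarizedExperiment(assays = list(counts = m))

# Same assay plus genomic ranges: a RangedSummarizedExperiment
gr <- GRanges("chr1", IRanges(start = c(1, 11, 21, 31), width = 10))
names(gr) <- rownames(m)
rse <- SummarizedExperiment(assays = list(counts = m), rowRanges = gr)

class(se)
class(rse)

# The ranged class extends the plain one, so code written against the
# plain class also accepts ranged objects
is(rse, "SummarizedExperiment")
```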


On 9/18/15 5:36 PM, Kasper Daniel Hansen wrote:

Interesting, thanks for the pointer.


In light of the existing (and future) work on this, may I suggest an
eSet-like class, but built using the technologies in SummarizedExperiment,
i.e. a SummarizedExperiment without the rowRanges.  I would very much like
this for modern work using eSet-like containers.  Not everything has ranges.

Vince: I am not claiming that it is easy to work with; we have pains as
well.  But am I missing something or is the assay matrix only 2.3Gb?

Best,
Kasper

On Fri, Sep 18, 2015 at 6:28 PM, Peter Haverty <haverty.pe...@gene.com>
wrote:

Yes, bigmemoryExtras::BigMatrix and genoset::RleDataFrame() are good tricks
for reducing the size of your eSets and SummarizedExperiments.  Both object
types can go into assayData or assays. In fact, that's what they were
designed for.

At Genentech, we use these for our 2.5e6 x 1e3 rectangular data from
Illumina SNP arrays.  We typically have ~6 such rectangular objects in one
eSet.  With a mix of BigMatrix objects for point estimates and RleDataFrames
for segmented data, readRDS times are quite reasonable.
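
As a rough illustration of why run-length encoding suits segmented data, here is a sketch that uses base R's rle() as a stand-in for an Rle column. The segment means and lengths are hypothetical, and this is not the RleDataFrame implementation itself.

```r
# Hypothetical segmented track: 3 segments covering 1e6 probes
seg_means   <- c(0.0, 0.58, -1.0)
seg_lengths <- c(400000L, 250000L, 350000L)
dense <- rep(seg_means, times = seg_lengths)

# Base-R run-length encoding (stand-in for an Rle column)
enc <- rle(dense)

length(dense)        # one value per probe
length(enc$values)   # one value per segment

# Compare the in-memory footprint of the two representations
object.size(dense)
object.size(enc)

# Decoding recovers the dense vector exactly
stopifnot(identical(inverse.rle(enc), dense))
```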


Pete

____________________
Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com

On Fri, Sep 18, 2015 at 1:56 PM, Tim Triche, Jr. <tim.tri...@gmail.com>
wrote:

bigmemoryExtras (Peter Haverty's extensions to bigMemory/bigMatrix) can be
handy for this, as it works well as a backend, especially if you go about
splitting by chromosome as for CNV segmentation, DMR finding, etc.  It's
not as seamless as one might like, but it's the closest thing I've found.
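
The by-chromosome splitting can be sketched in base R as follows. An ordinary in-memory matrix stands in for the backend here, and the chromosome labels, sample names, and per-block summary are made up; with a file-backed matrix the point would be that only one block of rows is pulled into memory at a time.

```r
set.seed(1)

# Hypothetical probe annotation: one chromosome label per row
chrom <- rep(c("chr1", "chr2", "chr3"), times = c(5, 3, 4))

# Stand-in assay matrix: 12 probes x 4 samples
beta <- matrix(runif(length(chrom) * 4), nrow = length(chrom),
               dimnames = list(NULL, paste0("s", 1:4)))

# Row indices grouped by chromosome
rows_by_chrom <- split(seq_along(chrom), chrom)

# Process one chromosome block at a time (here: per-sample means)
block_means <- lapply(rows_by_chrom, function(idx) {
  colMeans(beta[idx, , drop = FALSE])
})

lengths(rows_by_chrom)
block_means$chr2
```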

SciDb tries to implement a similar API, but for a distributed version of
this where the data itself is in a columnar database and served on demand.
I tried getting that up and running as a SummarizedExperiment backend, but
did not succeed.  I have previously shoveled all of the TCGA 450k data into
one 7,000+ column bigMatrix which serializes to about 14GB on disk.

If you have any replicates in your 700+ samples, it's a good idea to keep
their SNP calls in metadata(yourSE), although if you change names it needs
to propagate into the dependent metadata.  This is why I started monkeying
around with linkedExperiments where those mappings are enforced; it's
becoming more of an issue with the TARGET pediatric AML study, where there
are numerous diagnosis-remission-relapse trios whose identity I wish to
verify periodically.  The SNPs on the 450k array are great for this
purpose, but minfi doesn't really have a slot for them per se, so they live
in metadata().
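
The name-propagation problem can be sketched with a plain list standing in for the SE. The helper rename_samples() and the toy SNP matrix are hypothetical and only show the idea of renaming the assay and the dependent metadata in one step; this is not how linkedExperiments does it.

```r
# Stand-in for an SE: an assay matrix plus a metadata list holding SNP calls
obj <- list(
  assay = matrix(0, nrow = 2, ncol = 3,
                 dimnames = list(NULL, c("a", "b", "c"))),
  metadata = list(
    snps = matrix(c(0, 1, 2, 2, 1, 0), nrow = 2,
                  dimnames = list(c("rs1", "rs2"), c("a", "b", "c")))
  )
)

# Hypothetical helper: rename samples in the assay and SNP matrix together,
# refusing to proceed if the two are already out of sync
rename_samples <- function(x, new_names) {
  stopifnot(length(new_names) == ncol(x$assay),
            identical(colnames(x$assay), colnames(x$metadata$snps)))
  colnames(x$assay) <- new_names
  colnames(x$metadata$snps) <- new_names
  x
}

obj2 <- rename_samples(obj, c("dx", "rem", "rel"))
colnames(obj2$metadata$snps)
```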


--t

On Fri, Sep 18, 2015 at 1:29 PM, Vincent Carey <st...@channing.harvard.edu>
wrote:

i am dealing with ~700 450k arrays

they are derived from one study, so it makes sense to think of
them holistically.

both the load time and the memory consumption are not satisfactory.

has anyone worked on an object type that implements the rangedSE API but
has the assay data out of memory?

unix.time(load("wbmse.rda"))
   user  system elapsed
 30.131   2.396  61.036

object.size(wbmse)
124031032 bytes

dim(wbmse)
[1] 485577    690

object.size(assays(wbmse))
2680430992 bytes
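
As a back-of-the-envelope check on the numbers above, the reported size of assays(wbmse) is almost exactly what a single dense matrix of doubles with those dimensions would occupy. The sketch assumes 8 bytes per value (R's double) and that the assays hold one such matrix, which the thread does not state explicitly.

```r
n_row <- 485577           # from dim(wbmse)
n_col <- 690
bytes_per_double <- 8     # assumption: values stored as R doubles

# Payload of one dense double matrix with these dimensions
dense_bytes <- n_row * n_col * bytes_per_double

# Size reported by object.size(assays(wbmse)) in the session above
reported <- 2680430992

dense_bytes
reported - dense_bytes    # remainder: headers, dimnames, list overhead
dense_bytes / 1e9         # decimal gigabytes
dense_bytes / 2^30        # binary gibibytes
```

The small remainder suggests there is little in the assays beyond that one matrix, so the load time is dominated by reading and decompressing roughly 2.5 GiB of doubles.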

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

