On Fri, Apr 2, 2010 at 11:21 AM, Michael Lawrence <[email protected]> wrote:
>
> On Fri, Apr 2, 2010 at 7:55 AM, Vincent Carey <[email protected]> wrote:
>
>> To get a bit more concrete regarding these notions, the leeBamViews
>> package is in the experimental data archive, a VERY rudimentary illustration
>> of a workflow rooted in BAM archive files through region specification and
>> read counting. For the very latest checkin, after running
>>
>> example(bs1)
>>
>> we have an ad hoc tabulation of read counts:
>>
>> bs1> tabulateReads(bs1, "+")
>>          intv1  intv2
>> start   861250 863000
>> end     862750 864000
>> isowt.5   3673   2692
>> isowt.6   3770   2650
>> rlp.5     1532   1045
>> rlp.6     1567   1139
>> ssr.1     4304   3052
>> ssr.2     4627   3381
>> xrn.1     2841   1693
>> xrn.2     3477   2197
>>
>> or, by setting as.GRanges, a GRanges-based representation
>>
>> > tabulateReads(bs1, "+", as.GRanges=TRUE)
>> GRanges with 2 ranges and 9 elementMetadata values
>>     seqnames           ranges strand |        name   isowt.5   isowt.6
>>        <Rle>        <IRanges>  <Rle> | <character> <integer> <integer>
>> [1]  Scchr13 [861250, 862750]      + |       intv1      3673      3770
>> [2]  Scchr13 [863000, 864000]      + |       intv2      2692      2650
>>         rlp.5     rlp.6     ssr.1     ssr.2     xrn.1     xrn.2
>>     <integer> <integer> <integer> <integer> <integer> <integer>
>> [1]      1532      1567      4304      4627      2841      3477
>> [2]      1045      1139      3052      3381      1693      2197
>>
>> seqlengths
>>  Scchr13
>>       NA
>> > tabulateReads(bs1, "+", as.GRanges=TRUE) -> OO
>> > metadata(OO)
>> list()
>>
>> It seems that we would want more structure in a metadata component to get
>> closer to the values of ExpressionSet discipline. We would also want some
>> accommodation of this kind of representation in the downstream packages like
>> edgeR, DEseq.
>>
>
> The actual 'metadata' slot was meant to be general, in order to accommodate
> all needs. If a particular type of data requires a certain structure, then
> additional formal classes may be necessary.
> For example, gene expression RNA-seq may want a featureData equivalent
> annotating each transcript, whereas with ChIP-seq data, that sort of
> structure would make less sense, short of some additional assumptions.
>

I agree completely. Our task is to think/experiment about how to suitably
specialize these structures for most effective downstream use. Reuse by
multiple downstream toolchains would be great.

> Michael
>
>
>> > sessionInfo()
>> R version 2.11.0 Under development (unstable) (2010-03-24 r51388)
>> x86_64-apple-darwin10.2.0
>>
>> locale:
>> [1] C
>>
>> attached base packages:
>> [1] stats     graphics  grDevices datasets  tools     utils     methods
>> [8] base
>>
>> other attached packages:
>>  [1] leeBamViews_0.99.3  BSgenome_1.15.18    Rsamtools_0.2.1
>>  [4] Biostrings_2.15.25  GenomicRanges_0.1.3 IRanges_1.5.74
>>  [7] Biobase_2.7.5       weaver_1.13.0       codetools_0.2-2
>> [10] digest_0.4.1
>>
>>
>> On Thu, Apr 1, 2010 at 10:15 AM, Martin Morgan <[email protected]> wrote:
>>
>>> On 03/31/2010 04:06 AM, Michael Lawrence wrote:
>>> > On Wed, Mar 31, 2010 at 3:55 AM, David Rossell <
>>> > [email protected]> wrote:
>>> >
>>> >> Following a recent thread, I also have found convenient to store nextgen
>>> >> data as RangedData instead of ShortRead objects. They require far less
>>> >> memory and make feasible working with several samples at the same time (in
>>> >> my 8Gb RAM desktop I can load 2 ShortRead objects at the most, with
>>> >> RangedData I haven't struck the upper limit yet).
>>> >>
>>> >> I am thinking about taking this idea a step forward: RangedDataList allows
>>> >> storing info from several samples (e.g. IP and control) in a single object.
>>> >> The only problem is RangedDataList does not store information about the
>>> >> samples, e.g. the phenoData we're used to in ExpressionSet objects.
>>> >> My idea is to define something like a "SequenceSet" class, which would
>>> >> contain a RangedDataList with the ranges, a phenoData with sample
>>> >> information, and possibly also information about the experiment (e.g. with
>>> >> the MIAME analog for sequencing, MIASEQE).
>>> >>
>>> >> The thing is I don't want to re-invent the wheel. I haven't seen that this
>>> >> is implemented yet, but is someone working on it? Any criticism/ideas?
>>> >>
>>> >>
>>> > RangedDataList already supports this. See the 'elementMetadata' and
>>> > 'metadata' slots in the Sequence class.
>>>
>>> Hi David et al.,
>>>
>>> I've also found the elementMetadata slot excellent for this purpose.
>>> The ShortRead data objects retain sequence and quality information, this
>>> information is often not needed after a certain point in the analysis.
>>>
>>> Wanted to point to the GenomicRanges package in Bioc-devel, which has a
>>> GRanges class that is more fastidious about strand information (maybe a
>>> plus?) and conforms more to an 'I am a rectangular data structure' world
>>> view. Also the GappedAlignments class for efficiently representing large
>>> numbers of reads.
>>>
>>> Martin
>>>
>>> >
>>> > Michael
>>> >
>>> >> Best,
>>> >>
>>> >> David
>>> >>
>>> >> --
>>> >> David Rossell, PhD
>>> >> Manager, Bioinformatics and Biostatistics unit
>>> >> IRB Barcelona
>>> >> Tel (+34) 93 402 0217
>>> >> Fax (+34) 93 402 0257
>>> >> http://www.irbbarcelona.org/bioinformatics
>>> >>
>>> >> _______________________________________________
>>> >> Bioc-sig-sequencing mailing list
>>> >> [email protected]
>>> >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>>>
>>> --
>>> Martin Morgan
>>> Computational Biology / Fred Hutchinson Cancer Research Center
>>> 1100 Fairview Ave. N.
>>> PO Box 19024 Seattle, WA 98109
>>>
>>> Location: Arnold Building M1 B861
>>> Phone: (206) 667-2793
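
As a footnote to David's "SequenceSet" idea quoted above, here is a rough,
hypothetical sketch of what such a container might look like. It is not an
existing Bioconductor class. To keep it runnable without any Bioconductor
packages, plain data.frames stand in for the RangedData/GRanges elements, and
the slot names (ranges, phenoData, experimentData) and the "group" column are
assumptions made for illustration only. The counts are the "+" strand values
from the tabulateReads() output quoted above, for two of the samples.

```r
# Hypothetical sketch: base R + 'methods' only. Plain data.frames stand in for
# RangedData/GRanges elements so the example runs without Bioconductor.
library(methods)

setClass("SequenceSet",
  representation(
    ranges         = "list",        # one table of ranges per sample
    phenoData      = "data.frame",  # one row of annotation per sample
    experimentData = "list"         # free-form experiment description
  ),
  validity = function(object) {
    # Samples in the ranges list must line up with the phenoData rows.
    if (!identical(names(object@ranges), rownames(object@phenoData)))
      return("names(ranges) must match rownames(phenoData)")
    TRUE
  })

# The two intervals from the tabulateReads() example.
intervals <- data.frame(seqname = "Scchr13",
                        start   = c(861250L, 863000L),
                        end     = c(862750L, 864000L),
                        strand  = "+")

# Per-sample tables: intervals plus that sample's read counts.
ranges <- list(
  isowt.5 = cbind(intervals, count = c(3673L, 2692L)),
  rlp.5   = cbind(intervals, count = c(1532L, 1045L))
)

# Group label derived from the sample-name prefix; purely illustrative.
pheno <- data.frame(group = sub("\\..*$", "", names(ranges)),
                    row.names = names(ranges))

ss <- new("SequenceSet", ranges = ranges, phenoData = pheno,
          experimentData = list(title = "toy example"))

# Sample annotation now travels with the ranges.
ss@phenoData["rlp.5", "group"]
sapply(ss@ranges, function(x) sum(x$count))
```

The validity check is the point of the sketch: it is the kind of discipline
ExpressionSet enforces between assay columns and phenoData rows, and which a
bare list in the metadata slot does not provide on its own.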
