On Mon, Jul 13, 2009 at 9:56 AM, Simon Anders <[email protected]> wrote:
> Hi,
>
> I've got RNA-Seq data in a wiggle file and exon coordinates in a GFF file.
> Reading this in with rtracklayer's import function, I get the wiggle file as
> a UCSCData object and the GFF file as a RangedData object.
>
> I would now like to get the integral of the coverage for each exon.
>
> So I extract the RangedData inside the UCSCData by coercion and get
>

There is no need to perform this coercion... UCSCData extends RangedData.

> > cvg.bl1 <- as( import( "igb/10-BL1-coverage.wig" ), "RangedData" )
> > cvg.bl1
> RangedData: 253256 ranges by 1 column on 1 sequence
> colnames(1): score
> names(1): chr10
>
> and have an IRanges object with the exon boundaries here:
>
> > exons <- import( "igb/10.gff" )
> > exons <- exons[ exons$type == "exon", ]
> > ranges(exons)[["10"]]
> IRanges instance:
>             start       end width
> [1]         83769     83877   109
> [2]        190766    190883   118
> ...           ...       ...   ...
> [27083] 135347172 135347681   510
>
> Now, I would like to go through each exon interval and sum up the coverage,
> i.e., do something like
>
>   aggregate( cvg.bl1, ranges(exons), sum )
>
> This, however, does not work, as aggregate seems to be unable to deal with
> a RangedData object as first parameter. This here, in contrast, does give a
> result
>
>   aggregate( Rle( 1, 1000000000 ), ranges(exons), sum )
>
> namely a vector of length 27083.
>
> Now, I have three questions:
>
> 1. How can I get the vector referring to chr10 in cvg.bl1? Is there a way
> to coerce to Rle or numeric? Supposedly, this only makes sense if I subset
> the RangedData object to only contain one chromosome.
>

score(cvg.bl1)
or for any arbitrary column:
cvg.bl1$score

> 2. How can I aggregate over the RangedData object? As it contains only one
> chromosome, it should be possible to coerce it to an Rle vector, but there
> is no coercion method, and, it should be possible anyway to avoid this.
>

Just use Views. Like:
v <- Views(score(cvg.bl1), ranges(exons)[[1]])
viewSums(v)

> 3. In the present case, 'ranges(exons)' contains a CompressedIRangesList
> with 1 element, and not a simple IRanges object. If it contains more than
> one element, it seems that aggregate calculates the aggregate vector for
> each element and just concatenates them. Does this really make sense?
>
> Cheers
>   Simon
>
>
> > sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-06-26 r48838)
> x86_64-unknown-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] IRanges_1.3.33 rtracklayer_1.5.7 RCurl_0.98-1 bitops_1.0-4.1
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.5.4 Biostrings_2.13.24 BSgenome_1.13.10 tools_2.10.0
> [5] XML_2.5-3
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
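
For anyone trying out the Views suggestion from the reply, here is a minimal sketch on toy data. It rests on assumptions that are not from the thread: the coverage is already available as a per-base Rle, and the values and exon coordinates below are made up for illustration. It only shows how Views() and viewSums() combine to give one sum per range; it is not the original poster's data or workflow.

```r
library(IRanges)

# Hypothetical per-base coverage for 10 positions (toy values, not real data)
cvg <- Rle(c(0, 0, 2, 2, 2, 5, 5, 1, 0, 0))

# Two hypothetical "exons" given as start/end coordinates
exons <- IRanges(start = c(3, 6), end = c(5, 8))

# One view per exon onto the coverage vector
v <- Views(cvg, exons)

# Sum of the coverage inside each view, i.e. one value per exon
exon_sums <- viewSums(v)
print(exon_sums)
```

With several chromosomes, the same two calls would presumably be applied once per chromosome, pairing each chromosome's coverage with the matching element of the ranges list.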
