On Mon, Jul 13, 2009 at 9:56 AM, Simon Anders <[email protected]> wrote:

> Hi,
>
> I've got RNA-Seq data in a wiggle file and exon coordinates in a GFF file.
> Reading this in with rtracklayer's import function, I get the wiggle file as
> UCSCData object and the GFF file as RangedData object.
>
> I would now like to get the integral of the coverage for each exon.
>
> So I extract the RangedData inside the UCSCData by coercion and get
>

There is no need to perform this coercion: UCSCData extends RangedData.


>  > cvg.bl1 <- as( import( "igb/10-BL1-coverage.wig" ), "RangedData" )
>  > cvg.bl1
>  RangedData: 253256 ranges by 1 column on 1 sequence
>  colnames(1): score
>  names(1): chr10
>
> and have an IRanges object with the exon boundaries here:
>
>  > exons <- import( "igb/10.gff" )
>  > exons <- exons[ exons$type == "exon", ]
>  > ranges(exons)[["10"]]
>  IRanges instance:
>              start       end width
>  [1]         83769     83877   109
>  [2]        190766    190883   118
>  ...           ...       ...   ...
>  [27083] 135347172 135347681   510
>
> Now, I would like to go through each exon interval and sum up the coverage,
> i.e., do something like
>
>  aggregate( cvg.bl1, ranges(exons), sum )
>
> This, however, does not work, as aggregate seems to be unable to deal with
> a RangedData object as first parameter. This here, in contrast, does give a
> result
>
>  aggregate( Rle( 1, 1000000000 ), ranges(exons), sum )
>
> namely a vector of length 27083.
>
> Now, I have three questions:
>
> 1. How can I get the vector referring to chr10 in cvg.bl1? Is there a way
> to coerce to Rle or numeric? Supposedly, this only makes sense if I subset
> the RangedData object to only contain one chromosome.
>

score(cvg.bl1)
or for any arbitrary column:
cvg.bl1$score
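
Note that this gives one value per range in the wiggle file, not one per base. If a per-base vector is wanted, the scores have to be spread over the positions their ranges cover. A minimal base-R sketch of that step follows, with made-up starts, ends and scores standing in for the contents of cvg.bl1. The helper name expand_scores is hypothetical, and positions not covered by any range are assumed to have coverage 0:

```r
# Toy stand-in for the wiggle ranges: start, end and score per range.
starts <- c(3L, 6L, 10L)
ends   <- c(5L, 8L, 12L)
scores <- c(2, 1, 4)

# Hypothetical helper: spread each range's score over the positions it covers.
# Positions outside every range are assumed to have coverage 0.
expand_scores <- function(starts, ends, scores, len = max(ends)) {
  per_base <- numeric(len)
  for (i in seq_along(starts)) {
    per_base[starts[i]:ends[i]] <- scores[i]
  }
  per_base
}

per_base <- expand_scores(starts, ends, scores)
per_base
```

For a whole chromosome a dense numeric vector like this gets large; an Rle holds the same information run-length encoded and is the more economical container.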


> 2. How can I aggregate over the RangedData object? As it contains only one
> chromosome, it should be possible to coerce it to an Rle vector, but there
> is no coercion method, and, it should be possible anyway to avoid this.
>

Just use Views, for example:
v <- Views(score(cvg.bl1), ranges(exons)[[1]])
viewSums(v)
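
Views indexes its subject by position, so this yields per-exon sums only when the subject holds one value per base. What viewSums then computes is the sum of the subject within each view. The same calculation can be sketched in base R on toy data. The names per_base, exon_start and exon_end are made up for illustration, and per_base is assumed to hold one coverage value per position:

```r
# Toy per-base coverage vector (one value per position, 1-based).
per_base <- c(0, 0, 2, 2, 2, 1, 1, 1, 0, 4, 4, 4)

# Toy exon boundaries (inclusive), standing in for ranges(exons)[[1]].
exon_start <- c(3L, 7L, 10L)
exon_end   <- c(5L, 9L, 12L)

# Cumulative sums let each interval sum be read off as a difference,
# so the cost per exon does not grow with exon length.
cs <- c(0, cumsum(per_base))
exon_sums <- cs[exon_end + 1L] - cs[exon_start]
exon_sums
```

This returns one sum per exon, in the order of the exon table, which is the shape of result asked for in the original question.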


> 3. In the present case, 'ranges(exons)' contains a CompressedIRangesList
> with 1 element, and not a simple IRanges object. If it contains more than
> one element, it seems that aggregate calculates the aggregate vector for
> each element and just concatenates them. Does this really make sense?
>
> Cheers
>  Simon
>
>
> > sessionInfo()
> R version 2.10.0 Under development (unstable) (2009-06-26 r48838)
> x86_64-unknown-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
>  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] IRanges_1.3.33    rtracklayer_1.5.7 RCurl_0.98-1      bitops_1.0-4.1
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.5.4      Biostrings_2.13.24 BSgenome_1.13.10   tools_2.10.0
> [5] XML_2.5-3
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>

