Hi,

I've got RNA-Seq data in a wiggle file and exon coordinates in a GFF file. Reading this in with rtracklayer's import function, I get the wiggle file as UCSCData object and the GFF file as RangedData object.

I would now like to get the integral of the coverage for each exon.

So I extract the RangedData inside the UCSCData by coercion and get

  > cvg.bl1 <- as( import( "igb/10-BL1-coverage.wig" ), "RangedData" )
  > cvg.bl1
  RangedData: 253256 ranges by 1 column on 1 sequence
  colnames(1): score
  names(1): chr10

and have an IRanges object with the exon boundaries here:

  > exons <- import( "igb/10.gff" )
  > exons <- exons[ exons$type == "exon", ]
  > ranges(exons)[["10"]]
  IRanges instance:
              start       end width
  [1]         83769     83877   109
  [2]        190766    190883   118
  ...           ...       ...   ...
  [27083] 135347172 135347681   510

Now, I would like to go through each exon interval and sum up the coverage, i.e., do something like

  aggregate( cvg.bl1, ranges(exons), sum )

This, however, does not work, as aggregate seems to be unable to deal with a RangedData object as first parameter. This here, in contrast, does give a result

  aggregate( Rle( 1, 1000000000 ), ranges(exons), sum )

namely a vector of length 27083.

Now, I have three questions:

1. How can I get the vector referring to chr10 in cvg.bl1? Is there a way to coerce to Rle or numeric? Supposedly, this only make sense if I subset the RangedData object to only contain one chromosome.

2. How can I aggregate over the RangedData object? As it contains only one chromosome, it should be possible to coerce it to an Rle vector, but there is no coercion method, and, it should be possible anyway to avoid this.

3. In the present case, 'ranges(exons)' contains a CompressedIRangesList with 1 element, and not a simple IRanges object. If it contains more than one element, it seems that aggregate calculates the aggregate vector for each element and just concatenated them. Does this really make sense?

Cheers
  Simon


> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-06-26 r48838)
x86_64-unknown-linux-gnu

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_GB.UTF-8
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] IRanges_1.3.33    rtracklayer_1.5.7 RCurl_0.98-1      bitops_1.0-4.1

loaded via a namespace (and not attached):
[1] Biobase_2.5.4      Biostrings_2.13.24 BSgenome_1.13.10   tools_2.10.0
[5] XML_2.5-3

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to