Karl,
I just added a viewsMeans method to IRanges 1.3.90 within BioC 2.5 to make this task easier. (This is in svn now and will be available via biocLite tomorrow.) So assuming you are using R-devel (2.10)

## load the packages
suppressMessages(library(ShortRead))

## create lane1 AlignedRead object here

## generate the coverage
## this will produce a SimpleRleList
curCov <- coverage(lane1)

## get the exon boundaries
## this will create a CompressedIRangesList
curExons <-
 split(IRanges(start=capturedExonsOnChrom$exonStart,
               end=capturedExonsOnChrom$exonEnd),
       capturedExons$chrom)

## now get the chroms you are looking for
curCov <- curCov[c(1:22, "X", "Y")]
curExons <- curExons[c(1:22, "X", "Y")]

## create views on your chromosomes
curViews <- RleViewsList(rleList = curCov, rangesList = curExons)

## generate the exon means
viewMeans(curViews)

sessionInfo()
R version 2.10.0 alpha (2009-10-08 r49995) i386-apple-darwin9.8.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.3.36   lattice_0.17-26    BSgenome_1.13.16   Biostrings_2.13.50
[5] IRanges_1.3.90
loaded via a namespace (and not attached):
[1] Biobase_2.5.8 grid_2.10.0 hwriter_1.1




Sean Davis wrote:
On Fri, Oct 9, 2009 at 4:55 PM, Dykema, Karl <[email protected]> wrote:

Hello All,

I am attempting to calculate the mean number of reads for every exon that
was represented on our custom Nimblegen capture chip. The capture regions
don't line up exactly with the exon coordinates so I identified all exons
that have an overlap, about 9500. I have previously done some crude sampling
of the pileup output and my results were a lot better than what I seem to be
calculating now. Currently I am running coverage() on my alignedRead object
and then looping through the capture regions something like this:

curCov <- coverage(lane1)
for(q in c(1:22,"X","Y")){
       capturedExonsOnChrom <-
capturedExons[which(capturedExons$chrom==q),]
       for(i in 1:length(capturedExonsOnChrom)) {
               exonCoordinates <-
capturedExonsOnChrom[i]$exonStart:capturedExonsOnChrom[i]$exonEnd
               capturedExonMeanCoverage[] <-
mean(as.vector(curCov[[q]])[exonCoordinates])
       }
}

First off, I am assuming that there is a more efficient method to do this.
Is that the case? Also, I have also been using GenomeGraphs to create plots
of the coverage and those plots also seem to show plenty of reads where my
calculations above indicate none. I am hoping there is an error in my
current method because the previous coverage numbers were quite encouraging.
Thanks in advance for your help.


Hi, Karl.

I put together a little vignette that includes, among other things, a little
vignette that includes a short example of calculation of given a capture
method (Agilent, but the idea is the same).  The reads are read in using the
ShortRead package and the capture regions are in the form of a simple bed
file.  The vignette is here:

http://watson.nci.nih.gov/~sdavis/sequencing_example.pdf

Feel free to take a look at it for a slightly more efficient way of doing
things.  I'm not guaranteeing total correctness in the code and not all
parts will apply to nimblegen, I suppose, but it might get you started.

Sean

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to