Hi Sirisha -- It's important to mention sessionInfo(). Until recently coverage() returned a 'GenomeData' object, more recently it returns a SimpleRleList, which simplifies working with other parts of the Bioconductor tools.
For my setup (based on the development version of R and current Bioconductor packages) I can do library(ShortRead) example(readAligned) # sets up next line, which is the last example aln = readAligned(sp, "s_2_export.txt", filter=filt) cvg = coverage(aln) I then have > cvg SimpleRleList: 5 elements names(5): chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa > cvg[["chr1.fa"]] 'integer' Rle instance of length 189122529 with 44 runs Lengths: 3393024 35 9077058 35 12148400 35 1576458 35 52888550 35 ... Values : 0 1 0 1 0 1 0 1 0 1 ... You can think of 'coverage' like a list, with one element for each 'chromosome'. The elements of the list are 'Rle' objects, meaning that they are a run-length encoded representation of coverage. So above there are 3393024 0's (i.e., uncovered nucleotides), followed by 35 1's (i.e., nucleotides covered by one read), and so on. The Rle ends at the end of the last read, or if you'd provided coverage with an appropriate 'end' argument (see ?'coverage,AlignedRead-method') an arbitrarily determined nucleotide, e.g., the last nucleotide on the chromosome, so padding with 0 coverage from the end of the last read to the end of the chromosome. GenomeData has a very similar structure. Some more below... Sirisha Sunkara wrote: > Hello All, > > I am looking at read coverage bias in a few samples (some libraries > constructed under different PCR amplification cycles), and would like to > look at the coverage plots as % read count distribution/lane over the > chromosome/reference coordinates. Because SimpleRleList is a list, you can do something like lapply(cvg, sum) to get the total coverage on a per-chromosome basis, and laneCnt = sum(unlist(lapply(cvg, sum))) to get the total coverage per lane (assuming you'd read only one lane into aln with readAligned; this will often be a good strategy anyway, because the aligned reads are very large but coverage is trivially small). You can normalize by lane with something along the lines of cvg1 = cvg / laneCnt (if normalizing by lane is appropriate...). In R 2.9.1 the operations are quite similar, e.g., lapply(cvg, "/", laneCnt) gives a simple list of coverage Rle's. > Right now reading the export.txt files with the readAligned and the > coverage methods, I get the raw counts. Is there something that does the > conversion to % over the total read count/lane before plotting? > > Sorry, if I am missing something obvious.. > > Also, I was wondering if the coverage method retains information of > regions with "zero" coverage? This would be particularly useful when > comparing sequencing bias in different samples and/or different library > construction methods. yes, especially if you provide the 'end' argument to coverage. These are the zeros in the Rle. You can do fun things like sum(cvg[[1]] == 0) to find out the total number of uncovered nucleotides, or slice(cvg, upper=0) to get a 'view' of all uncovered nucleotides. I'm not sure if that's what you're asking, or whether you had something else in mind? Martin > sessionInfo() R version 2.10.0 Under development (unstable) (2009-08-21 r49350) x86_64-unknown-linux-gnu locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.3.27 lattice_0.17-25 BSgenome_1.13.10 Biostrings_2.13.33 [5] IRanges_1.3.59 loaded via a namespace (and not attached): [1] Biobase_2.5.5 grid_2.10.0 hwriter_1.1 > > Thanks a lot for any pointers, > Sirisha > > _______________________________________________ > Bioc-sig-sequencing mailing list > [email protected] > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
