Hi Martin, This is great, exactly what I was asking for - Thanks a lot!
Sorry, I didn't realize how frequently the changes are being made at-least to the ShortRead Package. In my installation, the coverage function still seems to return the "GenomeData" object. Is the svn Changelog ( http://fgc.lsi.umich.edu/cgi-bin/blosxom.cgi) the best place to look up all the updates? Thanks Again. > sessionInfo() R version 2.10.0 Under development (unstable) (2009-08-19 r49321) x86_64-unknown-linux-gnu locale: [1] C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.3.25 lattice_0.17-25 BSgenome_1.13.10 Biostrings_2.13.32 [5] IRanges_1.3.56 loaded via a namespace (and not attached): [1] Biobase_2.5.5 grid_2.10.0 hwriter_1.1 tools_2.10.0 --sirisha ----- Original Message ----- From: Martin Morgan <[email protected]> Date: Monday, August 24, 2009 10:56 pm Subject: Re: [Bioc-sig-seq] readAligned and coverage methods of the short read package To: [email protected] Cc: [email protected] > Hi Sirisha -- > > It's important to mention sessionInfo(). Until recently coverage() > returned a 'GenomeData' object, more recently it returns a > SimpleRleList, which simplifies working with other parts of the > Bioconductor tools. > > For my setup (based on the development version of R and current > Bioconductor packages) I can do > > library(ShortRead) > example(readAligned) # sets up next line, which is the last example > aln = readAligned(sp, "s_2_export.txt", filter=filt) > cvg = coverage(aln) > > I then have > > > cvg > SimpleRleList: 5 elements > names(5): chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa > > cvg[["chr1.fa"]] > 'integer' Rle instance of length 189122529 with 44 runs > Lengths: 3393024 35 9077058 35 12148400 35 1576458 35 52888550 > 35 ... > Values : 0 1 0 1 0 1 0 1 0 1 ... > > You can think of 'coverage' like a list, with one element for each > 'chromosome'. The elements of the list are 'Rle' objects, meaning that > they are a run-length encoded representation of coverage. So above > thereare 3393024 0's (i.e., uncovered nucleotides), followed by 35 > 1's (i.e., > nucleotides covered by one read), and so on. The Rle ends at the > end of > the last read, or if you'd provided coverage with an appropriate 'end' > argument (see ?'coverage,AlignedRead-method') an arbitrarily > determinednucleotide, e.g., the last nucleotide on the chromosome, > so padding with > 0 coverage from the end of the last read to the end of the chromosome. > > GenomeData has a very similar structure. > > Some more below... > > Sirisha Sunkara wrote: > > Hello All, > > > > I am looking at read coverage bias in a few samples (some libraries > > constructed under different PCR amplification cycles), and would > like to > > look at the coverage plots as % read count distribution/lane > over the > > chromosome/reference coordinates. > > Because SimpleRleList is a list, you can do something like > > lapply(cvg, sum) > > to get the total coverage on a per-chromosome basis, and > > laneCnt = sum(unlist(lapply(cvg, sum))) > > to get the total coverage per lane (assuming you'd read only one lane > into aln with readAligned; this will often be a good strategy anyway, > because the aligned reads are very large but coverage is trivially > small). You can normalize by lane with something along the lines of > > cvg1 = cvg / laneCnt > > (if normalizing by lane is appropriate...). In R 2.9.1 the operations > are quite similar, e.g., lapply(cvg, "/", laneCnt) gives a simple list > of coverage Rle's. > > > Right now reading the export.txt files with the readAligned and the > > coverage methods, I get the raw counts. Is there something that > does the > > conversion to % over the total read count/lane before plotting? > > > > Sorry, if I am missing something obvious.. > > > > Also, I was wondering if the coverage method retains information of > > regions with "zero" coverage? This would be particularly useful when > > comparing sequencing bias in different samples and/or different > library> construction methods. > > yes, especially if you provide the 'end' argument to coverage. > These are > the zeros in the Rle. You can do fun things like > > sum(cvg[[1]] == 0) > > to find out the total number of uncovered nucleotides, or > > slice(cvg, upper=0) > > to get a 'view' of all uncovered nucleotides. > > I'm not sure if that's what you're asking, or whether you had > somethingelse in mind? > > Martin > > > sessionInfo() > R version 2.10.0 Under development (unstable) (2009-08-21 r49350) > x86_64-unknown-linux-gnu > > locale: > [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C > [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 > [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 > [7] LC_PAPER=en_US.UTF-8 LC_NAME=C > [9] LC_ADDRESS=C LC_TELEPHONE=C > [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > other attached packages: > [1] ShortRead_1.3.27 lattice_0.17-25 BSgenome_1.13.10 > Biostrings_2.13.33 > [5] IRanges_1.3.59 > > loaded via a namespace (and not attached): > [1] Biobase_2.5.5 grid_2.10.0 hwriter_1.1 > > > > > > Thanks a lot for any pointers, > > Sirisha > > > > _______________________________________________ > > Bioc-sig-sequencing mailing list > > [email protected] > > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing > > > _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
