Pratap, Abhishek wrote: > Thanks Martin. > > The reason I don't want to use (qa/report) functions here is because I > want to display avg base quality scores from two different runs on the > same graph for the sake of easy visual comparison. In case this is > possible with (qa/report) please correct me. > > Also you mentioned about quality score conversion, could you please > elaborate on what the following function is doing mathematically. I > understand this is a bit off track question. > > FastqQuality(quality(quality(rfq)))
actually, it isn't doing anything mathematically. First I should have said that rfq came from running example(readFastq). quality(rfq) extracts the quality scores from rfq; these are of class 'SFastqQuality', with 'S' standing for 'Solexa'. quality(quality(rfq)) extracts the underlying representation of the quality scores, now a 'BStringSet'. The BStringSet is exactly the same as the BStringSet in the SFastqQuality object. FastqQuality(quality(quality(rfq))) takes that BStringSet and wraps it up, again unchanged, in a 'FastqQuality' object, which is used to represent 'phred' quality scores. So you end up with the same underlying data relabeled (now correctly) as FastqQuality. The only 'math' that occurs is when you follow Johannes' suggestion, and convert this to a matrix as(FastqQuality(quality(quality(rfq))), "matrix") where the letters are interpreted in a way consistent with phred encoding. Incidentally, a different way to arrive at a numerical representation of the underlying data is with alphabetByCycle(quality(rfq)), which returns a matrix summarizing how many times each letter occurs per cycle; you can provide your own interpretation of the letter. And for two quick diversions, it's interesting how sparse the matrix returned by alphabetByCycle is -- the encoding schemes are not really capturing the dynamic range very effectively. Finally, all of the above assume that reads are the same width. If not, you can make them square (e.g., with ?"narrow,ShortReadQ-method"), perhaps with careful choice of arguments to look at the first or last positions (particularly useful for 454 data). Martin Martin > > Thanks, > -Abhi > > > > > > > -----Original Message----- > From: Martin Morgan [mailto:[email protected]] > Sent: Monday, August 03, 2009 4:41 PM > To: Pratap, Abhishek > Cc: Johannes Waage; [email protected] > Subject: Re: [Bioc-sig-seq] Calculating/plotting Avg Base Quality > > Pratap, Abhishek wrote: >> Thanks for a quick revert. Kind of an expected question, when you talk >> about quality format, by default is it the Phred one ? I will be >> importing the FASTQ from Illumina Eland output which is 0-40 scoring >> window. Does that gets automatically converted ? > > fastq files don't contain information on how qualities were encoded; > readFastq isn't smart about this and assumes they're Solexa. If that's > not correct, you can change to phred with > > FastqQuality(quality(quality(rfq))) > > and vice-versa. Also recent Solexa encodings have changed their > calculation (not their encoding) to use use something closer to phred > rather than log-odds. This has consequence for low quality scores. > > You might also look at qa() and report() in the ShortRead package, to > generate a quality report that contains this information. The idea is to > do this in two stages > > qastats <- qa(dirPath, pattern) # slow! > report(qastats) # fast > > see ?qa and ?report. > > Martin > > >> >> >> Thanks, >> >> -Abhi >> >> >> >> From: [email protected] [mailto:[email protected]] On >> Behalf Of Johannes Waage >> Sent: Monday, August 03, 2009 4:04 PM >> To: Pratap, Abhishek; [email protected] >> Subject: Re: [Bioc-sig-seq] Calculating/plotting Avg Base Quality >> >> >> >> Try this: >> >> library(ShortRead) >> my_reads<- readFastq("~/", pattern="my_reads.fastq") >> qualities <- FastqQuality(quality(my_reads)) >> qualities <- as(qualities, "matrix") >> boxplot(as.data.frame((qualities )), outline = F, xlab="Cycle", >> ylab="Quality") >> >> Depending on quality format, you might have to correct your numeric >> values in the matrix, or even convert to base call probabilities. >> >> Regards, >> Johannes Waage, >> Uni of copenhagen >> >> On Mon, Aug 3, 2009 at 9:41 PM, Pratap, Abhishek >> <[email protected]> wrote: >> >> Hi All >> >> >> >> Just wondering if there is a R function in any package to do the >> following. Want to make sure before I write something. >> >> >> >> We would like to calculate avg base quality score for each base called >> per cycle / lane. What will be nice is to also plot these avg quality >> scores/lane for couple of different runs. >> >> >> >> Thanks, >> >> -Abhi >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioc-sig-sequencing mailing list >> [email protected] >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing >> >> >> >> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioc-sig-sequencing mailing list >> [email protected] >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing > _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
