Dan Bolser writes:

> Sorry for the noob question, but is there a set of standard quality
> checks in R that I can run over some 454 data? I have the fasta and
> the fasta format quality files as well as an sff. I scanned the manual
> for the ShortRead package, but it seems focused on Illumina; I
> couldn't pick out the general bits from the specifics.
Hi Dan --

You might be in somewhat uncharted territory here; most of our
experience is with Solexa, though we have some 454 data now. I don't
think the standard QA pipeline, along the lines of

  report(qa(<...>))

will work at the moment, but I'll try to add that today.

You should be able to read the fasta and quality scores with
read454(). This returns a 'ShortReadQ' object, srq, that bundles the
reads, their quality scores, and their ids.

The basic touch points of the qa report for read (i.e., not aligned)
data are the number of reads, the nucleotide frequencies

  alphabetFrequency(sread(srq), baseOnly=TRUE, collapse=TRUE)

and the cycle-specific alphabet frequencies and average quality scores
(use alphabetByCycle on sread(srq) and quality(srq)).

For 454, a simple plot of the average quality score per read, along
the lines of

  alphabetScore(quality(srq)) / width(quality(srq))

against

  width(quality(srq))

can also be quite insightful.

There might be issues where the functions expect uniform-width reads,
or where the analysis only makes sense on uniform-width reads or on
groups of reads with the same width.

Sorry for the limited help.

Martin

> Thanks for any help,
> Dan.
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

-- 
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
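
The per-read summaries mentioned in the reply can be sketched in R roughly as follows. This is a minimal sketch, not a tested QA pipeline. The reads, qualities, and ids are made up so that the block runs without any 454 files; with real data, srq would instead come from read454(), whose arguments are not given in this thread. The block assumes the qualities are held as a FastqQuality object. alphabetByCycle is left out because, as noted in the reply, it may not behave well on reads of differing widths.

```r
library(ShortRead)
library(Biostrings)

# Toy stand-in for the object read454() would return.
# The sequences, quality strings, and ids are invented for illustration.
srq <- ShortReadQ(
    sread   = DNAStringSet(c("ACGTACGTAC", "ACGTAC", "GGGTTTAA")),
    quality = FastqQuality(c("IIIIIIIIII", "IIII55", "55555555")),
    id      = BStringSet(c("read1", "read2", "read3")))

# Number of reads
nReads <- length(srq)

# Overall nucleotide frequencies across all reads
baseFreq <- alphabetFrequency(sread(srq), baseOnly=TRUE, collapse=TRUE)

# Read length, and average quality score per read
w <- width(quality(srq))
avgQual <- alphabetScore(quality(srq)) / w

# Average quality against read length, one point per read
plot(w, avgQual,
     xlab="Read length (bases)",
     ylab="Average quality score")
```

Because the average is computed per read, variable read lengths are not a problem for this particular plot; reads could also be grouped by w first if a width-specific summary is wanted.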
