2009/4/23 Martin Morgan <[email protected]>: > Dan Bolser <[email protected]> writes: > >> Sorry for the noob question, but is there a set of standard quality >> checks in R that I can run over some 454 data? I have the fasta and >> the fasta format quality files as well as an sff. I scanned the manual >> for the ShortReads package, but it seems focused on Illumina, I >> couldn't pick out the general bits from the specifics. > > Hi Dan -- > > You might be on somewhat uncharted territory here; most of our > experience (though we have some 454 data now) is with Solexa. > > I don't think the standard QA pipeline, along the lines of > report(qa(<...>)), will work at the moment, but I'll try to add that > today. > > You should be able to read the fasta and quality scores with > read454(). This returns a 'ShortReadQ' object, srq, that bundles the > reads, their quality scores, and their ids.
Hi Martin, thanks for the info on this. I'm have only just got round to looking at this but I'm a little bit confused about how to read in the reads / qualities. I had expected to say something like: x <- read454(fasta = "my.fas", qual.fasta = "my.qual") or x <- read454(srf = "my.srf") or x <- read454(fastq = "my.fastq") However, this is clearly not correct. "?read454" brings up the "RochePath-class" page that suggests I try something like: x <- RochePath(readPath="./") y<- read454(x) But that fails too... "no input files found / pattern: \.fna$". I tried to set pattern somewhere (to match the ".fas" / and ".qual" patterns of my files), but I couldn't seem to set them anywhere. Can you help with an example of what I am doing wrong? How come I need to specify a directory and a pattern when what I really want to do is just to specify a file (because that is what I have)? Thanks for any help. All the best, Dan. > The basic touch points of the qa report for read (i.e., not aligned) > data are numbers of reads, nucleotide frequencies > > alphabetFrequency(sread(srq), baseOnly=TRUE, collapse=TRUE) > > and cycle-specific alphabet frequencies and average quality scores > (use alphabetByCycle on sread(srq) and quality(srq)). For 454 it seems > like a simple plot of average quality score, along the lines of > alphabetScore(quality(srq)) / width(quality(srq)) against > width(quality(srq)) can also be quite insightful. There might be > issues where the functions expect / it makes sense to do analysis on > uniform-width reads, or on groups of uniformly-widthed reads. > > Sorry for the only limited help. > > Martin > > >> Thanks for any help, >> Dan. >> >> _______________________________________________ >> Bioc-sig-sequencing mailing list >> [email protected] >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing > > -- > Martin Morgan > Computational Biology / Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N. > PO Box 19024 Seattle, WA 98109 > > Location: Arnold Building M1 B861 > Phone: (206) 667-2793 > _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
