On Mon, May 4, 2009 at 2:19 AM, Dan Bolser <[email protected]> wrote:
> 2009/4/23 Martin Morgan <[email protected]>: > > Dan Bolser <[email protected]> writes: > > > >> Sorry for the noob question, but is there a set of standard quality > >> checks in R that I can run over some 454 data? I have the fasta and > >> the fasta format quality files as well as an sff. I scanned the manual > >> for the ShortReads package, but it seems focused on Illumina, I > >> couldn't pick out the general bits from the specifics. > > > > Hi Dan -- > > > > You might be on somewhat uncharted territory here; most of our > > experience (though we have some 454 data now) is with Solexa. > > > > I don't think the standard QA pipeline, along the lines of > > report(qa(<...>)), will work at the moment, but I'll try to add that > > today. > > > > You should be able to read the fasta and quality scores with > > read454(). This returns a 'ShortReadQ' object, srq, that bundles the > > reads, their quality scores, and their ids. > > Hi Martin, thanks for the info on this. I'm have only just got round > to looking at this but I'm a little bit confused about how to read in > the reads / qualities. > > I had expected to say something like: > > x <- read454(fasta = "my.fas", qual.fasta = "my.qual") > > > or > > x <- read454(srf = "my.srf") > > > or > > x <- read454(fastq = "my.fastq") > > > > However, this is clearly not correct. "?read454" brings up the > "RochePath-class" page that suggests I try something like: > > x <- RochePath(readPath="./") > y<- read454(x) > > > But that fails too... "no input files found / pattern: \.fna$". I > tried to set pattern somewhere (to match the ".fas" / and ".qual" > patterns of my files), but I couldn't seem to set them anywhere. > > Can you help with an example of what I am doing wrong? > Thanks for this great feedback. One way to achieve what you want would be to call readFasta and readQual separately for each file, and then combine the results into a ShortReadQ. In the devel version of ShortRead (it might take a day to propagate), I just renamed read454 to readFastaQual and added separate pattern arguments for the fasta and qual file, so you could do: srq <- readFastaQual(".", "my.fas", "my.qual") How does that sound? I think that this interface could be made more convenient (such as having separate pattern arguments for the extensions and then a "base" pattern), but I'm just trying to stay consistent with what is ShortRead already. Michael > > > How come I need to specify a directory and a pattern when what I > really want to do is just to specify a file (because that is what I > have)? > > > Thanks for any help. > > All the best, > Dan. > > > > > The basic touch points of the qa report for read (i.e., not aligned) > > data are numbers of reads, nucleotide frequencies > > > > alphabetFrequency(sread(srq), baseOnly=TRUE, collapse=TRUE) > > > > and cycle-specific alphabet frequencies and average quality scores > > (use alphabetByCycle on sread(srq) and quality(srq)). For 454 it seems > > like a simple plot of average quality score, along the lines of > > alphabetScore(quality(srq)) / width(quality(srq)) against > > width(quality(srq)) can also be quite insightful. There might be > > issues where the functions expect / it makes sense to do analysis on > > uniform-width reads, or on groups of uniformly-widthed reads. > > > > Sorry for the only limited help. > > > > Martin > > > > > >> Thanks for any help, > >> Dan. > >> > >> _______________________________________________ > >> Bioc-sig-sequencing mailing list > >> [email protected] > >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing > > > > -- > > Martin Morgan > > Computational Biology / Fred Hutchinson Cancer Research Center > > 1100 Fairview Ave. N. > > PO Box 19024 Seattle, WA 98109 > > > > Location: Arnold Building M1 B861 > > Phone: (206) 667-2793 > > > > _______________________________________________ > Bioc-sig-sequencing mailing list > [email protected] > https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing > [[alternative HTML version deleted]] _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
