On Mon, May 4, 2009 at 2:19 AM, Dan Bolser <[email protected]> wrote:

> 2009/4/23 Martin Morgan <[email protected]>:
> > Dan Bolser <[email protected]> writes:
> >
> >> Sorry for the noob question, but is there a set of standard quality
> >> checks in R that I can run over some 454 data? I have the fasta and
> >> the fasta format quality files as well as an sff. I scanned the manual
> >> for the ShortReads package, but it seems focused on Illumina, I
> >> couldn't pick out the general bits from the specifics.
> >
> > Hi Dan --
> >
> > You might be on somewhat uncharted territory here; most of our
> > experience (though we have some 454 data now) is with Solexa.
> >
> > I don't think the standard QA pipeline, along the lines of
> > report(qa(<...>)), will work at the moment, but I'll try to add that
> > today.
> >
> > You should be able to read the fasta and quality scores with
> > read454(). This returns a 'ShortReadQ' object, srq, that bundles the
> > reads, their quality scores, and their ids.
>
> Hi Martin, thanks for the info on this. I'm have only just got round
> to looking at this but I'm a little bit confused about how to read in
> the reads / qualities.
>
> I had expected to say something like:
>
> x <- read454(fasta = "my.fas", qual.fasta = "my.qual")
>
>
> or
>
> x <- read454(srf = "my.srf")
>
>
> or
>
> x <- read454(fastq = "my.fastq")
>
>
>
> However, this is clearly not correct. "?read454" brings up the
> "RochePath-class" page that suggests I try something like:
>
> x <- RochePath(readPath="./")
> y<- read454(x)
>
>
> But that fails too... "no input files found / pattern: \.fna$". I
> tried to set pattern somewhere (to match the ".fas" / and ".qual"
> patterns of my files), but I couldn't seem to set them anywhere.
>
> Can you help with an example of what I am doing wrong?
>


Thanks for this great feedback. One way to achieve what you want would be to
call readFasta and readQual separately for each file, and then combine the
results into a ShortReadQ. In the devel version of ShortRead (it might take
a day to propagate), I just renamed read454 to readFastaQual and added
separate pattern arguments for the fasta and qual file, so you could do:
srq <- readFastaQual(".", "my.fas", "my.qual")

How does that sound? I think that this interface could be made more
convenient (such as having separate pattern arguments for the extensions and
then a "base" pattern), but I'm just trying to stay consistent with what is
ShortRead already.

Michael


>
>
> How come I need to specify a directory and a pattern when what I
> really want to do is just to specify a file (because that is what I
> have)?
>
>
> Thanks for any help.
>
> All the best,
> Dan.
>
>
>
> > The basic touch points of the qa report for read (i.e., not aligned)
> > data are numbers of reads, nucleotide frequencies
> >
> >  alphabetFrequency(sread(srq), baseOnly=TRUE, collapse=TRUE)
> >
> > and cycle-specific alphabet frequencies and average quality scores
> > (use alphabetByCycle on sread(srq) and quality(srq)). For 454 it seems
> > like a simple plot of average quality score, along the lines of
> > alphabetScore(quality(srq)) / width(quality(srq)) against
> > width(quality(srq)) can also be quite insightful. There might be
> > issues where the functions expect / it makes sense to do analysis on
> > uniform-width reads, or on groups of uniformly-widthed reads.
> >
> > Sorry for the only limited help.
> >
> > Martin
> >
> >
> >> Thanks for any help,
> >> Dan.
> >>
> >> _______________________________________________
> >> Bioc-sig-sequencing mailing list
> >> [email protected]
> >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> >
> > --
> > Martin Morgan
> > Computational Biology / Fred Hutchinson Cancer Research Center
> > 1100 Fairview Ave. N.
> > PO Box 19024 Seattle, WA 98109
> >
> > Location: Arnold Building M1 B861
> > Phone: (206) 667-2793
> >
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to