Michael,
I think we need to reconcile interfaces because ShortRead's
readBaseQuality function is intended to perform the same type of
activity as readFastaQual. The only difference was that it was looking
at Solexa, rather than 454, data.
Patrick
Michael Lawrence wrote:
On Mon, May 4, 2009 at 2:19 AM, Dan Bolser <[email protected]> wrote:
2009/4/23 Martin Morgan <[email protected]>:
Dan Bolser <[email protected]> writes:
Sorry for the noob question, but is there a set of standard quality
checks in R that I can run over some 454 data? I have the fasta and
the fasta format quality files as well as an sff. I scanned the manual
for the ShortReads package, but it seems focused on Illumina, I
couldn't pick out the general bits from the specifics.
Hi Dan --
You might be on somewhat uncharted territory here; most of our
experience (though we have some 454 data now) is with Solexa.
I don't think the standard QA pipeline, along the lines of
report(qa(<...>)), will work at the moment, but I'll try to add that
today.
You should be able to read the fasta and quality scores with
read454(). This returns a 'ShortReadQ' object, srq, that bundles the
reads, their quality scores, and their ids.
Hi Martin, thanks for the info on this. I'm have only just got round
to looking at this but I'm a little bit confused about how to read in
the reads / qualities.
I had expected to say something like:
x <- read454(fasta = "my.fas", qual.fasta = "my.qual")
or
x <- read454(srf = "my.srf")
or
x <- read454(fastq = "my.fastq")
However, this is clearly not correct. "?read454" brings up the
"RochePath-class" page that suggests I try something like:
x <- RochePath(readPath="./")
y<- read454(x)
But that fails too... "no input files found / pattern: \.fna$". I
tried to set pattern somewhere (to match the ".fas" / and ".qual"
patterns of my files), but I couldn't seem to set them anywhere.
Can you help with an example of what I am doing wrong?
Thanks for this great feedback. One way to achieve what you want would be to
call readFasta and readQual separately for each file, and then combine the
results into a ShortReadQ. In the devel version of ShortRead (it might take
a day to propagate), I just renamed read454 to readFastaQual and added
separate pattern arguments for the fasta and qual file, so you could do:
srq <- readFastaQual(".", "my.fas", "my.qual")
How does that sound? I think that this interface could be made more
convenient (such as having separate pattern arguments for the extensions and
then a "base" pattern), but I'm just trying to stay consistent with what is
ShortRead already.
Michael
How come I need to specify a directory and a pattern when what I
really want to do is just to specify a file (because that is what I
have)?
Thanks for any help.
All the best,
Dan.
The basic touch points of the qa report for read (i.e., not aligned)
data are numbers of reads, nucleotide frequencies
alphabetFrequency(sread(srq), baseOnly=TRUE, collapse=TRUE)
and cycle-specific alphabet frequencies and average quality scores
(use alphabetByCycle on sread(srq) and quality(srq)). For 454 it seems
like a simple plot of average quality score, along the lines of
alphabetScore(quality(srq)) / width(quality(srq)) against
width(quality(srq)) can also be quite insightful. There might be
issues where the functions expect / it makes sense to do analysis on
uniform-width reads, or on groups of uniformly-widthed reads.
Sorry for the only limited help.
Martin
Thanks for any help,
Dan.
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
[[alternative HTML version deleted]]
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing