Yes! Thank you.
That is exactly what I was looking for.
I'd like to compare the calibrated versus the raw quality scores.
Great help.
Victor
On Nov 4, 2008, at 2:46 PM, Martin Morgan wrote:
Hi Victor --
I cc'd the bioc-sig-sequencing list, in case this is interesting to
other people.
Victor Ruotti wrote:
Hello Martin,
We are using ShortRead and are very happy with it.
I was just looking at the graph in the QA report "per-cycle quality
score" given by ShortRead. I noticed that the quality scores are
"calibrated" using the alignment information. How easy is it to
plot the same graph using raw quality scores, i.e. uncalibrated
quality scores?
Is there a way to do this using ShortRead?
There is no automatic way, but here's what you can do.
The uncalibrated scores are in _prb.txt files, and can be read with
> prb = readPrb(dirPath, regex)
where regex is a regular expression defining which _prb files you
want to read into a single 'prb' object. E.g., the first two tiles
of lane 1 of a Solexa run:
> prb <- readPrb(sp, "s_1_000[12]_prb.txt")
> prb
class: SFastqQuality
quality:
A BStringSet instance of length 58018
width seq
[1] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
[2] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
[3] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhSh
[4] 36 hhhhh]ChUhhhhhhhhhhBhhhhG`Jhh_hhWhMN
[5] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhM
[6] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
[7] 36 hhhhhhhhhhhhhhhhhhhhhRhhhhhhhhhMXhSA
[8] 36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
[9] 36 hhhEhhhhhhhahhhhhOhEIAAKChhhhhhhUGhH
... ... ...
[58010] 36 hhhhhhhhhhPhch`hhRQhWKCGP?BGPIDPSNJL
[58011] 36 [EMAIL PROTECTED]>KFH>ACGGH??BH=
[58012] 36 JNeSPWTJI]NUPHIEJMMEHHTECJNIHIKFB>HA
[58013] 36 I;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[58014] 36 [EMAIL PROTECTED]>??KIBBIBAI?D?>[EMAIL PROTECTED]
[58015] 36 [EMAIL PROTECTED]@REJJ
[58016] 36 Bhc^eAR>a>[EMAIL PROTECTED]@D>BF>R
[58017] 36 [EMAIL PROTECTED]
[58018] 36 [hhhhhhh[hhhhhh`hghNbh[=hd?I`hheHhTP
From here you can, for instance,
> m = as(prb, "matrix")
> colMeans(m)
> dim(m)
[1] 58018 36
> colMeans(m)
 [1] 36.68253 34.77074 34.76543 33.89381 33.70597 33.06138 32.16550 32.02453
 [9] 32.00359 31.28284 31.17655 30.73650 30.98292 30.23918 29.65099 28.58637
[17] 27.54683 27.70426 27.19865 25.85327 25.37835 24.94302 24.83817 23.99030
[25] 23.41432 23.42025 22.13992 21.79825 20.64564 19.83484 19.90560 18.64320
[33] 17.63711 17.35210 16.81506 16.62260
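To get from these per-cycle means to a graph like the one in the QA report, something along the following lines should work. This is only a minimal sketch in base R: 'perCycleMeans' is a hypothetical helper (not a ShortRead function), and the small matrix is made-up placeholder data standing in for as(prb, "matrix"), so that the snippet runs on its own.

```r
# Sketch: per-cycle mean of raw (uncalibrated) quality scores, plotted by cycle.
# 'perCycleMeans' is a hypothetical helper, not part of ShortRead.
perCycleMeans <- function(m) {
    stopifnot(is.matrix(m), is.numeric(m))
    colMeans(m, na.rm = TRUE)   # one mean per column, i.e. per cycle
}

# Placeholder data (4 reads x 3 cycles); with real data use
#   m <- as(prb, "matrix")
m <- matrix(c(40, 38, 30,
              40, 36, 28,
              38, 34, 26,
              38, 32, 24),
            nrow = 4, byrow = TRUE)

means <- perCycleMeans(m)

# Cycle number on the x axis, mean raw quality on the y axis
plot(seq_along(means), means, type = "b",
     xlab = "Cycle", ylab = "Mean raw quality score",
     main = "Per-cycle quality (uncalibrated)")
```

If you compute a second vector of per-cycle means from the calibrated scores, lines() can overlay it on the same plot for a side-by-side comparison.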
Martin
Thanks in advance.
Victor Ruotti
--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M2 B169
Phone: (206) 667-2793