Hi Victor --

I cc'd the bioc-sig-sequencing list, in case this is interesting to other people.

Victor Ruotti wrote:
Hello Martin,
We are using ShortRead and are very happy with it.
I was just looking at the "per-cycle quality score" graph in the QA report produced by ShortRead. I noticed that the quality scores are "calibrated" using the alignment information. How easy is it to plot the same graph using raw, i.e. uncalibrated, quality scores?

Is there a way to do this using ShortRead?

There is no automatic way, but here's what you can do.

The uncalibrated scores are in _prb.txt files, and can be read with

> prb = readPrb(dirPath, regex)

where regex is a regular expression selecting the _prb files to read into a single 'prb' object. For example, to read the first two tiles of lane 1 of a Solexa run:

> prb <- readPrb(sp, "s_1_000[12]_prb.txt")
> prb
class: SFastqQuality
quality:
  A BStringSet instance of length 58018
        width seq
    [1]    36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
    [2]    36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
    [3]    36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhSh
    [4]    36 hhhhh]ChUhhhhhhhhhhBhhhhG`Jhh_hhWhMN
    [5]    36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhM
    [6]    36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
    [7]    36 hhhhhhhhhhhhhhhhhhhhhRhhhhhhhhhMXhSA
    [8]    36 hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
    [9]    36 hhhEhhhhhhhahhhhhOhEIAAKChhhhhhhUGhH
    ...   ... ...
[58010]    36 hhhhhhhhhhPhch`hhRQhWKCGP?BGPIDPSNJL
[58011]    36 [EMAIL PROTECTED]>KFH>ACGGH??BH=
[58012]    36 JNeSPWTJI]NUPHIEJMMEHHTECJNIHIKFB>HA
[58013]    36 I;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
[58014]    36 [EMAIL PROTECTED]>??KIBBIBAI?D?>[EMAIL PROTECTED]
[58015]    36 [EMAIL PROTECTED]@REJJ
[58016]    36 Bhc^eAR>a>[EMAIL PROTECTED]@D>BF>R
[58017]    36 [EMAIL PROTECTED]
[58018]    36 [hhhhhhh[hhhhhh`hghNbh[=hd?I`hheHhTP
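
The file-selection pattern can be checked on its own before reading any data. The sketch below uses base R only and assumes the pattern is matched against file names the way grepl() matches; the file names are made up for illustration and are not from a real run. Inside brackets, each character stands for itself, so "[12]" matches a single 1 or 2, and a colon inside brackets is a literal colon rather than a range operator.

```r
# Hypothetical file names, for illustration only; a real run will differ.
files <- c("s_1_0001_prb.txt", "s_1_0002_prb.txt", "s_1_0003_prb.txt",
           "s_2_0001_prb.txt", "s_1_0001_seq.txt")

# Tiles 1 and 2 of lane 1: [12] matches exactly one character, "1" or "2".
pattern <- "s_1_000[12]_prb.txt"
selected <- files[grepl(pattern, files)]
print(selected)

# A colon inside brackets is a literal character, not a range:
# [1:2] matches "1", ":" or "2", but not "3".
colon_check <- grepl("^[1:2]$", c("1", "2", ":", "3"))
print(colon_check)
```

For a contiguous range of tiles, a hyphen is the range operator, e.g. "[1-4]" for tiles 1 through 4.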

From here you can, for instance,

> m = as(prb, "matrix")
> dim(m)
[1] 58018    36
> colMeans(m)
 [1] 36.68253 34.77074 34.76543 33.89381 33.70597 33.06138 32.16550 32.02453
 [9] 32.00359 31.28284 31.17655 30.73650 30.98292 30.23918 29.65099 28.58637
[17] 27.54683 27.70426 27.19865 25.85327 25.37835 24.94302 24.83817 23.99030
[25] 23.41432 23.42025 22.13992 21.79825 20.64564 19.83484 19.90560 18.64320
[33] 17.63711 17.35210 16.81506 16.62260
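
To turn the per-cycle means into the kind of plot asked about, something along these lines should work. This is only a sketch in base R on a toy example, so it runs without ShortRead or any _prb files. The three quality strings reproduce records 1, 3 and 5 of the output above, and the decoding step assumes the Solexa convention of ASCII code minus 64 (so 'h' is 40); it stands in for as(prb, "matrix") and is not ShortRead's own code.

```r
# Toy stand-in for the 'prb' object: records 1, 3 and 5 shown above.
quals <- c(strrep("h", 36),
           paste0(strrep("h", 34), "Sh"),
           paste0(strrep("h", 35), "M"))

# Assumption: Solexa-style encoding, score = ASCII code - 64.
offset <- 64L

# One row per read, one column per cycle (mimics as(prb, "matrix")).
m <- t(sapply(quals, function(q) utf8ToInt(q) - offset, USE.NAMES = FALSE))

# Mean raw quality at each cycle.
cycle_means <- colMeans(m)

# Per-cycle plot of the uncalibrated scores.
plot(seq_along(cycle_means), cycle_means, type = "b",
     xlab = "Cycle", ylab = "Mean raw quality score")
```

With real data, the last two steps apply unchanged to the matrix returned by as(prb, "matrix").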

Martin

Thanks in advance.
Victor Ruotti



--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M2 B169
Phone: (206) 667-2793

