Martin Morgan wrote:
Fuad Gwadry wrote:
Hi All

I am getting negative quality scores when I read data generated by bowtie. Has anyone run into the same issue with bowtie output? My session info is below.

Hi Fuad -- ShortRead is reading the quality scores on the wrong scale (solexa, rather than phred; this will be fixed before the next release). Try

  qual <- FastqQuality(quality(quality(aln)))
  aln <- initialize(aln, quality=qual)

to update aln, or

  m <- as(FastqQuality(quality(quality(aln))), "matrix")

for a one-off solution.
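
To see why the wrong scale produces negative values, here is a minimal base-R sketch. It assumes the conventional ASCII offsets (33 for phred-scaled FASTQ, 64 for Solexa-scaled FASTQ); the quality string is made up for illustration and does not come from the data above.

```r
# Hypothetical quality string; characters chosen only for illustration.
qualString <- "I5+"
ascii <- utf8ToInt(qualString)   # ASCII codes: 73 53 43

# Assumed conventional offsets: 33 for phred FASTQ, 64 for Solexa FASTQ.
phredScale  <- ascii - 33L       # 40 20 10
solexaScale <- ascii - 64L       # 9 -11 -21

print(rbind(phredScale, solexaScale))
```

Under these assumptions, decoding on the Solexa scale shifts every score down by 31, so ordinary phred scores below 31 come out negative.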

I wanted to clarify, too, both for this post and one yesterday, that as() simply converts the character encoding to the integer score that each letter encodes; the secondary mapping from that phred or log-odds score to an error probability is not performed. This step is, I think

  10^(-m/10) for phred scores
  1 - 1 / (1 + 10^(-m/10)) for Solexa scores
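
The two formulas above can be sketched as plain R functions. The function names are hypothetical (they are not part of ShortRead), and the input is a small made-up vector rather than the matrix m.

```r
# Hypothetical helper names; not ShortRead functions.
# Both return the probability that a base call is wrong.
phredToProb  <- function(q) 10^(-q / 10)
solexaToProb <- function(q) 1 - 1 / (1 + 10^(-q / 10))

# Made-up scores for illustration.
q <- c(-5, 0, 10, 20, 30, 40)
print(data.frame(score = q,
                 phred = phredToProb(q),
                 solexa = solexaToProb(q)))
```

The two scales nearly agree for high scores and diverge for low ones: a score of 0 means an error probability of 1 on the phred scale but 0.5 on the Solexa scale.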

Solexa has changed its encoding scheme very recently; I think it is now standard phred but am not sure.

Martin


Thanks in advance

Fuad

aln
class: AlignedRead
length: 4591807 reads; width: 32 cycles
chromosome: chr13 chr7 ... chr6 chr4
position: 93437004 13223395 ... 23636747 23353864
strand: - - ... + +
alignQuality: NumericQuality
alignData varLabels: similar mismatch

m <- as(quality(aln), "matrix")
colMeans(m)
 [1]  -7.186638  -7.205858  -7.203382  -7.197175  -7.203629  -7.217016
 [7]  -7.240661  -7.249238  -7.268499  -7.286551  -7.306615  -7.324003
[13]  -7.523238  -7.581242  -7.697591  -7.695861  -7.735321  -7.743323
[19]  -7.752996  -7.849403  -7.862658  -7.931969  -7.979778  -8.029288
[25]  -8.120469  -8.215818  -8.335176  -8.411609  -8.587005  -8.820979
[31] -11.447326 -11.644198


sessionInfo()
R version 2.9.1 (2009-06-26)
x86_64-unknown-linux-gnu

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] ShortRead_1.3.22 lattice_0.17-25 BSgenome_1.13.10 Biostrings_2.13.29
[5] IRanges_1.3.44

loaded via a namespace (and not attached):
[1] Biobase_2.4.1 grid_2.9.1 hwriter_1.1

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing




--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793
