Hi Faud --

Fuad Gwadry wrote:
Hi Martin.
My data is generated from the old pipeline (illumina 0.3) and therefore should not be a problem. In any case I tested your suggestion but it had no effect on the data. The colMeans(m) are the same before and after incorporating your suggestions. Also when I use alignQuality I get NAs in both cases. Let me know how I can help to trouble shoot this.

NA's in alignQuality are present when the reads did not align; not all of the values will be NA. see below for the issue with base quality scores.

-- Fuad
The first part is using my original data: > aln
class: AlignedRead
length: 4933275 reads; width: 32 cycles
chromosome: phiX174 phiX174 ... phiX174 phiX174
position: 5147 1513 ... 5034 3107
strand: - - ... + +
alignQuality: NumericQuality
alignData varLabels: similar mismatch
 > head(quality(aln))
class: SFastqQuality
quality:
  A BStringSet instance of length 6
    width seq
[1]    32 <<<<<<<<<<<<<<<<<;<<<;<<<<<5.<,:
[2]    32 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
[3]    32 <<<<<<<<<<<<;<<;<<<<<<<+<<<<<<<<
[4]    32 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
[5]    32 <<<<<<<<<<<<<<<<<<<;<<<<<<<<<<6<
[6]    32 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
 > m <- as(quality(aln), "matrix")
 > colMeans(m)
 [1] -3.865733 -3.867642 -3.878435 -3.869599 -3.884619 -3.891333 -3.904493
 [8] -3.933454 -3.965951 -3.995224 -4.301351 -4.331125 -4.427614 -4.495761
[15] -4.556334 -4.586264 -4.623881 -4.653223 -4.744307 -4.825347 -4.913756
[22] -5.000141 -5.138716 -5.213431 -5.375823 -5.635962 -5.779450 -5.791400
[29] -5.944196 -6.175808 -6.368913 -6.597291
 > alignQuality(aln)
class: NumericQuality
quality: NA NA ... NA NA (4933275 total)
##################
incorporating your suggestions
##################

 > qual <- FastqQuality(quality(quality(aln)))
 > qual
class: FastqQuality
quality:
  A BStringSet instance of length 4933275
          width seq
      [1]    32 <<<<<<<<<<<<<<<<<;<<<;<<<<<5.<,:
      [2]    32 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      [3]    32 <<<<<<<<<<<<;<<;<<<<<<<+<<<<<<<<
      [4]    32 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      [5]    32 <<<<<<<<<<<<<<<<<<<;<<<<<<<<<<6<
      [6]    32 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      [7]    32 <<<<<<<<<<<<<<<<+<-<<<<;<<<<<<<<
      [8]    32 <<<<<<<<<<<<<<<<<<<<<<<<<,<+<<<<
      [9]    32 <<<<<<<<<<<<<<<<<<<5<<<<7<<<<<<2
      ...   ... ...
[4933267]    32 <<<<<<<<<<<<<<<<<<<<<<3<<8<<<<<<
[4933268]    32 <<<<<<<<<<<<<<<<<<<<<;<<<98<<<<<
[4933269]    32 <<<<<<<<<<<<<<<<</<<<5<<1;<;<7<<
[4933270]    32 <<<<<<<<<<<<<<<<<<<<<<<<;<<<<<6:
[4933271]    32 <<<<<<<<73;<;<<<<<;';57<;<0;<<55
[4933272]    32 <<<<<<3<<<<<<<<6<<<<<<-7<<<7*<<7
[4933273]    32 <<<<<<<<<<<<<3<<<<<<+<<6<<<4&5<6
[4933274]    32 <<<<<<<<<<<<<<<<<<<<<<<<4<<<<<<<
[4933275]    32 <<<<<<<<<<<<<<<<<7<<;<8<<6<<<<<)
 >   initialize(aln, quality=qual)
class: AlignedRead
length: 4933275 reads; width: 32 cycles
chromosome: phiX174 phiX174 ... phiX174 phiX174
position: 5147 1513 ... 5034 3107
strand: - - ... + +
alignQuality: NumericQuality
alignData varLabels: similar mismatch

The statement above should be

   aln <- initialize(aln, quality=qual)

i.e., create a new instance and save it to the variable aln. Save it to a different variable if you want to avoid over-writing aln.

Martin

 > m <- as(quality(aln), "matrix")
 > colMeans(m)
 [1] -3.865733 -3.867642 -3.878435 -3.869599 -3.884619 -3.891333 -3.904493
 [8] -3.933454 -3.965951 -3.995224 -4.301351 -4.331125 -4.427614 -4.495761
[15] -4.556334 -4.586264 -4.623881 -4.653223 -4.744307 -4.825347 -4.913756
[22] -5.000141 -5.138716 -5.213431 -5.375823 -5.635962 -5.779450 -5.791400
[29] -5.944196 -6.175808 -6.368913 -6.597291
 > alignQuality(aln)
class: NumericQuality
quality: NA NA ... NA NA (4933275 total)
> sessionInfo()
R version 2.9.1 (2009-06-26)
x86_64-unknown-linux-gnu
locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base other attached packages: [1] ShortRead_1.3.22 lattice_0.17-25 BSgenome_1.13.10 Biostrings_2.13.29 [5] IRanges_1.3.44 loaded via a namespace (and not attached): [1] Biobase_2.4.1 grid_2.9.1 hwriter_1.1 > Date: Tue, 4 Aug 2009 09:07:51 -0700
 > From: [email protected]
 > To: [email protected]
> CC: [email protected]; [email protected]; [email protected] > Subject: Re: [Bioc-sig-seq] Problem with ShortRead reading quality scores from bowtie
 >
 > Martin Morgan wrote:
 > > Fuad Gwadry wrote:
 > >> Hi All
 > >>
 > >>
 > >>
 > >> I am getting negative values when reading quality scores when I read
 > >> data generated in bowtie. Has anyone run into the same issue when
 > >> using data generated by bowtie ? My session info is below.
 > >
 > > Hi Fuad -- ShortRead is reading the quality scores on the wrong scale
> > (solexa, rather than phred; this will be fixed before the next release).
 > > Try
 > >
 > > qual <- FastqQuality(quality(quality(aln))
 > > initialize(aln, quality=qual)
 > >
 > > to update aln, or
 > >
 > > m <- as(FastqQuality(quality(quality(aln)), "matrix")
 > >
 > > for a one-off solution.
 >
 > I wanted to clarify, too, both for this post and one yesterday, that
 > as() is simply converting the character encoding to the corresponding
 > integer value that each letter encodes; there is a secondary mapping
 > from this encoding to log-odds or phred score that is not being
 > performed. This step is, I think
 >
 > 10^(-m/10) for phred scores
 > 1 - 1 / (1 + 10^(-m/10)) for Solexa scores
 >
 > Solexa has changed its encoding scheme very recently; I think it is now
 > standard phred but am not sure.
 >
 > Martin
 >
 > >
 > > Martin
 > >
 > >>
 > >>
 > >>
 > >> Thanks in advance
 > >>
 > >>
 > >>
 > >> Fuad
 > >>
 > >>
 > >>
 > >>> aln
 > >> class: AlignedRead
 > >> length: 4591807 reads; width: 32 cycles
 > >> chromosome: chr13 chr7 ... chr6 chr4 position: 93437004 13223395 ...
 > >> 23636747 23353864 strand: - - ... + + alignQuality: NumericQuality
 > >> alignData varLabels: similar mismatch
 > >>
 > >>> m <- as(quality(aln), "matrix")
 > >>> colMeans(m)
 > >> [1] -7.186638 -7.205858 -7.203382 -7.197175 -7.203629 -7.217016
 > >> [7] -7.240661 -7.249238 -7.268499 -7.286551 -7.306615 -7.324003
 > >> [13] -7.523238 -7.581242 -7.697591 -7.695861 -7.735321 -7.743323
 > >> [19] -7.752996 -7.849403 -7.862658 -7.931969 -7.979778 -8.029288
 > >> [25] -8.120469 -8.215818 -8.335176 -8.411609 -8.587005 -8.820979
 > >> [31] -11.447326 -11.644198
 > >>
 > >>
 > >>> sessionInfo()
 > >> R version 2.9.1 (2009-06-26) x86_64-unknown-linux-gnu
 > >> locale:
> >> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
 > >>
 > >>
 > >> attached base packages:
 > >> [1] stats graphics grDevices utils datasets methods base
 > >> other attached packages:
 > >> [1] ShortRead_1.3.22 lattice_0.17-25 BSgenome_1.13.10
 > >> Biostrings_2.13.29
 > >> [5] IRanges_1.3.44
 > >> loaded via a namespace (and not attached):
 > >> [1] Biobase_2.4.1 grid_2.9.1 hwriter_1.1
 > >>
 > >> _________________________________________________________________
> >> More storage. Better anti-spam and antivirus protection. Hotmail makes
 > >> it simple.
 > >>
 > >> [[alternative HTML version deleted]]
 > >>
 > >> _______________________________________________
 > >> Bioc-sig-sequencing mailing list
 > >> [email protected]
 > >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
 > >
 > >
 >
 >
 > --
 > Martin Morgan
 > Computational Biology / Fred Hutchinson Cancer Research Center
 > 1100 Fairview Ave. N.
 > PO Box 19024 Seattle, WA 98109
 >
 > Location: Arnold Building M1 B861
 > Phone: (206) 667-2793

------------------------------------------------------------------------
Attention all humans. We are your photos. Free us. <http://go.microsoft.com/?linkid=9666046>


--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to