Re: [Bioc-sig-seq] Problem with ShortRead reading quality scores from bowtie

Fuad Gwadry Wed, 05 Aug 2009 07:35:11 -0700

Hi Martin.


My data is generated from the old pipeline (illumina 0.3) and therefore should 
not be a problem. In any case I tested your suggestion but it had no effect on 
the data. The colMeans(m) are the same before and after incorporating your 
suggestions. Also when I use alignQuality I get NAs in both cases. Let me know 
how I can help to trouble shoot this. -- Fuad

 

The first part is using my original data:

 

> aln
class: AlignedRead
length: 4933275 reads; width: 32 cycles
chromosome: phiX174 phiX174 ... phiX174 phiX174 
position: 5147 1513 ... 5034 3107 
strand: - - ... + + 
alignQuality: NumericQuality 
alignData varLabels: similar mismatch 
> head(quality(aln))
class: SFastqQuality
quality:
  A BStringSet instance of length 6
    width seq
[1]    32 <<<<<<<<<<<<<<<<<;<<<;<<<<<5.<,:
[2]    32 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
[3]    32 <<<<<<<<<<<<;<<;<<<<<<<+<<<<<<<<
[4]    32 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
[5]    32 <<<<<<<<<<<<<<<<<<<;<<<<<<<<<<6<
[6]    32 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
> m <- as(quality(aln), "matrix")
> colMeans(m)
 [1] -3.865733 -3.867642 -3.878435 -3.869599 -3.884619 -3.891333 -3.904493
 [8] -3.933454 -3.965951 -3.995224 -4.301351 -4.331125 -4.427614 -4.495761
[15] -4.556334 -4.586264 -4.623881 -4.653223 -4.744307 -4.825347 -4.913756
[22] -5.000141 -5.138716 -5.213431 -5.375823 -5.635962 -5.779450 -5.791400
[29] -5.944196 -6.175808 -6.368913 -6.597291

> alignQuality(aln)
class: NumericQuality
quality: NA NA ... NA NA (4933275 total)

 

##################

incorporating your suggestions

##################


> qual <- FastqQuality(quality(quality(aln)))
> qual
class: FastqQuality
quality:
  A BStringSet instance of length 4933275
          width seq
      [1]    32 <<<<<<<<<<<<<<<<<;<<<;<<<<<5.<,:
      [2]    32 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      [3]    32 <<<<<<<<<<<<;<<;<<<<<<<+<<<<<<<<
      [4]    32 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      [5]    32 <<<<<<<<<<<<<<<<<<<;<<<<<<<<<<6<
      [6]    32 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      [7]    32 <<<<<<<<<<<<<<<<+<-<<<<;<<<<<<<<
      [8]    32 <<<<<<<<<<<<<<<<<<<<<<<<<,<+<<<<
      [9]    32 <<<<<<<<<<<<<<<<<<<5<<<<7<<<<<<2
      ...   ... ...
[4933267]    32 <<<<<<<<<<<<<<<<<<<<<<3<<8<<<<<<
[4933268]    32 <<<<<<<<<<<<<<<<<<<<<;<<<98<<<<<
[4933269]    32 <<<<<<<<<<<<<<<<</<<<5<<1;<;<7<<
[4933270]    32 <<<<<<<<<<<<<<<<<<<<<<<<;<<<<<6:
[4933271]    32 <<<<<<<<73;<;<<<<<;';57<;<0;<<55
[4933272]    32 <<<<<<3<<<<<<<<6<<<<<<-7<<<7*<<7
[4933273]    32 <<<<<<<<<<<<<3<<<<<<+<<6<<<4&5<6
[4933274]    32 <<<<<<<<<<<<<<<<<<<<<<<<4<<<<<<<
[4933275]    32 <<<<<<<<<<<<<<<<<7<<;<8<<6<<<<<)

>   initialize(aln, quality=qual)
class: AlignedRead
length: 4933275 reads; width: 32 cycles
chromosome: phiX174 phiX174 ... phiX174 phiX174 
position: 5147 1513 ... 5034 3107 
strand: - - ... + + 
alignQuality: NumericQuality 
alignData varLabels: similar mismatch 
> m <- as(quality(aln), "matrix")
> colMeans(m)
 [1] -3.865733 -3.867642 -3.878435 -3.869599 -3.884619 -3.891333 -3.904493
 [8] -3.933454 -3.965951 -3.995224 -4.301351 -4.331125 -4.427614 -4.495761
[15] -4.556334 -4.586264 -4.623881 -4.653223 -4.744307 -4.825347 -4.913756
[22] -5.000141 -5.138716 -5.213431 -5.375823 -5.635962 -5.779450 -5.791400
[29] -5.944196 -6.175808 -6.368913 -6.597291

> alignQuality(aln)
class: NumericQuality
quality: NA NA ... NA NA (4933275 total)

 

> sessionInfo()
R version 2.9.1 (2009-06-26) 
x86_64-unknown-linux-gnu 

locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ShortRead_1.3.22   lattice_0.17-25    BSgenome_1.13.10   Biostrings_2.13.29
[5] IRanges_1.3.44    

loaded via a namespace (and not attached):
[1] Biobase_2.4.1 grid_2.9.1    hwriter_1.1  

 

 
> Date: Tue, 4 Aug 2009 09:07:51 -0700
> From: [email protected]
> To: [email protected]
> CC: [email protected]; [email protected]; 
> [email protected]
> Subject: Re: [Bioc-sig-seq] Problem with ShortRead reading quality scores 
> from bowtie
> 
> Martin Morgan wrote:
> > Fuad Gwadry wrote:
> >> Hi All
> >>
> >> 
> >>
> >> I am getting negative values when reading quality scores when I read 
> >> data generated in bowtie. Has anyone run into the same issue when 
> >> using data generated by bowtie ? My session info is below. 
> > 
> > Hi Fuad -- ShortRead is reading the quality scores on the wrong scale 
> > (solexa, rather than phred; this will be fixed before the next release). 
> > Try
> > 
> > qual <- FastqQuality(quality(quality(aln))
> > initialize(aln, quality=qual)
> > 
> > to update aln, or
> > 
> > m <- as(FastqQuality(quality(quality(aln)), "matrix")
> > 
> > for a one-off solution.
> 
> I wanted to clarify, too, both for this post and one yesterday, that 
> as() is simply converting the character encoding to the corresponding 
> integer value that each letter encodes; there is a secondary mapping 
> from this encoding to log-odds or phred score that is not being 
> performed. This step is, I think
> 
> 10^(-m/10) for phred scores
> 1 - 1 / (1 + 10^(-m/10)) for Solexa scores
> 
> Solexa has changed its encoding scheme very recently; I think it is now 
> standard phred but am not sure.
> 
> Martin
> 
> > 
> > Martin
> > 
> >>
> >> 
> >>
> >> Thanks in advance
> >>
> >> 
> >>
> >> Fuad
> >>
> >> 
> >>
> >>> aln
> >> class: AlignedRead
> >> length: 4591807 reads; width: 32 cycles
> >> chromosome: chr13 chr7 ... chr6 chr4 position: 93437004 13223395 ... 
> >> 23636747 23353864 strand: - - ... + + alignQuality: NumericQuality 
> >> alignData varLabels: similar mismatch
> >>
> >>> m <- as(quality(aln), "matrix")
> >>> colMeans(m)
> >> [1] -7.186638 -7.205858 -7.203382 -7.197175 -7.203629 -7.217016
> >> [7] -7.240661 -7.249238 -7.268499 -7.286551 -7.306615 -7.324003
> >> [13] -7.523238 -7.581242 -7.697591 -7.695861 -7.735321 -7.743323
> >> [19] -7.752996 -7.849403 -7.862658 -7.931969 -7.979778 -8.029288
> >> [25] -8.120469 -8.215818 -8.335176 -8.411609 -8.587005 -8.820979
> >> [31] -11.447326 -11.644198
> >>
> >>
> >>> sessionInfo()
> >> R version 2.9.1 (2009-06-26) x86_64-unknown-linux-gnu
> >> locale:
> >> LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=C;LC_MESSAGES=en_US.UTF-8;LC_PAPER=en_US.UTF-8;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=en_US.UTF-8;LC_IDENTIFICATION=C
> >>  
> >>
> >>
> >> attached base packages:
> >> [1] stats graphics grDevices utils datasets methods base 
> >> other attached packages:
> >> [1] ShortRead_1.3.22 lattice_0.17-25 BSgenome_1.13.10 
> >> Biostrings_2.13.29
> >> [5] IRanges_1.3.44 
> >> loaded via a namespace (and not attached):
> >> [1] Biobase_2.4.1 grid_2.9.1 hwriter_1.1 
> >>
> >> _________________________________________________________________
> >> More storage. Better anti-spam and antivirus protection. Hotmail makes 
> >> it simple.
> >>
> >> [[alternative HTML version deleted]]
> >>
> >> _______________________________________________
> >> Bioc-sig-sequencing mailing list
> >> [email protected]
> >> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
> > 
> > 
> 
> 
> -- 
> Martin Morgan
> Computational Biology / Fred Hutchinson Cancer Research Center
> 1100 Fairview Ave. N.
> PO Box 19024 Seattle, WA 98109
> 
> Location: Arnold Building M1 B861
> Phone: (206) 667-2793

_________________________________________________________________
Attention all humans. We are your photos. Free us.

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] Problem with ShortRead reading quality scores from bowtie

Reply via email to