Re: [Bioc-sig-seq] Fwd: "identical" on PhredQuality objects - I'm confused

Marcus Davy Tue, 14 Jun 2011 20:08:25 -0700

Hi,
this doesn't help much as to an answer why, but I can reproduce your
identical output.


>library(ShortRead)
>x <- PhredQuality(0:40)
>y <- SolexaQuality(0:40)

## Not identical
>identical(x,y )
[1] FALSE

>print(x)
  A PhredQuality instance of length 1
    width seq
[1]    41 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHI
>print(y)
  A SolexaQuality instance of length 1
    width seq
[1]    41 @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh

## Note: Both of your quality objects are PhredQuality instances.

## Convert SolexaQuality instance to PhredQuality
> y2 <- PhredQuality(y)
> print(y2)
  A PhredQuality instance of length 1
    width seq
[1]    41 @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh

## Now Identical
> identical(x, y2)
[1] TRUE

The ascii characters are different between x and y2. In your example we can
see an 'H' is substituted for a 'g' which is in the
same ASCII position depending on the offset, maybe that has something to do
with it.

Marcus



On Wed, Jun 15, 2011 at 2:06 PM, Janet Young <[email protected]> wrote:

> maybe I should have included sessionInfo() output:
>
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
>  [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] rtracklayer_1.12.4  RCurl_1.6-5         bitops_1.0-4.1
> [4] ShortRead_1.10.4    Rsamtools_1.4.2     lattice_0.19-26
> [7] Biostrings_2.20.1   GenomicRanges_1.4.6 IRanges_1.10.4
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.12.1  BSgenome_1.20.0 grid_2.13.0     hwriter_1.3
> [5] XML_3.4-0
>
>
> Begin forwarded message:
>
> > From: Janet Young <[email protected]>
> > Date: June 14, 2011 6:58:01 PM PDT
> > To: [email protected]
> > Subject: "identical" on PhredQuality objects - I'm confused
> >
> > Hi there,
> >
> > I'm investigating a mistake I made when running BWA (a little
> embarrassing, but it happens).  I had initially failed to use the -I flag on
> BWA to tell it to convert our Illumina ASCII-64 qualities appropriately.
>  Now I'm taking a look to see how much differnce it makes in our downstream
> analysis.
> >
> > Anyway, that's not why I'm writing - while delving into this I've found
> something unexpected about the "identical" function, and I'd love to
> understand that better if possible.
> >
> > I used scanBam to read in my two BWA output files - with and without the
> "-I" option when I ran BWA (same input seqs, everything else the same). As I
> expected, with the -I option BWA converted the qualities (my
> lane1read2_bwaWithI_bamscan object), and in the other it didn't (my
> lane1read2_bwaNotI_bamscan object).  That makes sense to me - here's how the
> quals look after reading BWA output in with scanBam.
> >
> > ### when I use the -I option with BWA:
> >> head(lane1read2_bwaWithI_bamscan[[1]][["qual"]])
> >  A PhredQuality instance of length 6
> >    width seq
> > [1]    50 HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHFHHHHFHFF
> > [2]    36 GFFEEHHFHHHHHHHEFEEEHHHHHFHHHEEFEE@E
> > [3]    50 ##################################################
> > [4]    31 HHHHHHHHHHHHHHHHHHHHHHEHHHHFHFH
> > [5]    24 HHHHHHHHHHHHHHDFHHHHHHHH
> > [6]    50 FFFFCD=8BGEBHGHHHHEHGFGHHDHEHFHEHHHHGHHHHHHHHHGHHH
> >
> > ### when I don't use the -I option with BWA:
> >> head(lane1read2_bwaNotI_bamscan[[1]][["qual"]])
> >  A PhredQuality instance of length 6
> >    width seq
> > [1]    50 gggggggggggggggggggggggggggggggggggggggggeggggegee
> > [2]    36 feeddggegggggggdedddgggggegggddedd_d
> > [3]    50 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> > [4]    31 ggggggggggggggggggggggdggggegeg
> > [5]    24 ggggggggggggggcegggggggg
> > [6]    50 eeeebc\Wafdagfggggdgfefggcgdgegdggggfgggggggggfggg
> >
> >
> > The part I don't understand is when I use identical to test whether those
> two qual objects are the same, it says they are identical:
> >
> >> identical( lane1read2_bwaWithI_bamscan[[1]][["qual"]],
> lane1read2_bwaNotI_bamscan[[1]][["qual"]])
> > [1] TRUE
> >
> > In one sense they're the same (if you convert from Illumina qualities to
> normal Phred qualities), but how did identical know to do that?  I might be
> misunderstanding identical - I thought that EVERYTHING about the object had
> to be identical.
> >
> > I want to understand this as it seems like a dangerous misunderstanding -
> I use identical a fair amount to make sure I'm not screwing things up.
> >
> > Really I should probably have converted the "qual" item in my
> scamBam-generated list to a SolexaQuality object to keep track of the fact
> they're offset by 64 and not 33, but that's OK. At this point I just want to
> take a look at what's changed about the BWA output when I use that -I flag.
> >
> > Any thoughts?
> >
> > thanks,
> >
> > Janet
> >
> >
> >
> >
> >
> >
> >
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Re: [Bioc-sig-seq] Fwd: "identical" on PhredQuality objects - I'm confused

Reply via email to