Hi,
this doesn't help much as to an answer why, but I can reproduce your
identical output.
>library(ShortRead)
>x <- PhredQuality(0:40)
>y <- SolexaQuality(0:40)
## Not identical
>identical(x,y )
[1] FALSE
>print(x)
A PhredQuality instance of length 1
width seq
[1] 41 !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHI
>print(y)
A SolexaQuality instance of length 1
width seq
[1] 41 @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh
## Note: Both of your quality objects are PhredQuality instances.
## Convert SolexaQuality instance to PhredQuality
> y2 <- PhredQuality(y)
> print(y2)
A PhredQuality instance of length 1
width seq
[1] 41 @ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefgh
## Now Identical
> identical(x, y2)
[1] TRUE
The ascii characters are different between x and y2. In your example we can
see an 'H' is substituted for a 'g' which is in the
same ASCII position depending on the offset, maybe that has something to do
with it.
Marcus
On Wed, Jun 15, 2011 at 2:06 PM, Janet Young <[email protected]> wrote:
> maybe I should have included sessionInfo() output:
>
> R version 2.13.0 (2011-04-13)
> Platform: x86_64-unknown-linux-gnu (64-bit)
>
> locale:
> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
> [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
> [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8
> [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
> [9] LC_ADDRESS=C LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] rtracklayer_1.12.4 RCurl_1.6-5 bitops_1.0-4.1
> [4] ShortRead_1.10.4 Rsamtools_1.4.2 lattice_0.19-26
> [7] Biostrings_2.20.1 GenomicRanges_1.4.6 IRanges_1.10.4
>
> loaded via a namespace (and not attached):
> [1] Biobase_2.12.1 BSgenome_1.20.0 grid_2.13.0 hwriter_1.3
> [5] XML_3.4-0
>
>
> Begin forwarded message:
>
> > From: Janet Young <[email protected]>
> > Date: June 14, 2011 6:58:01 PM PDT
> > To: [email protected]
> > Subject: "identical" on PhredQuality objects - I'm confused
> >
> > Hi there,
> >
> > I'm investigating a mistake I made when running BWA (a little
> embarrassing, but it happens). I had initially failed to use the -I flag on
> BWA to tell it to convert our Illumina ASCII-64 qualities appropriately.
> Now I'm taking a look to see how much differnce it makes in our downstream
> analysis.
> >
> > Anyway, that's not why I'm writing - while delving into this I've found
> something unexpected about the "identical" function, and I'd love to
> understand that better if possible.
> >
> > I used scanBam to read in my two BWA output files - with and without the
> "-I" option when I ran BWA (same input seqs, everything else the same). As I
> expected, with the -I option BWA converted the qualities (my
> lane1read2_bwaWithI_bamscan object), and in the other it didn't (my
> lane1read2_bwaNotI_bamscan object). That makes sense to me - here's how the
> quals look after reading BWA output in with scanBam.
> >
> > ### when I use the -I option with BWA:
> >> head(lane1read2_bwaWithI_bamscan[[1]][["qual"]])
> > A PhredQuality instance of length 6
> > width seq
> > [1] 50 HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHFHHHHFHFF
> > [2] 36 GFFEEHHFHHHHHHHEFEEEHHHHHFHHHEEFEE@E
> > [3] 50 ##################################################
> > [4] 31 HHHHHHHHHHHHHHHHHHHHHHEHHHHFHFH
> > [5] 24 HHHHHHHHHHHHHHDFHHHHHHHH
> > [6] 50 FFFFCD=8BGEBHGHHHHEHGFGHHDHEHFHEHHHHGHHHHHHHHHGHHH
> >
> > ### when I don't use the -I option with BWA:
> >> head(lane1read2_bwaNotI_bamscan[[1]][["qual"]])
> > A PhredQuality instance of length 6
> > width seq
> > [1] 50 gggggggggggggggggggggggggggggggggggggggggeggggegee
> > [2] 36 feeddggegggggggdedddgggggegggddedd_d
> > [3] 50 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB
> > [4] 31 ggggggggggggggggggggggdggggegeg
> > [5] 24 ggggggggggggggcegggggggg
> > [6] 50 eeeebc\Wafdagfggggdgfefggcgdgegdggggfgggggggggfggg
> >
> >
> > The part I don't understand is when I use identical to test whether those
> two qual objects are the same, it says they are identical:
> >
> >> identical( lane1read2_bwaWithI_bamscan[[1]][["qual"]],
> lane1read2_bwaNotI_bamscan[[1]][["qual"]])
> > [1] TRUE
> >
> > In one sense they're the same (if you convert from Illumina qualities to
> normal Phred qualities), but how did identical know to do that? I might be
> misunderstanding identical - I thought that EVERYTHING about the object had
> to be identical.
> >
> > I want to understand this as it seems like a dangerous misunderstanding -
> I use identical a fair amount to make sure I'm not screwing things up.
> >
> > Really I should probably have converted the "qual" item in my
> scamBam-generated list to a SolexaQuality object to keep track of the fact
> they're offset by 64 and not 33, but that's OK. At this point I just want to
> take a look at what's changed about the BWA output when I use that -I flag.
> >
> > Any thoughts?
> >
> > thanks,
> >
> > Janet
> >
> >
> >
> >
> >
> >
> >
>
> _______________________________________________
> Bioc-sig-sequencing mailing list
> [email protected]
> https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
>
[[alternative HTML version deleted]]
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing