maybe I should have included sessionInfo() output: R version 2.13.0 (2011-04-13) Platform: x86_64-unknown-linux-gnu (64-bit)
locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] rtracklayer_1.12.4 RCurl_1.6-5 bitops_1.0-4.1 [4] ShortRead_1.10.4 Rsamtools_1.4.2 lattice_0.19-26 [7] Biostrings_2.20.1 GenomicRanges_1.4.6 IRanges_1.10.4 loaded via a namespace (and not attached): [1] Biobase_2.12.1 BSgenome_1.20.0 grid_2.13.0 hwriter_1.3 [5] XML_3.4-0 Begin forwarded message: > From: Janet Young <[email protected]> > Date: June 14, 2011 6:58:01 PM PDT > To: [email protected] > Subject: "identical" on PhredQuality objects - I'm confused > > Hi there, > > I'm investigating a mistake I made when running BWA (a little embarrassing, > but it happens). I had initially failed to use the -I flag on BWA to tell it > to convert our Illumina ASCII-64 qualities appropriately. Now I'm taking a > look to see how much differnce it makes in our downstream analysis. > > Anyway, that's not why I'm writing - while delving into this I've found > something unexpected about the "identical" function, and I'd love to > understand that better if possible. > > I used scanBam to read in my two BWA output files - with and without the "-I" > option when I ran BWA (same input seqs, everything else the same). As I > expected, with the -I option BWA converted the qualities (my > lane1read2_bwaWithI_bamscan object), and in the other it didn't (my > lane1read2_bwaNotI_bamscan object). That makes sense to me - here's how the > quals look after reading BWA output in with scanBam. > > ### when I use the -I option with BWA: >> head(lane1read2_bwaWithI_bamscan[[1]][["qual"]]) > A PhredQuality instance of length 6 > width seq > [1] 50 HHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHFHHHHFHFF > [2] 36 GFFEEHHFHHHHHHHEFEEEHHHHHFHHHEEFEE@E > [3] 50 ################################################## > [4] 31 HHHHHHHHHHHHHHHHHHHHHHEHHHHFHFH > [5] 24 HHHHHHHHHHHHHHDFHHHHHHHH > [6] 50 FFFFCD=8BGEBHGHHHHEHGFGHHDHEHFHEHHHHGHHHHHHHHHGHHH > > ### when I don't use the -I option with BWA: >> head(lane1read2_bwaNotI_bamscan[[1]][["qual"]]) > A PhredQuality instance of length 6 > width seq > [1] 50 gggggggggggggggggggggggggggggggggggggggggeggggegee > [2] 36 feeddggegggggggdedddgggggegggddedd_d > [3] 50 BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB > [4] 31 ggggggggggggggggggggggdggggegeg > [5] 24 ggggggggggggggcegggggggg > [6] 50 eeeebc\Wafdagfggggdgfefggcgdgegdggggfgggggggggfggg > > > The part I don't understand is when I use identical to test whether those two > qual objects are the same, it says they are identical: > >> identical( lane1read2_bwaWithI_bamscan[[1]][["qual"]], >> lane1read2_bwaNotI_bamscan[[1]][["qual"]]) > [1] TRUE > > In one sense they're the same (if you convert from Illumina qualities to > normal Phred qualities), but how did identical know to do that? I might be > misunderstanding identical - I thought that EVERYTHING about the object had > to be identical. > > I want to understand this as it seems like a dangerous misunderstanding - I > use identical a fair amount to make sure I'm not screwing things up. > > Really I should probably have converted the "qual" item in my > scamBam-generated list to a SolexaQuality object to keep track of the fact > they're offset by 64 and not 33, but that's OK. At this point I just want to > take a look at what's changed about the BWA output when I use that -I flag. > > Any thoughts? > > thanks, > > Janet > > > > > > > _______________________________________________ Bioc-sig-sequencing mailing list [email protected] https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
