Re: [Bioc-sig-seq] Minor bug in method 'aligned' of class 'PairwiseAlignedFixedSubject'

Patrick Aboyoun Mon, 25 May 2009 14:43:42 -0700

Joern,

The aligned method for PairwiseAlignedFixedSubject objects extracts an XStringSet containing the alignments in the "coordinate system" of the fixed subject; i.e. the first position represents the first position of the fixed subject, the second position represents the second position of the fixed subject, etc. This means aligned will contain matched/mismatched characters and deletions, but it will not tell you about insertions. I'll make the man page more clear to avoid confusion in the future.

suppressMessages(library(Biostrings))
subject <- DNAString("AAAACCCCGGGGTTTT")
pattern <- DNAStringSet(c("left" = "ACGTACGTAAAA", "deletions" = "AAAAGGGGTTTT", "insertions" = "AAAAAACCCCCCGGGGTTTT", "right" = "TTTTACGT"))
pa <- pairwiseAlignment(pattern, subject, type = "global-local")
alignedPA <- aligned(pa)
names(alignedPA) <- names(pattern)
alignedPA

  A DNAStringSet instance of length 4
    width seq                                               names
[1]    16 AAAA------------                                  left
[2]    16 AAAA----GGGGTTTT                                  deletions
[3]    16 AAAACCCCGGGGTTTT                                  insertions
[4]    16 ------------TTTT                                  right
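
To make the "coordinate system of the fixed subject" idea concrete, here is a minimal base-R sketch of the projection. It is only an illustration, not the Biostrings implementation: project_onto_subject is a hypothetical helper, and the gapped subject row used for the "insertions" case is an assumed gap placement (the aligner may place the gaps elsewhere). It keeps only the alignment columns where the subject has a real letter, which is why deletions show up as "-" while inserted letters disappear.

```r
# Hypothetical helper (not part of Biostrings): project one gapped
# pattern row onto the coordinates of the subject.
project_onto_subject <- function(gapped_pattern, gapped_subject) {
  p <- strsplit(gapped_pattern, "")[[1]]
  s <- strsplit(gapped_subject, "")[[1]]
  stopifnot(length(p) == length(s))
  # A gap in the subject row marks an insertion in the pattern. Such a
  # column has no subject coordinate, so it is dropped.
  paste(p[s != "-"], collapse = "")
}

# Deletion: the pattern lacks "CCCC"; the subject row has no gaps,
# so every column is kept and the gap remains visible.
project_onto_subject("AAAA----GGGGTTTT", "AAAACCCCGGGGTTTT")
# "AAAA----GGGGTTTT"

# Insertion: the pattern has two extra A's and two extra C's
# (assumed gap placement in the subject row).
project_onto_subject("AAAAAACCCCCCGGGGTTTT", "AAAA--CCCC--GGGGTTTT")
# "AAAACCCCGGGGTTTT"
```

Both results have the width of the subject (16), matching rows [2] and [3] above; the inserted letters leave no trace in the second one.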

sessionInfo()

R version 2.10.0 Under development (unstable) (2009-05-08 r48504)
i386-apple-darwin9.6.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.13.9 IRanges_1.3.16

loaded via a namespace (and not attached):
[1] Biobase_2.5.2


Quoting Joern Toedling <[email protected]>:

Hello,

I believe there is a minor bug in the visualization provided by the "aligned"
method of the class "PairwiseAlignedFixedSubject".
I performed a pairwise alignment looking for overlaps between reads and a
fixed subject DNAString. Here are 10 reads which all show some form of overlap
with the subject.

aligned(pa)

  A DNAStringSet instance of length 10
     width seq
 [1]    37 CTTTAGGCACCAT------------------------
 [2]    37 CTGTAGTCACCATC-----------------------
 [3]    37 CTGTAGGCACCATC-----------------------
 [4]    37 CTGTAGGCACCATC-----------------------
 [5]    37 CTGTAGGCACCATCAAT--------------------
 [6]    37 CTTTAGTCAACATC-----------------------
 [7]    37 CTGTAGG------------------------------
 [8]    37 CTGTAGGCACCATCAATCGT-----------------
 [9]    37 --------------------------------GCTTT
[10]    37 CTGTATTCACCATCAA---------------------

start(pattern(pa))

 [1] 24 23 23 23 20 23 30 17  1 21

end(pattern(pa))

 [1] 36 36 36 36 36 36 36 36  5 36

So in fact for 9 of the 10 reads the overlap is at the end of the read, and
for one at the start. However, the visualization of the "aligned" method
wrongly suggests that it is the other way round.

Regards,
Joern

sessionInfo()

R version 2.10.0 Under development (unstable) (2009-05-18 r48561)
i686-pc-linux-gnu

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] ShortRead_1.3.5   lattice_0.17-25   BSgenome_1.13.2   Biostrings_2.13.9
[5] IRanges_1.3.16

loaded via a namespace (and not attached):
[1] Biobase_2.5.2 grid_2.10.0   hwriter_1.1


---
Joern Toedling
Institut Curie -- UMR218
26 rue d'Ulm, 75005 Paris, FRANCE
Tel. +33 (0)156246942

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
