Joern,
The aligned method for PairwiseAlignedFixedSubject objects extracts an XStringSet containing the alignments in the "coordinate system" of the fixed subject; i.e. the first position represents the first position of the fixed subject, the second position represents the second position of the fixed subject, etc. This means aligned will contain matched/mismatched characters and deletions, but it will not tell you about insertions. I'll make the man page more clear to avoid confusion in the future.


suppressMessages(library(Biostrings))
subject <- DNAString("AAAACCCCGGGGTTTT")
pattern <- pattern <- DNAStringSet(c("left" = "ACGTACGTAAAA", "deletions" = "AAAAGGGGTTTT", "insertions" = "AAAAAACCCCCCGGGGTTTT", "right" = "TTTTACGT"))
pa <- pairwiseAlignment(pattern, subject, type = "global-local")
alignedPA <- aligned(pa)
names(alignedPA) <- names(pattern)
alignedPA
  A DNAStringSet instance of length 4
    width seq                                               names
[1]    16 AAAA------------                                  left
[2]    16 AAAA----GGGGTTTT                                  deletions
[3]    16 AAAACCCCGGGGTTTT                                  insertions
[4]    16 ------------TTTT                                  right
sessionInfo()
R version 2.10.0 Under development (unstable) (2009-05-08 r48504)
i386-apple-darwin9.6.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.13.9 IRanges_1.3.16

loaded via a namespace (and not attached):
[1] Biobase_2.5.2


Quoting Joern Toedling <[email protected]>:

Hello,

I believe there is a minor bug in the visualization provided by the "aligned"
method of the class "PairwiseAlignedFixedSubject".
I performed a pairwise alignment looking for overlaps between reads and a
fixed subject DNAString. Here are 10 reads which all show some form overlap
with the subject.

aligned(pa)
  A DNAStringSet instance of length 10
     width seq
 [1]    37 CTTTAGGCACCAT------------------------
 [2]    37 CTGTAGTCACCATC-----------------------
 [3]    37 CTGTAGGCACCATC-----------------------
 [4]    37 CTGTAGGCACCATC-----------------------
 [5]    37 CTGTAGGCACCATCAAT--------------------
 [6]    37 CTTTAGTCAACATC-----------------------
 [7]    37 CTGTAGG------------------------------
 [8]    37 CTGTAGGCACCATCAATCGT-----------------
 [9]    37 --------------------------------GCTTT
[10]    37 CTGTATTCACCATCAA---------------------
start(pattern(pa))
 [1] 24 23 23 23 20 23 30 17  1 21
end(pattern(pa))
 [1] 36 36 36 36 36 36 36 36  5 36

So in fact for 9 of the 10 reads the overlap is at the end of the read, and
for one at the start. However, the visualization of the "aligned" method
wrongly suggest that it is the other way round.

Regards,
Joern

sessionInfo()
R version 2.10.0 Under development (unstable) (2009-05-18 r48561)
i686-pc-linux-gnu

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base

other attached packages:
[1] ShortRead_1.3.5   lattice_0.17-25   BSgenome_1.13.2   Biostrings_2.13.9
[5] IRanges_1.3.16

loaded via a namespace (and not attached):
[1] Biobase_2.5.2 grid_2.10.0   hwriter_1.1


---
Joern Toedling
Institut Curie -- UMR218
26 rue d'Ulm, 75005 Paris, FRANCE
Tel. +33 (0)156246942

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing


_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to