Patrick,
thank for your quick response !
For a while things went well but now I got across a case where I have
difficulty interpreting the number(s) given for start-positions of
deletions. This is when there is a preceding insertion before a deletion
occurs/ gets encountered.
Of course I could correct this, ie shift the values myself according to
the number preceding inserted nucletides, but I posted this since I'm
not sure if you're aware of this special case/problem.
Or maybe you have some other command/way to access (ie extract) the
sequence(s) deleted ?
(Note that the equivalent approach for extracting insertions works
without any problems :))
Wolfgang
> suppressMessages(library(Biostrings))
> mySubject <- ref1 <- DNAString("GGGATAGTGACTTCAGGATCCAGCTCTTCGCCTGGC")
> myPattern <- samp1 <-
DNAStringSet(c("GGGATAGTGACTACAGGATCCAGCTCTTCGCCTGGC","ATAGTGACGTAGGATCACAGCTCTTCGCCGGC","TTGGGAGTGACTTGCAGGATCCAGTCTTCGCCTGGCAT"))
> ## 1st has a mutation; 2nd: {5'del+ mut(g)+ del(C)+ del(T)+ ins(A)}
; 3rd: {5'ins + del(TA)+ ins(G)+ del(C)+ 3'ins}
> ## multiple alignment of test-case (gapOpening= -4, gapExtension= -2)
> ## GGGATAGTGACTT-CAGGATC-CAGCTCTTCGCCTGGC .. ref = subject
> ## GGGATAGTGACTa-CAGGATC-CAGCTCTTCGCCTGGC .. (1) mutation (T
-> A)
> ## ATAGTGACgT--AGGATCACAGCTCTTCGCC-GGC .. (2) 5'del +
mut(g) + del(C)+ ins(A)+ del(T)
> ## ttGGGA--GTGACTTGCAGGATC-CAG-TCTTCGCCTGGCat .. (3) 5'ins +
del(TA)+ ins(G)+ del(C)+ 3'ins
>
> align <- pairwiseAlignment(myPattern,mySubject,gapOpening= -4,
gapExtension= -2)
>
> nindel(align) # no of indels per seq
An object of class "InDel"
Slot "insertion":
Length WidthSum
[1,] 0 0
[2,] 1 1
[3,] 1 1
Slot "deletion":
Length WidthSum
[1,] 0 0
[2,] 2 2
[3,] 2 3
> (delStart <- start(indel(pattern(align))))
[1] 11 30 5 23
> (delEnd <- end(indel(pattern(align))))
[1] 11 30 6 23
>
> (seqNo_wDel <- rep.int(1:length(align),
elementLengths(indel(pattern(align))))) # sequence-number for each
instance of deletion
[1] 2 2 3 3
>
> for(i in 1:length(seqNo_wDel))
print(substr(as.character(unaligned(subject(align[seqNo_wDel[i]]))),delStart[i],delEnd[i]))
[1] "C"
[1] "G"
[1] "TA"
[1] "G"
## however the "true" deleted ones are : C, T, TA, C => problem with
deletions after preceding insertion (ie ech 2nd case)
> for(i in 1:length(seqNo_wDel)) print(substr(as.character(
aligned(pattern(align[seqNo_wDel[i]]))),delStart[i],delEnd[i]))
[1] "-"
[1] "C"
[1] "--"
[1] "A"
> ## ==> problem with pôsition of deletions after preceding insertion
(ie ech 2nd case)
>
> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-07-22 r48979)
x86_64-unknown-linux-gnu
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.13.27 IRanges_1.3.42
loaded via a namespace (and not attached):
[1] Biobase_2.5.4
>
Patrick Aboyoun a écrit :
Wolfgang,
The IRanges infrastructure has evolved greatly from BioC 2.4/R-2.9 to
BioC 2.5/R-2.10 and I highly recommend you use accessor functions,
rather than direct slot access to obtain the information you are
looking for. For example, the IRangesList class definition has changed
greatly and the @elements slot is no longer present.
Question: how to transform an IRangesList object to a list of IRanges
object
Answer:
Short answer, use as.list as in as.list(indel(subject(align))). Long
answer, this is typically not necessary and not desired given that
IRangesList has many methods defined for them. See man pages for
IRangesList, RangesList, and Sequence (if you are on BioC 2.5/R-2.10).
If you can tell me what operations you would like to perform, I can
point you in the right direction.
Question: how to the the start, width, and end locations of a pairwise
alignment.
Answer: use the start, width and end methods.
> start(subject(align))
[1] 1 1 4
> width(subject(align))
[1] 24 24 19
> end(subject(align))
[1] 24 24 22
Patrick
Wolfgang Raffelsberger wrote:
Patrick,
thank you very much for your quick and helpful answer !
Yes, using :
> align <- pairwiseAlignment(samp1,ref1)
> indel(subject(align))
I'm about to get what I'm looking for. Now my question is, which
commands are (will be) availabel for mining an IRangesList-object.
Most of all I'm interested in what would correspond to getting :
> indel(subject(align))@elements
> subject(align)@ra...@start
> subject(align)@ra...@witdth # in fact, so far I can do without
this one
(unless you think the @elements, and @range won't change in the
future ...)
With these elements I manage now to extract the very nucleotides that
were inserted/deleted.
Wolfgang
Patrick Aboyoun a écrit :
Wolfgang,
Below is code that retrieves the indel locations you are looking
for. I like your attempts at using indel, insertion, and deletion
for PairwiseAlignment objects and I'll add the methods for
PairwiseAlignment objects to BioC 2.5 (devel) shortly using the
conventions that I specify below.
> suppressMessages(library(Biostrings))
> ref1 <- DNAString("GGGATACTTCACCAGCTCCCTGGC") # my pattern
> samp1 <-
DNAStringSet(c("GGGATACTACACCAGCTCCCTGGC","GGGATACTTACACCAGCTCCCTGGC","ATACTTCACCAGCTCCCTG"))
> # 1st has a mutation, 2nd has an insertion, the 3rd is simply
shorter ...
>
> align <- pairwiseAlignment(samp1,ref1)
>
> nindel(align)
An object of class “InDel”
Slot "insertion":
Length WidthSum
[1,] 0 0
[2,] 1 1
[3,] 0 0
Slot "deletion":
Length WidthSum
[1,] 0 0
[2,] 0 0
[3,] 0 0
> deletions <- indel(pattern(align))
> deletions
CompressedIRangesList: 3 elements
> insertions <- indel(subject(align))
> insertions
CompressedIRangesList: 3 elements
> insertions[[2]]
IRanges instance:
start end width
[1] 10 10 1
> sessionInfo()
R version 2.10.0 Under development (unstable) (2009-06-28 r48863)
i386-apple-darwin9.7.0
locale:
[1] en_US.UTF-8/en_US.UTF-8/C/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.13.26 IRanges_1.3.41
loaded via a namespace (and not attached):
[1] Biobase_2.5.4
Wolfgang Raffelsberger wrote:
Dear list,
previously I've been extracting indel-information from sequences
aligned by the Biostrings function pairwiseAlignment(), which is
probably not the best way since the class
'PairwiseAlignedFixedSubject' has evoled & changed and my old code
won't work any more. Now trying to use the library-provided
functions to access the information/details about indels (ie their
localization on the pattern and possibly the indel sequence ).
However, I can't find a function to extract this information, that
is (to the best of my knowledge) part of the aligned object.
## here an example :
library(Biostrings)
ref1 <- DNAString("GGGATACTTCACCAGCTCCCTGGC") # my pattern
samp1 <-
DNAStringSet(c("GGGATACTACACCAGCTCCCTGGC","GGGATACTTACACCAGCTCCCTGGC","ATACTTCACCAGCTCCCTG"))
# 1st has a mutation, 2nd has an insertion, the 3rd is simply
shorter ...
align <- pairwiseAlignment(samp1,ref1)
nindel(align) # insertion was found properly but I can't see at
which nt position the indel was found (neither if it's an insertion
or deletion)
indel(align) # Error in function (classes, fdef, mtable) unable to
find an inherited method for function...
insertion(align) # Error in function (classes, fdef, mtable) unable
to find an inherited method for function ...
deletion(align) # neither ...
?AlignedXStringSet # says under 'Accessor methods' that indel()
exists ..
## ideally I'd be looking for something like
mismatchTable(align) # but addressing indels ...
## for completeness :
> sessionInfo()
R version 2.9.1 (2009-06-26)
i386-pc-mingw32
locale:
LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] ShortRead_1.2.1 lattice_0.17-25 BSgenome_1.12.3
Biostrings_2.12.7 IRanges_1.2.3
loaded via a namespace (and not attached):
[1] Biobase_2.4.1 grid_2.9.1 hwriter_1.1
Thank's in advance,
Wolfgang Raffelsberger
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
CNRS UMR7104, IGBMC,
1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
Tel (+33) 388 65 3300 Fax (+33) 388 65 3276
wolfgang.raffelsberger (at) igbmc.fr
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing