On Oct 29, 2010, at 6:17 PM, Kunbin Qu wrote:
Is there a way to NOT trim a segment in the pattern from trimLRPatterns? For example:

t1<-DNAString("AAAATGGTAC")
t2<-DNAString("TGGTAC")
t3<-trimLRPatterns(Rpattern=t2, subject=t1, max.Rmismatch = rep (0,4), ranges=T, with.Rindels=TRUE)
t4<-DNAStringSet(t1, start=start(t3), end=end(t3))

t4
  A DNAStringSet instance of length 1
    width seq
[1]     4 AAAA
t1
  10-letter "DNAString" instance
seq: AAAATGGTAC
t2
  6-letter "DNAString" instance
seq: TGGTAC


If I want to keep "TG" in t1, followed by the AAAA, which also appears in the beginning of the pattern t2, is there a way? The rep (0,4) does not help since it could find a perfect match at length 6 (the first try when starting to trim). I am trying to use "TG" as my anchor points to avoid false over-trimming. Thanks.


If you mean to allow trimming a pattern like "TG.." in letters 7-10 or less of your subject, even if TG occurs in letters 5-6, as in your t1, then you might want Rpattern="TGGT", and max.Rmismatch should have length 4 (or less, if you want to prevent trimming less than 4 letters).

If I didn't get the point, a longer answer:

Usually, I over-trimmed when I set the mismatch limits too high (especially with indels allowed), probably when I used the "rate" feature for the mismatch parameter, and that resulted in certain mismatch elements being larger than I wanted. But you have zero mismatches allowed, and your whole Rpattern occurs exactly, as the final 6 letters of your subject. So your example is a good example of what the function does (except that you don't need to allow indels, and you could get your t4 as the value of trimLRPatterns by using ranges=F, the default). You can put -1 in the mismatch vector,
for example

        M = some integer
trimLRPatterns(Rpattern=t2, subject=t1, max.Rmismatch = c(rep(-1,5), M))

This allows the 6-letter suffix of your subject to be trimmed if it matches your Rpattern with M or less errors, but it prevents trimming by anything shorter. Actually, your rep(0,4) is the same as c(rep(-1,2), rep(0,4)). That is, trimming by 6 or 5 letters is allowed, but not by less. Thus,

> trimLRPatterns(Rpattern="TGGTAC", subject="AAAATGGTAC", max.Rmismatch=0)
[1] "AAAA"

> trimLRPatterns(Rpattern="TGGTAC", subject="AAAATGGTAC", max.Rmismatch=c(rep(-1,5), 0))
[1] "AAAA"

You can even put -1 as the 6th element of max.Rmismatch, but then you might as well start with
Rpattern="TGGTA"; or -1 in positions 5 and 6, so, Rpattern="TGGT".

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to