Hi,

I am using vmatchPattern, to match a pattern against many subjects, I would
like to count the actual numbers
of mismatched bases for each successful match, where max.mismatch is set to
be >0.
I have found the functions 'nmatch', and 'nmismatch' which work with
matchPattern,

e.g.
pattern <- "AATTCGGCACGAGG"
subject <- DNAString("TTCGGCACGAGGTT")
tmp     <- matchPattern(pattern, subject, max.mismatch=2)
## Two mismatches
print(nmismatch(toString(pattern), tmp))

I have had a go at extending this for unique matches with vmatchPattern,
storing the mismatches in a vector,
but a vector does not capture multiple matches between a pattern and a
subject, the output needs to be a list.

pattern    <- DNAStringSet("AATTCGGCACGAGG")
subjects   <- DNAStringSet(c(
                             "match-1" = "TTCGGCACGAGGTT",
                             "match 0" = "ATTCGGCACGAGGT",
                             "match+1" = "AATTCGGCACGAGG",
                             "match+2" = "TAATTCGGCACGAG",
                             "match+3" = "TTAATTCGGCACGA",
                             "match+1 and +15" =
"AATTCGGCACGAGGAATTCGGCACGAGG"
                             )
                           )
print(subjects)

tmp        <- vmatchPattern(toString(pattern), subjects, max.mismatch=2)
print(startIndex(tmp))
ind        <- which(countIndex(tmp)==1)
mismatches <- rep(NA, length(subjects))

for(i in seq(subjects)) {
  viewsList <- extractAllMatches(subjects[[i]], tmp)[i]
  mismatches[i] <- nmismatch(toString(pattern), viewsList)
}

## Second match at +15 in subjects[[6]] is never taken into account
print(mismatches)

Incidentally this sapply equivalent was failing for me, I haven't yet solved
why.
## sapply fails...
sapply(seq(subjects), function(y){ nmismatch(toString(pattern),
extractAllMatches(subjects[[y]], tmp)[y])})

Is there a obvious way to extend these functions to vmatchPattern output of
class MIndex, ideally a list in the
same structural form as startIndex/endIndex? Is this an additional measure
that vmatchPattern could also return?


cheers,


Marcus


sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] Biostrings_2.16.9 IRanges_1.6.11    NGS_0.9.1

loaded via a namespace (and not attached):
[1] Biobase_2.8.0 tools_2.11.1

        [[alternative HTML version deleted]]

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing

Reply via email to