Hi,
I am using vmatchPattern, to match a pattern against many subjects, I would
like to count the actual numbers
of mismatched bases for each successful match, where max.mismatch is set to
be >0.
I have found the functions 'nmatch', and 'nmismatch' which work with
matchPattern,
e.g.
pattern <- "AATTCGGCACGAGG"
subject <- DNAString("TTCGGCACGAGGTT")
tmp <- matchPattern(pattern, subject, max.mismatch=2)
## Two mismatches
print(nmismatch(toString(pattern), tmp))
I have had a go at extending this for unique matches with vmatchPattern,
storing the mismatches in a vector,
but a vector does not capture multiple matches between a pattern and a
subject, the output needs to be a list.
pattern <- DNAStringSet("AATTCGGCACGAGG")
subjects <- DNAStringSet(c(
"match-1" = "TTCGGCACGAGGTT",
"match 0" = "ATTCGGCACGAGGT",
"match+1" = "AATTCGGCACGAGG",
"match+2" = "TAATTCGGCACGAG",
"match+3" = "TTAATTCGGCACGA",
"match+1 and +15" =
"AATTCGGCACGAGGAATTCGGCACGAGG"
)
)
print(subjects)
tmp <- vmatchPattern(toString(pattern), subjects, max.mismatch=2)
print(startIndex(tmp))
ind <- which(countIndex(tmp)==1)
mismatches <- rep(NA, length(subjects))
for(i in seq(subjects)) {
viewsList <- extractAllMatches(subjects[[i]], tmp)[i]
mismatches[i] <- nmismatch(toString(pattern), viewsList)
}
## Second match at +15 in subjects[[6]] is never taken into account
print(mismatches)
Incidentally this sapply equivalent was failing for me, I haven't yet solved
why.
## sapply fails...
sapply(seq(subjects), function(y){ nmismatch(toString(pattern),
extractAllMatches(subjects[[y]], tmp)[y])})
Is there a obvious way to extend these functions to vmatchPattern output of
class MIndex, ideally a list in the
same structural form as startIndex/endIndex? Is this an additional measure
that vmatchPattern could also return?
cheers,
Marcus
sessionInfo()
R version 2.11.1 (2010-05-31)
x86_64-apple-darwin9.8.0
locale:
[1] C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] Biostrings_2.16.9 IRanges_1.6.11 NGS_0.9.1
loaded via a namespace (and not attached):
[1] Biobase_2.8.0 tools_2.11.1
[[alternative HTML version deleted]]
_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing