Dear list,

I would like to count the occurrences of (mostly single) nucleotide polymorphisms in nucleotide sequences. I came across the Biostrings package and pairwiseAlignment(), which gets me closer to what I want, but 1) I noticed that the score produced by pairwiseAlignment() is quite different from that of other implementations of the Needleman-Wunsch algorithm (e.g. in EMBOSS), and 2) the score is not directly the information I'm looking for, since it mixes gaps and mismatches (and I don't see if/how one could modify that).

However, I would primarily be interested in finding where a given nucleotide differs from the query (in a pairwise alignment) in order to compute some statistics on these differences, i.e. at which position I get which other element instead. Note that my sample sequences may start or end slightly later/earlier.
Any suggestions?

Sample code might look like (of course, my real sequences are longer ...):

library(Biostrings)
# 'mat' is the substitution matrix; it must be defined before running the lines below (its definition is not shown here)
ref <- DNAString("ACTTCACCAGCTCCCTGGC")
samp <- DNAStringSet(c("CTTCTCCAGCTCCCTGG","ACTTCTCCAGCTACCTGG","TTCACCAGCTCCCTG")) # the 3rd one has no mutations, it's simply shorter ...
pairwiseAlignment(ref, samp[[1]], substitutionMatrix = mat, gapOpening = -5, gapExtension = -2)
alignScores <- numeric(length(samp))
for(i in seq_len(length(samp))) alignScores[i] <- pairwiseAlignment(ref, samp[[i]], substitutionMatrix = mat, gapOpening = -5, gapExtension = -2, scoreOnly = TRUE)
alignScores     # the 3rd sequence, without mismatches, gets the worst score


Following a previous post on BioC, I just subscribed to [email protected], but I have not managed to search the previous mail archives (on http://search.gmane.org/): I keep getting general Bioconductor messages instead.

Thanks in advance,
Wolfgang


By the way, in case it matters, I'm (still) running R-2.7.2:
> sessionInfo()
R version 2.7.2 (2008-08-25)
i386-pc-mingw32

locale:
LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252

attached base packages:
[1] stats graphics grDevices datasets tcltk utils methods base
other attached packages:
[1] Biostrings_2.8.18 svSocket_0.9-5 svIO_0.9-5 R2HTML_1.59 svMisc_0.9-5 svIDE_0.9-5
loaded via a namespace (and not attached):
[1] tools_2.7.2


. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Wolfgang Raffelsberger, PhD
Laboratoire de BioInformatique et Génomique Intégratives
CNRS UMR7104, IGBMC 1 rue Laurent Fries, 67404 Illkirch Strasbourg, France
Tel (+33) 388 65 3300         Fax (+33) 388 65 3276
wolfgang.raffelsberger (at) igbmc.fr

_______________________________________________
Bioc-sig-sequencing mailing list
[email protected]
https://stat.ethz.ch/mailman/listinfo/bioc-sig-sequencing
