hello all,
thank you for your detailed and useful comments. I implemented a C function
based on the ape API that computes pairwise genetic distances under the raw
model while accounting for ambiguous bases.
The simple source code is here:
https://github.com/olli0601/hivclust/blob/master/pkg/src/hivc.cpp
I believe that some of the #defines could be made more efficient; please
let me know.
Feel free to add this to a future release of ape if you find it useful.
Oliver
On Fri, May 3, 2013 at 8:58 AM, Emmanuel Paradis emmanuel.para...@ird.fr wrote:
Hi Oliver,
I guess you wrote it in R; most of the computations on DNA seqs in ape are
done with C code. Maybe you can post this info on the list so others who
might be interested can contact you if they want?
I guess it's worth having a systematic treatment of this issue with
distances. Joe is right that an ML treatment offers a solution (and it is
possible with phangorn), but distances based on counting bases are still
useful for big data sets.
Cheers,
Emmanuel
Thu, 2 May 2013 14:03:10 +0100 Oliver Ratmann
oliver.ratm...@imperial.ac.uk:
hello Emmanuel,
I wrote a small extension this morning that accounts for ambiguous sites
for the raw and N models.
Let me know if this is of interest; I'd be happy to contribute it to a
future release of ape.
Oliver
On Thu, May 2, 2013 at 1:48 PM, Emmanuel Paradis emmanuel.para...@ird.fr wrote:
Hi Oliver,
The current behaviour of dist.dna() is to ignore ambiguous nucleotides.
The difficulty is that comparisons with ambiguous bases may or may not be
informative depending on the model. For instance, the comparison R-Y is
clearly a transversion, while the example you give (A-S) may be either a
transition or a transversion, so it is informative for the JC69 model but
not for K80. Maybe some weighting scheme could be used for this problem. If
someone is aware of some systematic treatment of this issue, please let me
know.
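To make the R-Y versus A-S distinction concrete, here is a minimal C sketch
that classifies a pair of IUPAC codes by what can be said with certainty.
The bitmask encoding, the enum and the function names are assumptions made
for illustration; this is not the code used inside ape.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical encoding (not ape's internal one): one bit per base. */
enum { BASE_A = 1, BASE_C = 2, BASE_G = 4, BASE_T = 8 };

/* What a comparison of two IUPAC codes tells us for certain. */
typedef enum {
    CMP_UNINFORMATIVE, /* codes share a base (or are not nucleotide codes) */
    CMP_TRANSITION,    /* certainly different, certainly a transition */
    CMP_TRANSVERSION,  /* certainly different, certainly a transversion */
    CMP_DIFF_UNKNOWN   /* certainly different, type of change unknown */
} cmp_class;

/* Set of bases an IUPAC code may stand for; 0 for gaps and unknown codes. */
static unsigned iupac_mask(char code)
{
    switch (toupper((unsigned char)code)) {
    case 'A': return BASE_A;
    case 'C': return BASE_C;
    case 'G': return BASE_G;
    case 'T': return BASE_T;
    case 'R': return BASE_A | BASE_G;
    case 'Y': return BASE_C | BASE_T;
    case 'S': return BASE_C | BASE_G;
    case 'W': return BASE_A | BASE_T;
    case 'K': return BASE_G | BASE_T;
    case 'M': return BASE_A | BASE_C;
    case 'B': return BASE_C | BASE_G | BASE_T;
    case 'D': return BASE_A | BASE_G | BASE_T;
    case 'H': return BASE_A | BASE_C | BASE_T;
    case 'V': return BASE_A | BASE_C | BASE_G;
    case 'N': return BASE_A | BASE_C | BASE_G | BASE_T;
    default:  return 0;
    }
}

cmp_class classify_pair(char x, char y)
{
    const unsigned purines = BASE_A | BASE_G;
    const unsigned pyrimidines = BASE_C | BASE_T;
    unsigned mx = iupac_mask(x), my = iupac_mask(y);

    /* Overlapping sets may hide identical bases: nothing is certain. */
    if (mx == 0 || my == 0 || (mx & my) != 0)
        return CMP_UNINFORMATIVE;

    /* From here on both sets are non-empty and disjoint. */
    assert(mx != 0 && my != 0 && (mx & my) == 0);

    int x_pur = (mx & ~purines) == 0;     /* x can only be a purine */
    int x_pyr = (mx & ~pyrimidines) == 0; /* x can only be a pyrimidine */
    int y_pur = (my & ~purines) == 0;
    int y_pyr = (my & ~pyrimidines) == 0;

    if ((x_pur && y_pur) || (x_pyr && y_pyr))
        return CMP_TRANSITION;
    if ((x_pur && y_pyr) || (x_pyr && y_pur))
        return CMP_TRANSVERSION;
    return CMP_DIFF_UNKNOWN;
}
```

Under these assumptions, classify_pair('R', 'Y') yields CMP_TRANSVERSION
and classify_pair('A', 'S') yields CMP_DIFF_UNKNOWN: the latter still
counts as a difference for JC69 but cannot be assigned to either class
for K80.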
With real data, it is wise to check the quality of the alignment, for
instance with base.freq(, all = TRUE) or image().
Best,
Emmanuel
Wed, 1 May 2013 17:33:06 +0100 Oliver Ratmann
oliver.ratm...@imperial.ac.uk:
hello all,
using ape 3.0-8, I seem to get pairwise genetic distances that are not
consistent with the IUPAC code.
dist.dna( as.DNAbin( rbind( c("a","a"), c("s","a") ) ), model="raw" )
1
2 0
S means C or G, so the first site cannot match and I would expect a
distance of 0.5 (one difference over two sites)?
Am I doing anything wrong?
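The arithmetic behind the expected 0.5 can be sketched in C as follows. This
is only an illustration under one possible convention: a site counts as a
difference when the two IUPAC codes cannot stand for the same base, and
sites with a gap or an unrecognised character are skipped. The function
names and the bitmask encoding are hypothetical and are not ape's
implementation.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding: bit 0 = A, bit 1 = C, bit 2 = G, bit 3 = T.
 * Returns 0 for gaps and characters that are not IUPAC nucleotide codes. */
static unsigned iupac_mask(char code)
{
    static const char codes[] = "ACGTRYSWKMBDHVN";
    static const unsigned char masks[] =
        {1, 2, 4, 8, 5, 10, 6, 9, 12, 3, 14, 13, 11, 7, 15};
    int c = toupper((unsigned char)code);
    /* Guard against '\0', which strchr would find at the terminator. */
    const char *hit = (c != '\0') ? strchr(codes, c) : NULL;
    return hit ? masks[hit - codes] : 0;
}

/* Proportion of comparable sites at which x and y certainly differ.
 * Codes that share at least one base are treated as compatible (no
 * difference). Returns -1.0 when no site can be compared. */
double raw_distance_ambig(const char *x, const char *y, size_t n)
{
    size_t compared = 0, differ = 0;

    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        unsigned mx = iupac_mask(x[i]), my = iupac_mask(y[i]);
        if (mx == 0 || my == 0)
            continue;            /* gap or unknown character: skip site */
        compared++;
        if ((mx & my) == 0)
            differ++;            /* disjoint base sets: certain mismatch */
    }
    return compared ? (double)differ / (double)compared : -1.0;
}
```

With the two sequences from the example, raw_distance_ambig("aa", "sa", 2)
gives 0.5. Treating overlapping codes as matches is a design choice; a
weighting scheme over the possible bases would be an alternative.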
Thank you,
Oliver
--
+44 07577 026 978 (mobile)
+44 020 759 41451 (office)
Oliver Ratmann
Dep Infectious Disease Epidemiology
Imperial College London, St Mary's Campus
Norfolk Place,
London W2 1PG
UK
___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/