Hi Jarrett,
(Cc to the list since I'm correcting some errors)
- Le 28 Oct 22, à 3:01, Jarrett Phillips a
écrit :
> Thanks Emmanuel,
> Sorry for the late reply.
> If I understand correctly, there are two options to get the intraspecific and
> interspecific distances from the pairwise distance matrix:
> library(ape)
> library(spider)
> data(anoteropsis) # read in DNA sequences
> anoSpp <- sapply(strsplit(dimnames(anoteropsis)[[1]], split="_"),
> function(x) paste(x[1], x[2], sep="_")) # get specimen labels
Or more simply (see ?stripLabel for the options):
anoSpp <- stripLabel(labels(anoteropsis))
> anoDist <- dist.dna(anoteropsis, model = "raw", pairwise.deletion = FALSE,
> as.matrix = TRUE) # get distance matrix
> x1 <- lower.tri(outer(anoSpp, anoSpp, "=="), diag = FALSE) # I specified lower
> triangular as per your suggestion
> intra1 <- anoDist[x1] # intraspecific distances
> inter1 <- anoDist[!x1] # interspecific distances
This should be:
x1 <- outer(anoSpp, anoSpp, "==")
x1 <- x1 & lower.tri(x1)
intra1 <- anoDist[x1] # intraspecific distances with no duplicate
When trying the 2nd solution, I found out that as.dist() on a logical matrix
converts the values as numeric (TRUE/FALSE become 1/0), so you might do
instead:
d <- dist.dna(anoteropsis, model = "raw")
intra1BIS <- d[which(as.dist(outer(anoSpp, anoSpp, "==")) == 1)]
Checking:
R> identical(intra1, intra1BIS)
[1] TRUE
A simpler solution is:
x1 <- outer(anoSpp, anoSpp, "==")
x1 <- x1[lower.tri(x1)]
intra1TER <- d[x1]
R> identical(intra1, intra1TER)
[1] TRUE
Since 'd' has only the lower triangle of the distance matrix, d[!x1] gives the
inter-species distances. The alternative using the full (square) matrix would
be:
x2 <- outer(anoSpp, anoSpp, "!=")
x2 <- x2 & lower.tri(x2)
anoDist[x2]
Checking:
R> identical(d[!x1], anoDist[x2])
[1] TRUE
Note that if you have a very big data set, you may call outer() only once with
"==" and then do x2 <- !x1; and of course, lower.tri(x1) and lower.tri(x2) are
identical.
> x2 <- as.dist(outer(anoSpp, anoSpp, "=="), diag = FALSE, upper = FALSE) #
> lower
> triangular matrix excluding main diagonal
> intra2 <- anoDist[x2]
> inter2 <- anoDist[!x2]
> However, these implementations do not give the same results.
> Am I missing something? Could you clarify?
> As a side note, the 'spider' R package has the function 'sppDist' that returns
> intraspecific and interspecific distances. However, it differs as well, so
> there is some added confusion.
> x3 <- sppDist(anoDist, anoSpp)
> intra <- x3$intra
> inter <- x4$inter
This gives me:
R> identical(x3$intra, intra1)
[1] TRUE
Be careful that the distances in x3$inter are duplicated:
R> lengths(x3)
inter intra
1024 16
Best,
Emmanuel
> Which implementation is the correct one?
> Cheers.
> Jarrett
> On Tue, Oct 18, 2022 at 11:51 PM Emmanuel Paradis < [
> mailto:emmanuel.para...@ird.fr | emmanuel.para...@ird.fr ] > wrote:
>> Hi Jarrett,
>> If you have a vector of species names (say 'taxa'), then:
>> o <- outer(taxa, taxa, "==")
>> returns a logical matrix with TRUE when the two strings in 'taxa' are the
>> same,
>> FALSE if they are different (so the diagonal is TRUE). Since you return the
>> output of dist.dna() as a matrix, then you can do:
>> diag(o) <- FALSE
>> anoDist[o] # all the intraspecific distances
>> anoDist[!o] # all the interspecific distances
>> You may have also to remove the duplicated distances, maybe with
>> lower.tri(). An
>> alternative is to work with the "dist" object (this will use less memory too)
>> from dist.dna:
>> o <- as.dist(outer(taxa, taxa, "=="))
>> will drop the upper triangle and the diagonal of the logical matrix. The two
>> above commands are the same.
>> After, you can elaborate on this by creating your function(s) if you want to
>> focus on a particular species (or pair of), eg:
>> f <- function(x, y) x == "Homo sapiens" & y == "Homo abilis"
>> o <- as.dist(outer(taxa, taxa, f))
>> See also the function adegeneet::pairDistPlot() which a method for "DNAbin"
>> objects.
>> HTH
>> Best,
>> Emmanuel
>> - Le 19 Oct 22, à 1:16, Jarrett Phillips [
>> mailto:phillipsjarre...@gmail.com
>> | phillipsjarre...@gmail.com ] a écrit :
>> > Hello,
>> > I'm wondering whether someone knows of an efficient way to extract
>> > intraspecific (within-species) and interspecific (between-species) genetic
>> > distances for each species returned by ape::dist.dna() to be able to
>> > calculate the DNA barcode gap.
>> > It seems extracting the entire object as a matrix is a step in the right
>> > direction.
>> > Consider the following example:
>> > library(ape)
>> > library(spider)
>> > data(anoteropsis) # sample dataset
>> > anoDist <- dist.dna(anoteropsis, model = "raw", pairwise.deletion =
>> > FALSE, as.matrix = TRUE) # compute p-distances
>> > Any ideas?
>> > [[alternative HTML version deleted]]
>> > ___
>>>