Hi Jarrett, 
(Cc to the list since I'm correcting some errors) 

----- Le 28 Oct 22, à 3:01, Jarrett Phillips <phillipsjarre...@gmail.com> a 
écrit : 

> Thanks Emmanuel,
> Sorry for the late reply.

> If I understand correctly, there are two options to get the intraspecific and
> interspecific distances from the pairwise distance matrix:

> library(ape)
> library(spider)

> data(anoteropsis) # read in DNA sequences

> anoSpp <- sapply(strsplit(dimnames(anoteropsis)[[1]], split="_"),
> function(x) paste(x[1], x[2], sep="_")) # get specimen labels

Or more simply (see ?stripLabel for the options): 

anoSpp <- stripLabel(labels(anoteropsis)) 

> anoDist <- dist.dna(anoteropsis, model = "raw", pairwise.deletion = FALSE,
> as.matrix = TRUE) # get distance matrix

> x1 <- lower.tri(outer(anoSpp, anoSpp, "=="), diag = FALSE) # I specified lower
> triangular as per your suggestion
> intra1 <- anoDist[x1] # intraspecific distances
> inter1 <- anoDist[!x1] # interspecific distances

This should be: 

x1 <- outer(anoSpp, anoSpp, "==") 
x1 <- x1 & lower.tri(x1) 
intra1 <- anoDist[x1] # intraspecific distances with no duplicate 

When trying the 2nd solution, I found out that as.dist() on a logical matrix 
converts the values as numeric (TRUE/FALSE become 1/0), so you might do 
instead: 

d <- dist.dna(anoteropsis, model = "raw") 
intra1BIS <- d[which(as.dist(outer(anoSpp, anoSpp, "==")) == 1)] 

Checking: 

R> identical(intra1, intra1BIS) 
[1] TRUE 

A simpler solution is: 

x1 <- outer(anoSpp, anoSpp, "==") 
x1 <- x1[lower.tri(x1)] 
intra1TER <- d[x1] 

R> identical(intra1, intra1TER) 
[1] TRUE 

Since 'd' has only the lower triangle of the distance matrix, d[!x1] gives the 
inter-species distances. The alternative using the full (square) matrix would 
be: 

x2 <- outer(anoSpp, anoSpp, "!=") 
x2 <- x2 & lower.tri(x2) 
anoDist[x2] 

Checking: 

R> identical(d[!x1], anoDist[x2]) 
[1] TRUE 

Note that if you have a very big data set, you may call outer() only once with 
"==" and then do x2 <- !x1; and of course, lower.tri(x1) and lower.tri(x2) are 
identical. 

> x2 <- as.dist(outer(anoSpp, anoSpp, "=="), diag = FALSE, upper = FALSE) # 
> lower
> triangular matrix excluding main diagonal
> intra2 <- anoDist[x2]
> inter2 <- anoDist[!x2]

> However, these implementations do not give the same results.

> Am I missing something? Could you clarify?

> As a side note, the 'spider' R package has the function 'sppDist' that returns
> intraspecific and interspecific distances. However, it differs as well, so
> there is some added confusion.

> x3 <- sppDist(anoDist, anoSpp)
> intra <- x3$intra
> inter <- x4$inter

This gives me: 

R> identical(x3$intra, intra1) 
[1] TRUE 

Be careful that the distances in x3$inter are duplicated: 

R> lengths(x3) 
inter intra 
1024 16 

Best, 

Emmanuel 

> Which implementation is the correct one?

> Cheers.

> Jarrett

> On Tue, Oct 18, 2022 at 11:51 PM Emmanuel Paradis < [
> mailto:emmanuel.para...@ird.fr | emmanuel.para...@ird.fr ] > wrote:

>> Hi Jarrett,

>> If you have a vector of species names (say 'taxa'), then:

>> o <- outer(taxa, taxa, "==")

>> returns a logical matrix with TRUE when the two strings in 'taxa' are the 
>> same,
>> FALSE if they are different (so the diagonal is TRUE). Since you return the
>> output of dist.dna() as a matrix, then you can do:

>> diag(o) <- FALSE
>> anoDist[o] # all the intraspecific distances
>> anoDist[!o] # all the interspecific distances

>> You may have also to remove the duplicated distances, maybe with 
>> lower.tri(). An
>> alternative is to work with the "dist" object (this will use less memory too)
>> from dist.dna:

>> o <- as.dist(outer(taxa, taxa, "=="))

>> will drop the upper triangle and the diagonal of the logical matrix. The two
>> above commands are the same.

>> After, you can elaborate on this by creating your function(s) if you want to
>> focus on a particular species (or pair of), eg:

>> f <- function(x, y) x == "Homo sapiens" & y == "Homo abilis"
>> o <- as.dist(outer(taxa, taxa, f))

>> See also the function adegeneet::pairDistPlot() which a method for "DNAbin"
>> objects.

>> HTH

>> Best,

>> Emmanuel

>> ----- Le 19 Oct 22, à 1:16, Jarrett Phillips [ 
>> mailto:phillipsjarre...@gmail.com
>> | phillipsjarre...@gmail.com ] a écrit :

>> > Hello,

>> > I'm wondering whether someone knows of an efficient way to extract
>> > intraspecific (within-species) and interspecific (between-species) genetic
>> > distances for each species returned by ape::dist.dna() to be able to
>> > calculate the DNA barcode gap.

>> > It seems extracting the entire object as a matrix is a step in the right
>> > direction.

>> > Consider the following example:

>> > library(ape)
>> > library(spider)

>> > data(anoteropsis) # sample dataset
>> > anoDist <- dist.dna(anoteropsis, model = "raw", pairwise.deletion =
>> > FALSE, as.matrix = TRUE) # compute p-distances

>> > Any ideas?

>> > [[alternative HTML version deleted]]

>> > _______________________________________________
>>> R-sig-phylo mailing list - [ mailto:R-sig-phylo@r-project.org |
>> > R-sig-phylo@r-project.org ]
>>> [ https://stat.ethz.ch/mailman/listinfo/r-sig-phylo |
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ]
>>> Searchable archive at [ 
>>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ |
>> > http://www.mail-archive.com/r-sig-phylo@r-project.org/ ]

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Reply via email to