Re: [R-sig-phylo] Extracting intraspecific and interspecific distances from dist.dna() object

2022-11-01 Thread Emmanuel Paradis
Hi Jarrett, 
(Cc to the list since I'm correcting some errors) 

- Le 28 Oct 22, à 3:01, Jarrett Phillips  a 
écrit : 

> Thanks Emmanuel,
> Sorry for the late reply.

> If I understand correctly, there are two options to get the intraspecific and
> interspecific distances from the pairwise distance matrix:

> library(ape)
> library(spider)

> data(anoteropsis) # read in DNA sequences

> anoSpp <- sapply(strsplit(dimnames(anoteropsis)[[1]], split="_"),
> function(x) paste(x[1], x[2], sep="_")) # get specimen labels

Or more simply (see ?stripLabel for the options): 

anoSpp <- stripLabel(labels(anoteropsis)) 

> anoDist <- dist.dna(anoteropsis, model = "raw", pairwise.deletion = FALSE,
> as.matrix = TRUE) # get distance matrix

> x1 <- lower.tri(outer(anoSpp, anoSpp, "=="), diag = FALSE) # I specified lower
> triangular as per your suggestion
> intra1 <- anoDist[x1] # intraspecific distances
> inter1 <- anoDist[!x1] # interspecific distances

This should be: 

x1 <- outer(anoSpp, anoSpp, "==") 
x1 <- x1 & lower.tri(x1) 
intra1 <- anoDist[x1] # intraspecific distances with no duplicate 

When trying the 2nd solution, I found out that as.dist() on a logical matrix 
converts the values as numeric (TRUE/FALSE become 1/0), so you might do 
instead: 

d <- dist.dna(anoteropsis, model = "raw") 
intra1BIS <- d[which(as.dist(outer(anoSpp, anoSpp, "==")) == 1)] 

Checking: 

R> identical(intra1, intra1BIS) 
[1] TRUE 

A simpler solution is: 

x1 <- outer(anoSpp, anoSpp, "==") 
x1 <- x1[lower.tri(x1)] 
intra1TER <- d[x1] 

R> identical(intra1, intra1TER) 
[1] TRUE 

Since 'd' has only the lower triangle of the distance matrix, d[!x1] gives the 
inter-species distances. The alternative using the full (square) matrix would 
be: 

x2 <- outer(anoSpp, anoSpp, "!=") 
x2 <- x2 & lower.tri(x2) 
anoDist[x2] 

Checking: 

R> identical(d[!x1], anoDist[x2]) 
[1] TRUE 

Note that if you have a very big data set, you may call outer() only once with 
"==" and then do x2 <- !x1; and of course, lower.tri(x1) and lower.tri(x2) are 
identical. 

> x2 <- as.dist(outer(anoSpp, anoSpp, "=="), diag = FALSE, upper = FALSE) # 
> lower
> triangular matrix excluding main diagonal
> intra2 <- anoDist[x2]
> inter2 <- anoDist[!x2]

> However, these implementations do not give the same results.

> Am I missing something? Could you clarify?

> As a side note, the 'spider' R package has the function 'sppDist' that returns
> intraspecific and interspecific distances. However, it differs as well, so
> there is some added confusion.

> x3 <- sppDist(anoDist, anoSpp)
> intra <- x3$intra
> inter <- x4$inter

This gives me: 

R> identical(x3$intra, intra1) 
[1] TRUE 

Be careful that the distances in x3$inter are duplicated: 

R> lengths(x3) 
inter intra 
1024 16 

Best, 

Emmanuel 

> Which implementation is the correct one?

> Cheers.

> Jarrett

> On Tue, Oct 18, 2022 at 11:51 PM Emmanuel Paradis < [
> mailto:emmanuel.para...@ird.fr | emmanuel.para...@ird.fr ] > wrote:

>> Hi Jarrett,

>> If you have a vector of species names (say 'taxa'), then:

>> o <- outer(taxa, taxa, "==")

>> returns a logical matrix with TRUE when the two strings in 'taxa' are the 
>> same,
>> FALSE if they are different (so the diagonal is TRUE). Since you return the
>> output of dist.dna() as a matrix, then you can do:

>> diag(o) <- FALSE
>> anoDist[o] # all the intraspecific distances
>> anoDist[!o] # all the interspecific distances

>> You may have also to remove the duplicated distances, maybe with 
>> lower.tri(). An
>> alternative is to work with the "dist" object (this will use less memory too)
>> from dist.dna:

>> o <- as.dist(outer(taxa, taxa, "=="))

>> will drop the upper triangle and the diagonal of the logical matrix. The two
>> above commands are the same.

>> After, you can elaborate on this by creating your function(s) if you want to
>> focus on a particular species (or pair of), eg:

>> f <- function(x, y) x == "Homo sapiens" & y == "Homo abilis"
>> o <- as.dist(outer(taxa, taxa, f))

>> See also the function adegeneet::pairDistPlot() which a method for "DNAbin"
>> objects.

>> HTH

>> Best,

>> Emmanuel

>> - Le 19 Oct 22, à 1:16, Jarrett Phillips [ 
>> mailto:phillipsjarre...@gmail.com
>> | phillipsjarre...@gmail.com ] a écrit :

>> > Hello,

>> > I'm wondering whether someone knows of an efficient way to extract
>> > intraspecific (within-species) and interspecific (between-species) genetic
>> > distances for each species returned by ape::dist.dna() to be able to
>> > calculate the DNA barcode gap.

>> > It seems extracting the entire object as a matrix is a step in the right
>> > direction.

>> > Consider the following example:

>> > library(ape)
>> > library(spider)

>> > data(anoteropsis) # sample dataset
>> > anoDist <- dist.dna(anoteropsis, model = "raw", pairwise.deletion =
>> > FALSE, as.matrix = TRUE) # compute p-distances

>> > Any ideas?

>> > [[alternative HTML version deleted]]

>> > ___
>>> 

Re: [R-sig-phylo] Extracting intraspecific and interspecific distances from dist.dna() object

2022-10-18 Thread Emmanuel Paradis
Hi Jarrett,

If you have a vector of species names (say 'taxa'), then:

o <- outer(taxa, taxa, "==")

returns a logical matrix with TRUE when the two strings in 'taxa' are the same, 
FALSE if they are different (so the diagonal is TRUE). Since you return the 
output of dist.dna() as a matrix, then you can do:

diag(o) <- FALSE
anoDist[o] # all the intraspecific distances
anoDist[!o] # all the interspecific distances

You may have also to remove the duplicated distances, maybe with lower.tri(). 
An alternative is to work with the "dist" object (this will use less memory 
too) from dist.dna:

o <- as.dist(outer(taxa, taxa, "=="))

will drop the upper triangle and the diagonal of the logical matrix. The two 
above commands are the same.

After, you can elaborate on this by creating your function(s) if you want to 
focus on a particular species (or pair of), eg:

f <- function(x, y) x == "Homo sapiens" & y == "Homo abilis"
o <- as.dist(outer(taxa, taxa, f))

See also the function adegeneet::pairDistPlot() which a method for "DNAbin" 
objects.

HTH

Best,

Emmanuel

- Le 19 Oct 22, à 1:16, Jarrett Phillips phillipsjarre...@gmail.com a écrit 
:

> Hello,
> 
> I'm wondering whether someone knows of an efficient way to extract
> intraspecific (within-species) and interspecific (between-species) genetic
> distances for each species returned by ape::dist.dna() to be able to
> calculate the DNA barcode gap.
> 
> It seems extracting the entire object as a matrix is a step in the right
> direction.
> 
> Consider the following example:
> 
>library(ape)
>library(spider)
> 
>data(anoteropsis) # sample dataset
>anoDist <- dist.dna(anoteropsis, model = "raw", pairwise.deletion =
> FALSE, as.matrix = TRUE) # compute p-distances
> 
> Any ideas?
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[R-sig-phylo] Extracting intraspecific and interspecific distances from dist.dna() object

2022-10-18 Thread Jarrett Phillips
Hello,

I'm wondering whether someone knows of an efficient way to extract
intraspecific (within-species) and interspecific (between-species) genetic
distances for each species returned by ape::dist.dna() to be able to
calculate the DNA barcode gap.

It seems extracting the entire object as a matrix is a step in the right
direction.

Consider the following example:

library(ape)
library(spider)

data(anoteropsis) # sample dataset
anoDist <- dist.dna(anoteropsis, model = "raw", pairwise.deletion =
FALSE, as.matrix = TRUE) # compute p-distances

Any ideas?

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/