Hi Jarrett, (Cc to the list since I'm correcting some errors) ----- Le 28 Oct 22, à 3:01, Jarrett Phillips <phillipsjarre...@gmail.com> a écrit :
> Thanks Emmanuel, > Sorry for the late reply. > If I understand correctly, there are two options to get the intraspecific and > interspecific distances from the pairwise distance matrix: > library(ape) > library(spider) > data(anoteropsis) # read in DNA sequences > anoSpp <- sapply(strsplit(dimnames(anoteropsis)[[1]], split="_"), > function(x) paste(x[1], x[2], sep="_")) # get specimen labels Or more simply (see ?stripLabel for the options): anoSpp <- stripLabel(labels(anoteropsis)) > anoDist <- dist.dna(anoteropsis, model = "raw", pairwise.deletion = FALSE, > as.matrix = TRUE) # get distance matrix > x1 <- lower.tri(outer(anoSpp, anoSpp, "=="), diag = FALSE) # I specified lower > triangular as per your suggestion > intra1 <- anoDist[x1] # intraspecific distances > inter1 <- anoDist[!x1] # interspecific distances This should be: x1 <- outer(anoSpp, anoSpp, "==") x1 <- x1 & lower.tri(x1) intra1 <- anoDist[x1] # intraspecific distances with no duplicate When trying the 2nd solution, I found out that as.dist() on a logical matrix converts the values as numeric (TRUE/FALSE become 1/0), so you might do instead: d <- dist.dna(anoteropsis, model = "raw") intra1BIS <- d[which(as.dist(outer(anoSpp, anoSpp, "==")) == 1)] Checking: R> identical(intra1, intra1BIS) [1] TRUE A simpler solution is: x1 <- outer(anoSpp, anoSpp, "==") x1 <- x1[lower.tri(x1)] intra1TER <- d[x1] R> identical(intra1, intra1TER) [1] TRUE Since 'd' has only the lower triangle of the distance matrix, d[!x1] gives the inter-species distances. The alternative using the full (square) matrix would be: x2 <- outer(anoSpp, anoSpp, "!=") x2 <- x2 & lower.tri(x2) anoDist[x2] Checking: R> identical(d[!x1], anoDist[x2]) [1] TRUE Note that if you have a very big data set, you may call outer() only once with "==" and then do x2 <- !x1; and of course, lower.tri(x1) and lower.tri(x2) are identical. > x2 <- as.dist(outer(anoSpp, anoSpp, "=="), diag = FALSE, upper = FALSE) # > lower > triangular matrix excluding main diagonal > intra2 <- anoDist[x2] > inter2 <- anoDist[!x2] > However, these implementations do not give the same results. > Am I missing something? Could you clarify? > As a side note, the 'spider' R package has the function 'sppDist' that returns > intraspecific and interspecific distances. However, it differs as well, so > there is some added confusion. > x3 <- sppDist(anoDist, anoSpp) > intra <- x3$intra > inter <- x4$inter This gives me: R> identical(x3$intra, intra1) [1] TRUE Be careful that the distances in x3$inter are duplicated: R> lengths(x3) inter intra 1024 16 Best, Emmanuel > Which implementation is the correct one? > Cheers. > Jarrett > On Tue, Oct 18, 2022 at 11:51 PM Emmanuel Paradis < [ > mailto:emmanuel.para...@ird.fr | emmanuel.para...@ird.fr ] > wrote: >> Hi Jarrett, >> If you have a vector of species names (say 'taxa'), then: >> o <- outer(taxa, taxa, "==") >> returns a logical matrix with TRUE when the two strings in 'taxa' are the >> same, >> FALSE if they are different (so the diagonal is TRUE). Since you return the >> output of dist.dna() as a matrix, then you can do: >> diag(o) <- FALSE >> anoDist[o] # all the intraspecific distances >> anoDist[!o] # all the interspecific distances >> You may have also to remove the duplicated distances, maybe with >> lower.tri(). An >> alternative is to work with the "dist" object (this will use less memory too) >> from dist.dna: >> o <- as.dist(outer(taxa, taxa, "==")) >> will drop the upper triangle and the diagonal of the logical matrix. The two >> above commands are the same. >> After, you can elaborate on this by creating your function(s) if you want to >> focus on a particular species (or pair of), eg: >> f <- function(x, y) x == "Homo sapiens" & y == "Homo abilis" >> o <- as.dist(outer(taxa, taxa, f)) >> See also the function adegeneet::pairDistPlot() which a method for "DNAbin" >> objects. >> HTH >> Best, >> Emmanuel >> ----- Le 19 Oct 22, à 1:16, Jarrett Phillips [ >> mailto:phillipsjarre...@gmail.com >> | phillipsjarre...@gmail.com ] a écrit : >> > Hello, >> > I'm wondering whether someone knows of an efficient way to extract >> > intraspecific (within-species) and interspecific (between-species) genetic >> > distances for each species returned by ape::dist.dna() to be able to >> > calculate the DNA barcode gap. >> > It seems extracting the entire object as a matrix is a step in the right >> > direction. >> > Consider the following example: >> > library(ape) >> > library(spider) >> > data(anoteropsis) # sample dataset >> > anoDist <- dist.dna(anoteropsis, model = "raw", pairwise.deletion = >> > FALSE, as.matrix = TRUE) # compute p-distances >> > Any ideas? >> > [[alternative HTML version deleted]] >> > _______________________________________________ >>> R-sig-phylo mailing list - [ mailto:R-sig-phylo@r-project.org | >> > R-sig-phylo@r-project.org ] >>> [ https://stat.ethz.ch/mailman/listinfo/r-sig-phylo | >> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ] >>> Searchable archive at [ >>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ | >> > http://www.mail-archive.com/r-sig-phylo@r-project.org/ ] [[alternative HTML version deleted]] _______________________________________________ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/