Dear Vojtěch,

The performance of calculating Robinson-Foulds topological distances can be improved using the algorithm of Day (1985), which is implemented in the R package TreeDist:
https://ms609.github.io/TreeDist/reference/Robinson-Foulds.html

This will often be faster than dist.topo.
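
As a rough illustration of the comparison, here is a minimal sketch that computes the same pairwise distances with ape::dist.topo and with TreeDist::RobinsonFoulds. It assumes both packages are installed from CRAN; the random trees, their number, and the variable names are made up for the example, and actual timings will depend on your data and machine.

```r
# Assumption: the CRAN packages 'ape' and 'TreeDist' are installed.
library(ape)
library(TreeDist)

set.seed(1)
# Illustrative data only: 50 random unrooted binary trees on the same 30 tips
trees <- rmtree(50, 30, rooted = FALSE)

# ape: pairwise topological distances; method "PH85" corresponds to the
# Robinson-Foulds distance for unrooted binary trees
t_ape <- system.time(d_ape <- dist.topo(trees, method = "PH85"))

# TreeDist: Robinson-Foulds distances between all pairs of trees
t_td <- system.time(d_td <- RobinsonFoulds(trees))

# Compare elapsed times (values vary between machines)
print(rbind(ape = t_ape, TreeDist = t_td))

# Check that both approaches give the same distance matrix
same <- all.equal(as.matrix(d_ape), as.matrix(d_td), check.attributes = FALSE)
print(same)
```

The trees are generated unrooted here so that both functions compare the same sets of bipartitions.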
(As noted in the TreeDist documentation linked above, the RF distance has certain issues, which means that it is not always the most suitable measure of tree dissimilarity!)

You can find ape's GitHub repo at https://github.com/emmanuelparadis/ape

Martin

--
Dr. Martin R. Smith
Associate Professor & Director of Education
Department of Earth Sciences
Durham University
Mountjoy Site, South Road, Durham, DH1 3LE
United Kingdom
T: +44 (0)191 334 2320 (Tues / Thurs)
M: +44 (0)774 353 7510
E: martin.sm...@durham.ac.uk
Zoom room: durhamuniversity.zoom.us/my/smith
smithlabdurham.github.io
twitter.com/PalaeoSmith
My working days are Monday–Thursday.

On Tue, 7 Mar 2023 at 11:01, <r-sig-phylo-requ...@r-project.org> wrote:

> Message: 1
> Date: Mon, 06 Mar 2023 14:43:04 +0100
> From: Vojtěch Zeisek <vo...@trapa.cz>
> To: mailinglist R <r-sig-phylo@r-project.org>
> Subject: [R-sig-phylo] Parallelization in ape::dist.topo
> Message-ID: <3551602.QO7bkq4lFn@veles>
>
> Hello dear colleagues,
> I often use ape::dist.topo (see here dist.topo.r), which does the
> calculations sequentially and is therefore very slow for large data sets.
> I'm sorry, I haven't found any relevant Git repository or similar, so I
> hope Emmanuel won't mind if I discuss it here.
> I discussed various options with ChatGPT, and dist.topo.par1.r is the
> simplest solution, basically using mclapply instead of the two for loops.
> It is good study material for how to do this in general. Small
> enhancements are in dist.topo.par2.r, which should behave slightly better
> if some pairwise comparison returned NA or similar, but in my tests there
> doesn't seem to be any difference.
> Finally, there is dist.topo.par3.r, which doesn't load parallel (and uses
> plain lapply) for cores==1, but uses parallel and doParallel for multiple
> cores. It also contains some checks and error handling. In my testing it
> works well. I'm not sure whether tryCatch is really needed there. In any
> case, improvements are welcome. :-)
> So, what do you think? Is this a usable improvement of ape::dist.topo?
> Sincerely,
> V.
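
For readers without the attached scripts, the general idea described in the quoted message (replacing the two nested for loops with mclapply over all pairs of trees) might look roughly like the sketch below. This is not the code from dist.topo.par1.r, which is not reproduced in this thread; the helper name dist_topo_parallel and its arguments are hypothetical, and the sketch assumes the ape package is installed.

```r
# Assumption: the CRAN package 'ape' is installed; 'parallel' ships with R.
library(ape)
library(parallel)

# Hypothetical helper (not part of ape): computes dist.topo for every pair
# of trees, distributing the pairs over `cores` worker processes.
dist_topo_parallel <- function(trees, method = "PH85", cores = 2L) {
  n <- length(trees)
  # List of index pairs c(i, j) with i < j
  pairs <- combn(n, 2, simplify = FALSE)
  # One dist.topo call per pair; mclapply forks workers on Unix-alikes.
  # On Windows only mc.cores = 1 is supported.
  d <- mclapply(pairs, function(p) {
    as.numeric(dist.topo(trees[[p[1]]], trees[[p[2]]], method = method))
  }, mc.cores = cores)
  # Fill a symmetric matrix and return it as a 'dist' object
  m <- matrix(0, n, n)
  for (k in seq_along(pairs)) {
    p <- pairs[[k]]
    m[p[1], p[2]] <- m[p[2], p[1]] <- d[[k]]
  }
  as.dist(m)
}

# Usage on illustrative random trees (cores = 1 runs everywhere)
set.seed(1)
trees <- rmtree(10, 20, rooted = FALSE)
d_par <- dist_topo_parallel(trees, cores = 1L)
d_seq <- dist.topo(trees)
print(all.equal(as.matrix(d_par), as.matrix(d_seq), check.attributes = FALSE))
```

Splitting the work by pair keeps the sketch simple; each worker call has some overhead, so for many small trees it may pay to send chunks of pairs to each worker instead.
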
>
> --
> Vojtěch Zeisek
> https://trapa.cz/en/
>
> Department of Botany, Faculty of Science
> Charles University, Prague, Czech Republic
> https://www.natur.cuni.cz/biology/botany/
> https://lab-allience.natur.cuni.cz/
>
> Institute of Botany, Czech Academy of Sciences
> Průhonice, Czech Republic
> https://www.ibot.cas.cz/en/
> Computing cluster
> https://sorbus.ibot.cas.cz/en/start

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/