Dear Vojtěch,

The performance of calculating Robinson-Foulds topological distances can be
improved using the algorithm of Day (1985), which is implemented in the R
package TreeDist:
https://ms609.github.io/TreeDist/reference/Robinson-Foulds.html
This will often be faster than dist.topo.
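
As a rough illustration of the suggestion above, here is a minimal sketch
comparing the two functions. It assumes the ape and TreeDist packages are
installed; the random test trees (20 unrooted trees of 50 tips) and the seed
are arbitrary choices for the example, not part of the original question.

```r
# Sketch only: assumes the ape and TreeDist packages are installed.
library(ape)
library(TreeDist)

set.seed(1)
# Hypothetical test data: 20 random unrooted trees sharing the same 50 tips
trees <- rmtree(20, 50, rooted = FALSE)

# All pairwise Robinson-Foulds distances, computed by TreeDist
rf <- RobinsonFoulds(trees)

# Rough timing of both approaches on the same trees
system.time(RobinsonFoulds(trees))
system.time(dist.topo(trees, method = "PH85"))
```

Timings depend on the number and size of the trees, so it is worth running
this on data that resemble your own before committing to either function.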

(As noted there, the RF distance has certain issues, which mean that it is
not always the most suitable measure of tree dissimilarity!)

You can find ape's GitHub repository at https://github.com/emmanuelparadis/ape

Martin


--

*Dr. Martin R. Smith*
Associate Professor & Director of Education
Department of Earth Sciences
Durham University
Mountjoy Site, South Road, Durham, DH1 3LE United Kingdom

*T:* +44 (0)191 334 2320 (Tues / Thurs)
*M*: +44 (0)774 353 7510
*E*: martin.sm...@durham.ac.uk
*Zoom room*: durhamuniversity.zoom.us/my/smith

smithlabdurham.github.io
twitter.com/PalaeoSmith

My working days are Monday–Thursday.


On Tue, 7 Mar 2023 at 11:01, <r-sig-phylo-requ...@r-project.org> wrote:

> Message: 1
> Date: Mon, 06 Mar 2023 14:43:04 +0100
> From: Vojtěch Zeisek <vo...@trapa.cz>
> To: mailinglist R <r-sig-phylo@r-project.org>
> Subject: [R-sig-phylo] Parallelization in ape::dist.topo
> Message-ID: <3551602.QO7bkq4lFn@veles>
> Content-Type: text/plain; charset="utf-8"
>
> Hello dear colleagues,
> I often use ape::dist.topo (see dist.topo.r), which does the calculations
> sequentially and is therefore very slow for large data sets. I'm sorry, I
> haven't found any relevant Git repository, so I hope Emmanuel won't mind
> if I discuss it here.
> I discussed various options with ChatGPT, and dist.topo.par1.r is the
> simplest solution, basically using mclapply instead of the two for loops.
> It is good study material for how to do this in general. Small
> enhancements are in dist.topo.par2.r, which should be slightly better if
> some pairwise comparison returned NA or similar, but in my tests there
> doesn't seem to be any difference.
> And finally there is dist.topo.par3.r, which doesn't load parallel (and
> uses plain lapply) for cores==1, and uses parallel and doParallel for
> multiple cores. It also contains some checks and error handling. In my
> testing it works well. I'm not sure whether tryCatch is really needed
> there. In any case, improvements are welcome. :-)
> So, what do you think? Is this a usable improvement of ape::dist.topo?
> Sincerely,
> V.
>
> --
> Vojtěch Zeisek
> https://trapa.cz/en/
>
> Department of Botany, Faculty of Science
> Charles University, Prague, Czech Republic
> https://www.natur.cuni.cz/biology/botany/
> https://lab-allience.natur.cuni.cz/
>
> Institute of Botany, Czech Academy of Sciences
> Průhonice, Czech Republic
> https://www.ibot.cas.cz/en/
> Computing cluster
> https://sorbus.ibot.cas.cz/en/start
>
>

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/