Hello, thank you for your comments, dear Klaus.

On Tuesday, 7 March 2023 13:01:07 CET, Klaus Schliep wrote:
> Dear Vojtěch,
> nice work. Just a few random comments:
> Parallelization is often not straightforward as it
> depends on the hardware and the operating system.

Yes. E.g. it typically relies on fork(), which is unavailable on Windows.

> My preference is using the future package for parallelization
> as it does some nice abstraction for the different R packages,
> so you can try different things and no need to write different
> versions of the code. ChatGPT seems to have missed this.

I was looking at the future.apply package, but it seemed unnecessarily complex to me. I'm a bit biased, as I hardly ever meet anyone with working parallelization (greetings to MPI) on Windows, and on Linux "plan" mostly works well.

> Defaulting to "cores = detectCores()" is not a good idea. The CRAN
> Repository Policy (https://cran.r-project.org/web/packages/policies.html)
> states: "If running a package uses multiple threads/cores it must never
> use more than two simultaneously: the check farm is a shared resource
> and will typically be running many checks simultaneously." In short such
> a default can cause serious problems on a cluster and I got properly
> ripleyed for this years ago, but that's another story. So the default
> should be something like min(2L, detectCores()).

That's a good point, thank you.

> So with great power comes great responsibility
> and users should read the man pages.
> Kind regards,
> Klaus
>
> On Mon, Mar 6, 2023 at 2:43 PM Vojtěch Zeisek wrote:
> > Hello dear colleagues,
> > I often use ape::dist.topo (see here dist.topo.r), which is doing the
> > calculations sequentially, which is very slow for large data sets.
>
> What is large? Large number of trees or large trees?

Let's say up to ~1000 trees with 100+ tips. Then dist.topo runs for hours.

> > I'm sorry, I haven't found any relevant Git repository or so, so
> > I hope Emmanuel won't mind if I discuss it here. I discussed
> > various options with ChatGPT and dist.topo.par1.r is the simplest
> > solution, basically using mclapply instead of 2 for loops. Good
> > study material for how to do it in general.
> > Little enhancements
> > are in dist.topo.par2.r, which should be slightly better in case
> > some pair of comparisons would return NA or so, but from my tests
> > there doesn't seem to be any difference.
>
> I think there is room for improvement with preprocessing trees. If
> you have e.g. bootstrap trees with short edges you might want to
> use di2multi() first to avoid spurious differences. Filtering out
> duplicated trees could also speed up things, if there are many.
> This will depend on the trees you want to compare and the methods.

Could be, but not every distance method works well with polytomies.

> > And finally there is dist.topo.par3.r which doesn't load parallel
> > (and uses plain lapply) for cores==1, while parallel and doParallel
> > for multiple cores. It also contains some checks and error handling.
> > From my testing it works well. I'm not sure if tryCatch is really
> > needed there.
>
> I am not sure if it is necessary, but you
> don't want to have it in the inner loop ;)

It's not the first time I have seen such a construction; I have already used it like that (also in different cases) and so far so good. :-)

> > In any case, improvements are welcome. :-)
> >
> > So, what do you think? Is this a usable improvement of ape::dist.topo?

Thank you,
V.

-- 
Vojtěch Zeisek
https://trapa.cz/en/

Department of Botany, Faculty of Science
Charles University, Prague, Czech Republic
https://www.natur.cuni.cz/biology/botany/
https://lab-allience.natur.cuni.cz/

Institute of Botany, Czech Academy of Sciences
Průhonice, Czech Republic
https://www.ibot.cas.cz/en/

Computing cluster
https://sorbus.ibot.cas.cz/en/start
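
A minimal sketch of the pattern discussed in this thread: the two nested for loops are replaced by a single mclapply() call over all unordered pairs, the default is capped at two cores as the CRAN policy asks, and plain lapply() is used for one core or on Windows, where fork() is unavailable. This is not the code from dist.topo.par1.r to dist.topo.par3.r, which is not shown here. The names pairwise_dist_par, items and dist_fun are hypothetical, and a toy distance on numbers stands in for a tree distance so that the example runs without ape.

```r
library(parallel)

# Hypothetical helper (not part of ape): distances between all unordered
# pairs of elements of a list, returned as a "dist" object.
pairwise_dist_par <- function(items, dist_fun,
                              cores = min(2L, detectCores())) {
  n <- length(items)
  stopifnot(n >= 2L)
  # detectCores() may return NA; fall back to a single core in that case.
  if (is.na(cores) || cores < 1L) cores <- 1L
  # mclapply() relies on fork(), which is unavailable on Windows.
  if (.Platform$OS.type == "windows") cores <- 1L

  # One list entry per pair (i, j) with i < j; this replaces the two loops.
  pairs <- combn(n, 2L, simplify = FALSE)
  one_pair <- function(p) dist_fun(items[[p[1L]]], items[[p[2L]]])

  vals <- if (cores == 1L) {
    lapply(pairs, one_pair)
  } else {
    mclapply(pairs, one_pair, mc.cores = cores)
  }

  # Fill a symmetric matrix from the per-pair results.
  m <- matrix(0, n, n, dimnames = list(names(items), names(items)))
  for (k in seq_along(pairs)) {
    p <- pairs[[k]]
    m[p[1L], p[2L]] <- m[p[2L], p[1L]] <- vals[[k]]
  }
  as.dist(m)
}

# Toy stand-in for a tree distance: absolute difference of two numbers.
toy <- list(a = 1, b = 4, c = 6)
d <- pairwise_dist_par(toy, function(x, y) abs(x - y))
print(d)
```

With ape installed, dist_fun could presumably be function(a, b) ape::dist.topo(a, b), and the list of trees could be preprocessed first (di2multi(), removing duplicated trees) as suggested above. Error handling such as tryCatch is left out on purpose; if it is wanted, wrapping the whole mclapply() call rather than each pair keeps it out of the inner loop.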
_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/