Dear Roi,

I recently implemented a somewhat clunky workaround that allows the analysis of trees with more than 2048 tips; see the instructions at https://ms609.github.io/TreeDist/reference/TreeDistance.html#distances-between-large-trees
Calculating distances between large trees will be slow: the cost of each distance calculation is roughly quadratic in the number of leaves, and the number of distances to calculate is quadratic in the number of trees. It may thus be wise to subsample your tree set to minimise the number of tree-to-tree comparisons involved.

Replacing any cherries (pairs of leaves) that are identical across all trees, or replacing species with genera, might make the calculations more tractable, but it would be important to think through the mathematics of how this might change the emphasis of your chosen distance measures.

The other important question is what you are seeking to accomplish with your analysis of tree space. Could this be accomplished, perhaps with more statistical rigour or at lower computational cost, by a different approach? For example, you could run a cluster analysis on a subset of trees and then assign the remaining trees to their closest cluster, or compute each tree's distance from the median of a representative subset.

- Martin

--
*Dr. Martin R. Smith*
Associate Professor & Director of Education
Department of Earth Sciences
Durham University
Mountjoy Site, South Road, Durham, DH1 3LE
United Kingdom

*T:* +44 (0)191 334 2320 (Tues / Thurs)
*M*: +44 (0)774 353 7510
*E*: martin.sm...@durham.ac.uk
*Zoom room*: durhamuniversity.zoom.us/my/smith
smithlabdurham.github.io
twitter.com/PalaeoSmith

My working days are Monday–Thursday.
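The subsample-then-assign strategy suggested above can be sketched in a few lines. This is a minimal, language-agnostic illustration written in Python; it is not TreeDist's API, and the names `n_comparisons`, `toy_distance`, `k_medoids` and `assign_to_medoids` are hypothetical. It assumes each tree is represented as a frozenset of split identifiers, and it uses an unnormalised Robinson-Foulds distance only as a cheap placeholder; in practice you would substitute your chosen measure (for example, one computed with TreeDist). The cluster analysis here is a simple k-medoids, which is just one of many possible choices.

```python
# Sketch only. ASSUMPTIONS: trees are frozensets of split identifiers, and
# toy_distance is a placeholder for whatever tree distance you actually use.
import random
from itertools import combinations


def n_comparisons(n_trees):
    """Number of unordered tree pairs, n(n-1)/2: quadratic in n_trees."""
    return n_trees * (n_trees - 1) // 2


def toy_distance(a, b):
    """PLACEHOLDER distance: size of the symmetric difference of two split sets
    (an unnormalised Robinson-Foulds distance). Swap in your preferred measure."""
    return len(a ^ b)


def k_medoids(items, k, distance, n_iter=20, seed=0):
    """Simple alternating k-medoids on a (small) subsample.

    Returns a list of k indices into `items`, one per cluster representative.
    """
    rng = random.Random(seed)
    n = len(items)
    # The full pairwise matrix is affordable only because `items` is a subsample.
    d = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = distance(items[i], items[j])

    medoids = rng.sample(range(n), k)
    for _ in range(n_iter):
        medoid_set = set(medoids)
        clusters = {m: [] for m in medoids}
        for i in range(n):
            # A medoid always belongs to its own cluster (avoids empty clusters
            # when two medoids are at distance 0 from each other).
            nearest = i if i in medoid_set else min(medoids, key=lambda m: d[i][m])
            clusters[nearest].append(i)
        # Move each medoid to the member with the smallest total distance
        # to the other members of its cluster.
        new_medoids = [
            min(members, key=lambda c: sum(d[c][o] for o in members))
            for members in clusters.values()
        ]
        if sorted(new_medoids) == sorted(medoids):
            break  # converged
        medoids = new_medoids
    return medoids


def assign_to_medoids(trees, medoid_trees, distance):
    """Label each tree with the index of its nearest medoid.

    Needs len(trees) * len(medoid_trees) distance calls: linear, not quadratic,
    in the number of trees.
    """
    return [
        min(range(len(medoid_trees)), key=lambda j: distance(t, medoid_trees[j]))
        for t in trees
    ]


if __name__ == "__main__":
    # Toy data: two groups of "trees", each a frozenset of split identifiers.
    trees = [
        frozenset({1, 2, 3}), frozenset({1, 2, 4}), frozenset({1, 2, 3}),
        frozenset({5, 6, 7}), frozenset({5, 6, 8}), frozenset({5, 6, 7}),
    ]
    subsample = [trees[0], trees[1], trees[3], trees[4]]
    medoids = k_medoids(subsample, k=2, distance=toy_distance)
    labels = assign_to_medoids(trees, [subsample[m] for m in medoids], toy_distance)
    print(n_comparisons(3000))  # 4498500 pairwise distances for 3000 trees
    print(labels)  # trees in the same group share a label
```

The design point is the cost split: with n trees, all pairwise comparisons number n(n-1)/2, whereas clustering a subsample of s trees needs s(s-1)/2 comparisons and assigning every tree to one of k representatives needs only n × k.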
On Mon, 1 Jan 2024 at 18:54, roee maor <roeem...@gmail.com> wrote:
> Hi Martin and list members,
> Happy New Year.
>
> I'm trying to explore the phylogenetic tree space represented by a few
> thousand trees of ~2500 tips.
>
> The packages TreeTools and TreeDist, which hold the most advanced R
> functionality for such analysis (AFAIK), are limited to dealing with trees
> with fewer than 500 tips, presumably due to memory requirements.
>
> So my first question is: what could I do to bypass this limitation?
>
> Second, are there alternative packages for robust quantification of
> phylogenetic space (i.e. not Robinson-Foulds; see Smith 2022:
> https://academic.oup.com/sysbio/article/71/5/1255/6486431)?
>
> My fallback plan is either (1) to trim the trees to genus level and make
> do from there, or (2) to split the trees taxonomically and somehow tally up
> the results from each subtree. Perhaps even better would be to do (2) and
> then prune out clades that show little between-tree variability before
> doing (1).
> Any comments on this plan are more than welcome. My current reviewers
> didn't respond well to "was not feasible due to computational
> limitations"...
>
> Many thanks in advance for any help or advice!
> Very best wishes for the new year,
> Roi

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/