Dear Roi,

I recently implemented a somewhat clunky workaround to allow the analysis
of trees with >2048 tips; see instructions at
https://ms609.github.io/TreeDist/reference/TreeDistance.html#distances-between-large-trees

Calculation of distances between large trees will be slow: each distance
calculation is roughly quadratic in the number of leaves, and the number of
distances to calculate is quadratic in the number of trees.  It may thus be
wise to subsample your treeset to minimise the number of tree-to-tree
comparisons involved.
Replacing any cherries (pairs of leaves) that are identical between all
trees – or replacing species with genera – might make the calculations more
tractable, but it would be important to think through the mathematics of
how this might change the emphasis of your chosen distance measures.
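
To put rough numbers on the subsampling point above, here is a small sketch
of how the number of tree-to-tree comparisons grows with the number of
trees.  It is written in Python purely for illustration; the tree counts
(3000 and 300) are assumed values, not figures from your dataset, and
`n_pairwise` is a hypothetical helper name.

```python
def n_pairwise(n_trees: int) -> int:
    """Number of unordered tree pairs, i.e. entries in a full distance matrix."""
    return n_trees * (n_trees - 1) // 2


full = n_pairwise(3000)   # all-against-all on an assumed 3000 trees
subset = n_pairwise(300)  # all-against-all on an assumed 10% subsample

print(full)           # 4498500
print(subset)         # 44850
print(full / subset)  # roughly 100-fold fewer comparisons
```

A tenfold reduction in the number of trees thus cuts the number of
comparisons about a hundredfold, before any saving from reducing the number
of leaves.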

The other important question is what you are seeking to accomplish with
your analysis of tree space.  Could this be accomplished, perhaps with more
statistical rigour or with less computational cost, by a different approach
– for example by running a cluster analysis on a subset of trees then
assigning other trees to their closest cluster; or computing each tree's
distance from the median of a representative subset?
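
As a minimal sketch of the "cluster a subset, then assign the rest" idea,
the logic might look something like the following.  It is in Python purely
for illustration and makes several assumptions: the subset has already been
clustered by whatever method you prefer, `dist` is a placeholder for a real
tree distance (in practice computed in R with TreeDist), integers stand in
for trees, and `medoid` and `assign_to_nearest` are hypothetical helper
names.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def medoid(members: Sequence[T], dist: Callable[[T, T], float]) -> T:
    """Return the member with the smallest summed distance to its cluster."""
    return min(members, key=lambda a: sum(dist(a, b) for b in members))


def assign_to_nearest(items: Sequence[T], medoids: Sequence[T],
                      dist: Callable[[T, T], float]) -> List[int]:
    """Index of the closest medoid for each item.

    Costs len(items) * len(medoids) distance calculations, rather than
    one calculation for every pair of items.
    """
    return [min(range(len(medoids)), key=lambda i: dist(x, medoids[i]))
            for x in items]


def dist(a: int, b: int) -> int:
    """Toy stand-in: integers play the role of trees."""
    return abs(a - b)


# Assumed result of clustering a small representative subset
subset_clusters = [[1, 2, 3], [10, 11, 12]]
medoids = [medoid(cluster, dist) for cluster in subset_clusters]

# The remaining "trees" are compared only against the two medoids
remaining = [0, 4, 9, 13]
labels = assign_to_nearest(remaining, medoids, dist)

print(medoids)  # [2, 11]
print(labels)   # [0, 0, 1, 1]
```

With k clusters, each remaining tree needs only k distance calculations, so
the cost grows linearly rather than quadratically with the number of trees
left outside the subset.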

- Martin


--

*Dr. Martin R. Smith*
Associate Professor & Director of Education
Department of Earth Sciences
Durham University
Mountjoy Site, South Road, Durham, DH1 3LE United Kingdom


On Mon, 1 Jan 2024 at 18:54, roee maor <roeem...@gmail.com> wrote:

> Hi Martin and list members,
> Happy New Year.
>
> I'm trying to explore the phylogenetic tree space represented by a few
> thousand trees of ~2500 tips.
>
> The packages TreeTools and TreeDist, which hold the most advanced R
> functionality for such analysis (AFAIK), are limited to dealing with trees
> smaller than 500 tips, presumably due to memory requirements.
>
> So my first question is: what could I do to bypass this limitation?
>
> Second, are there alternative packages for robust quantification of
> phylogenetic space (i.e. not Robinson-Foulds, see Smith 2022:
> https://academic.oup.com/sysbio/article/71/5/1255/6486431)?
>
> My fallback plan is either (1) to trim the trees to genus level and make
> do from there, or (2) split the trees taxonomically and somehow tally up
> the results from each sub-tree. Perhaps even better would be to do (2) and
> then prune out clades that show little between-tree variability before
> doing (1).
> Any comments on this plan are more than welcome. My current reviewers
> didn't respond well to "was not feasible due to computational
> limitations"...
>
> Many thanks in advance for any help or advice!
> Very best wishes for the new year,
> Roi
>
>


_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
