Hi Joseph

I guess I figured out where some room for improvement was.
You grow the vector pars <- c(pars, nd)
I preallocated pars, but you could even just grow pars using pars[k] <- nd
without pre-allocation. This operation is now (with R >= 3.4) far less
costly as mentioned in the NEWS:
"Assigning to an element of a vector beyond the current length now
over-allocates by a small fraction. The new vector is marked internally as
growable, and the true length of the new vector is stored in the truelength
field. This makes building up a vector result by assigning to the next
element beyond the current length more efficient, though pre-allocating is
still preferred. The implementation is subject to change and not intended
to be used in packages at this time."

One of the many little improvements recently making R quite a bit faster
and easier.

So the worst case behavior was pretty bad

tree <- stree(100000, "right")
system.time(get_mrca_fast(tree, c(1,2)))
   user  system elapsed
 13.424   0.012  13.440
 mrca
[1] 199999

with the changes it is much better:

system.time(mrca <- get_mrca_fast(tree, c(1,2)))
   user  system elapsed
  0.012   0.000   0.012
> mrca
[1] 199999

And here is the code:

get_mrca_fast <- function (phy, tips) {
    rootnd <- length(phy$tip.label) + 1;
    pars <- integer(phy$Nnode);  # worst case assignment, usually far too
long
    mrcaind <- 1;      # index in pars of the mrca node. will return
highest one traversed by other lineages
    tnd <- NULL;       # node ids of tips
    if ("character" %in% class(tips)) {
        tnd <- match(tips, phy$tip.label);
    } else {
        tnd <- tips;
    }

    # build a lookup table to get parents faster
    pvec <- integer(max(phy$edge));
    pvec[phy$edge[,2]] <- phy$edge[,1];

    # get entire lineage for first tip
    nd <- tnd[1];
    for(k in 1:phy$Nnode){
        nd <- pvec[nd]
        pars[k] <- nd
        if (nd == rootnd) break
    }
    pars <- pars[1:k] #delete the rest

    # traverse lineages for remaining tips, stop if hit common ancestor
    for (i in 2:length(tnd)) {
        done <- FALSE;
        cnd <- tnd[i];
        while (!done) {
            cpar <- pvec[cnd]; # get immediate parent
            if (cpar %in% pars) {
                if (cpar == rootnd) {
                    # early exit! if the mrca of any set of taxa is the
root then we are done
                    return(rootnd);
                }
                idx <- which(pars == cpar);
                if (idx > mrcaind) {
                    mrcaind <- idx;
                }
                done <- TRUE;
            } else {
                # keep going!
                cnd <- cpar;
            }
        }
    }
    return(pars[mrcaind]);
}


Have a nice weekend,
Klaus

On Fri, Jun 9, 2017 at 3:59 PM, Joseph W. Brown <josep...@umich.edu> wrote:

> Liam,
>
> I put the (updated) code up as a gist <https://gist.github.com/josephwb/
> ad61fd29ed4fb06e712e23d67422c813>. Feel free to use/improve/whatever.
> Emmanuel sees room for improvement but frankly taking 0.3 seconds to find
> the MRCA of 5000 taxa on a million-tip tree is good enough for my purposes.
>
> JWB
> ________________________________________
> Joseph W. Brown
> Post-doctoral Researcher, Smith Laboratory
> University of Michigan
> Department of Ecology & Evolutionary Biology
> Room 2071, Kraus Natural Sciences Building
> Ann Arbor MI 48109-1079
> josep...@umich.edu
>
>
>
> > On 9 Jun, 2017, at 15:20, Liam J. Revell <liam.rev...@umb.edu> wrote:
> >
> > On the other hand, phytools does have a function - the somewhat
> imprecisely named fastMRCA - which can find the MRCA of just a pair of
> species much faster than getMRCA (however still slower than or only about
> as fast as Joseph & Klaus's solutions).
> >
> > Liam J. Revell, Associate Professor of Biology
> > University of Massachusetts Boston
> > web: http://faculty.umb.edu/liam.revell/
> > email: liam.rev...@umb.edu
> > blog: http://blog.phytools.org
> >
> > On 6/9/2017 5:22 AM, Liam J. Revell wrote:
> >>
> >> Juan. findMRCA was written before getMRCA existed, but the latter was
> >> faster so now it just calls getMRCA internally. All the best, Liam
> >>
> >> Liam J. Revell, Associate Professor of Biology
> >> University of Massachusetts Boston
> >> web: http://faculty.umb.edu/liam.revell/
> >> email: liam.rev...@umb.edu
> >> blog: http://blog.phytools.org
> >>
> >> On 6/9/2017 1:57 AM, Juan Antonio Balbuena wrote:
> >>> Package phytools includes a function, findMRCA, that is supposed to
> work
> >>> very efficiently with large trees. you may wish to compare it with your
> >>> function.
> >>>
> >>> Cheers
> >>>
> >>> Juan
> >>>
> >>> --
> >>>
> >>> Dr. Juan A. Balbuena
> >>> Cavanilles Institute of Biodiversity and Evolutionary Biology
> >>> University of Valencia
> >>> http://www.uv.es/~balbuena <http://www.uv.es/%7Ebalbuena>
> >>> P.O. Box 22085
> >>> http://www.uv.es/cophylpaco
> >>> <http://www.uv.es/cavanilles/zoomarin/index.htm>
> >>> 46071 Valencia, Spain
> >>> e-mail: j.a.balbu...@uv.es <mailto:j.a.balbu...@uv.es>    tel. +34 963
> >>> 543 658    fax +34 963 543 733
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> *NOTE!*For shipments by EXPRESS COURIER use the following street
> address:
> >>> C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>
> >>>
> >>> <https://www.avast.com/sig-email?utm_medium=email&utm_
> source=link&utm_campaign=sig-email&utm_content=emailclient>
> >>>
> >>>      Libre de virus. www.avast.com
> >>> <https://www.avast.com/sig-email?utm_medium=email&utm_
> source=link&utm_campaign=sig-email&utm_content=emailclient>
> >>>
> >>>
> >>>
> >>> <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
> >>>
> >>>
> >>> _______________________________________________
> >>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> >>> Searchable archive at
> >>> http://www.mail-archive.com/r-sig-phylo@r-project.org/
> >>>
> >>
> >> _______________________________________________
> >> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> >> Searchable archive at
> >> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>
>
>         [[alternative HTML version deleted]]
>
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-
> sig-ph...@r-project.org/
>



-- 
Klaus Schliep
Postdoctoral Fellow
Revell Lab, University of Massachusetts Boston
http://www.phangorn.org/

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Reply via email to