Sweet! I considered preallocation, but figured it was not worth it. Guess I was 
wrong!

JWB
________________________________________
Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 9 Jun, 2017, at 18:28, Klaus Schliep <klaus.schl...@gmail.com> wrote:
> 
> Hi Joseph
> 
> I guess I figured out where some room for improvement was.
> You grow the vector pars <- c(pars, nd)
> I preallocated pars, but you could even just grow pars using pars[k] <- nd 
> without pre-allocation. This operation is now (with R >= 3.4) far less costly 
> as mentioned in the NEWS:
> "Assigning to an element of a vector beyond the current length now 
> over-allocates by a small fraction. The new vector is marked internally as 
> growable, and the true length of the new vector is stored in the truelength 
> field. This makes building up a vector result by assigning to the next 
> element beyond the current length more efficient, though pre-allocating is 
> still preferred. The implementation is subject to change and not intended to 
> be used in packages at this time." 
> 
> One of the many little improvements recently making R quite a bit faster and 
> easier.
> 
> So the worst case behavior was pretty bad 
> 
> tree <- stree(100000, "right")
> system.time(get_mrca_fast(tree, c(1,2)))
>    user  system elapsed 
>  13.424   0.012  13.440 
>  mrca
> [1] 199999
> 
> with the changes it is much better:
> 
> system.time(mrca <- get_mrca_fast(tree, c(1,2)))
>    user  system elapsed 
>   0.012   0.000   0.012 
> > mrca
> [1] 199999
> 
> And here is the code:
> 
> get_mrca_fast <- function (phy, tips) {
>     rootnd <- length(phy$tip.label) + 1;
>     pars <- integer(phy$Nnode);  # worst case assignment, usually far too 
> long 
>     mrcaind <- 1;      # index in pars of the mrca node. will return highest 
> one traversed by other lineages
>     tnd <- NULL;       # node ids of tips
>     if ("character" %in% class(tips)) {
>         tnd <- match(tips, phy$tip.label);
>     } else {
>         tnd <- tips;
>     }
>     
>     # build a lookup table to get parents faster
>     pvec <- integer(max(phy$edge));
>     pvec[phy$edge[,2]] <- phy$edge[,1];
>     
>     # get entire lineage for first tip
>     nd <- tnd[1];
>     for(k in 1:phy$Nnode){
>         nd <- pvec[nd]
>         pars[k] <- nd
>         if (nd == rootnd) break
>     }
>     pars <- pars[1:k] #delete the rest
>     
>     # traverse lineages for remaining tips, stop if hit common ancestor
>     for (i in 2:length(tnd)) {
>         done <- FALSE;
>         cnd <- tnd[i];
>         while (!done) {
>             cpar <- pvec[cnd]; # get immediate parent
>             if (cpar %in% pars) {
>                 if (cpar == rootnd) {
>                     # early exit! if the mrca of any set of taxa is the root 
> then we are done
>                     return(rootnd);
>                 }
>                 idx <- which(pars == cpar);
>                 if (idx > mrcaind) {
>                     mrcaind <- idx;
>                 }
>                 done <- TRUE;
>             } else {
>                 # keep going!
>                 cnd <- cpar;
>             }
>         }
>     }
>     return(pars[mrcaind]);
> }
> 
> 
> Have a nice weekend, 
> Klaus
> 
> On Fri, Jun 9, 2017 at 3:59 PM, Joseph W. Brown <josep...@umich.edu 
> <mailto:josep...@umich.edu>> wrote:
> Liam,
> 
> I put the (updated) code up as a gist 
> <https://gist.github.com/josephwb/ad61fd29ed4fb06e712e23d67422c813 
> <https://gist.github.com/josephwb/ad61fd29ed4fb06e712e23d67422c813>>. Feel 
> free to use/improve/whatever. Emmanuel sees room for improvement but frankly 
> taking 0.3 seconds to find the MRCA of 5000 taxa on a million-tip tree is 
> good enough for my purposes.
> 
> JWB
> ________________________________________
> Joseph W. Brown
> Post-doctoral Researcher, Smith Laboratory
> University of Michigan
> Department of Ecology & Evolutionary Biology
> Room 2071, Kraus Natural Sciences Building
> Ann Arbor MI 48109-1079
> josep...@umich.edu <mailto:josep...@umich.edu>
> 
> 
> 
> > On 9 Jun, 2017, at 15:20, Liam J. Revell <liam.rev...@umb.edu 
> > <mailto:liam.rev...@umb.edu>> wrote:
> >
> > On the other hand, phytools does have a function - the somewhat imprecisely 
> > named fastMRCA - which can find the MRCA of just a pair of species much 
> > faster than getMRCA (however still slower than or only about as fast as 
> > Joseph & Klaus's solutions).
> >
> > Liam J. Revell, Associate Professor of Biology
> > University of Massachusetts Boston
> > web: http://faculty.umb.edu/liam.revell/ 
> > <http://faculty.umb.edu/liam.revell/>
> > email: liam.rev...@umb.edu <mailto:liam.rev...@umb.edu>
> > blog: http://blog.phytools.org <http://blog.phytools.org/>
> >
> > On 6/9/2017 5:22 AM, Liam J. Revell wrote:
> >>
> >> Juan. findMRCA was written before getMRCA existed, but the latter was
> >> faster so now it just calls getMRCA internally. All the best, Liam
> >>
> >> Liam J. Revell, Associate Professor of Biology
> >> University of Massachusetts Boston
> >> web: http://faculty.umb.edu/liam.revell/ 
> >> <http://faculty.umb.edu/liam.revell/>
> >> email: liam.rev...@umb.edu <mailto:liam.rev...@umb.edu>
> >> blog: http://blog.phytools.org <http://blog.phytools.org/>
> >>
> >> On 6/9/2017 1:57 AM, Juan Antonio Balbuena wrote:
> >>> Package phytools includes a function, findMRCA, that is supposed to work
> >>> very efficiently with large trees. you may wish to compare it with your
> >>> function.
> >>>
> >>> Cheers
> >>>
> >>> Juan
> >>>
> >>> --
> >>>
> >>> Dr. Juan A. Balbuena
> >>> Cavanilles Institute of Biodiversity and Evolutionary Biology
> >>> University of Valencia
> >>> http://www.uv.es/~balbuena <http://www.uv.es/~balbuena> 
> >>> <http://www.uv.es/%7Ebalbuena <http://www.uv.es/%7Ebalbuena>>
> >>> P.O. Box 22085
> >>> http://www.uv.es/cophylpaco <http://www.uv.es/cophylpaco>
> >>> <http://www.uv.es/cavanilles/zoomarin/index.htm 
> >>> <http://www.uv.es/cavanilles/zoomarin/index.htm>>
> >>> 46071 Valencia, Spain
> >>> e-mail: j.a.balbu...@uv.es <mailto:j.a.balbu...@uv.es> 
> >>> <mailto:j.a.balbu...@uv.es <mailto:j.a.balbu...@uv.es>>    tel. +34 963
> >>> 543 658    fax +34 963 543 733 <tel:%2B34%20963%20543%20733>
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>> *NOTE!*For shipments by EXPRESS COURIER use the following street address:
> >>> C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
> >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> >>>
> >>>
> >>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient
> >>>  
> >>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient>>
> >>>
> >>>      Libre de virus. www.avast.com <http://www.avast.com/>
> >>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient
> >>>  
> >>> <https://www.avast.com/sig-email?utm_medium=email&utm_source=link&utm_campaign=sig-email&utm_content=emailclient>>
> >>>
> >>>
> >>>
> >>> <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
> >>>
> >>>
> >>> _______________________________________________
> >>> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
> >>> <mailto:R-sig-phylo@r-project.org>
> >>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
> >>> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
> >>> Searchable archive at
> >>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
> >>> <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
> >>>
> >>
> >> _______________________________________________
> >> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
> >> <mailto:R-sig-phylo@r-project.org>
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
> >> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
> >> Searchable archive at
> >> http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
> >> <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
> 
> 
>         [[alternative HTML version deleted]]
> 
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
> <mailto:R-sig-phylo@r-project.org>
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
> <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
> 
> 
> -- 
> Klaus Schliep
> Postdoctoral Fellow
> Revell Lab, University of Massachusetts Boston
> http://www.phangorn.org/ <http://www.phangorn.org/>

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Reply via email to