Re: [R-sig-phylo] A possible alternate MRCA function to APE\'s getMRCA

2017-06-09 Thread Joseph W. Brown
Sweet! I considered preallocation, but figured it was not worth it. Guess I was 
wrong!

JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 9 Jun, 2017, at 18:28, Klaus Schliep  wrote:
> 
> Hi Joseph
> 
> I guess I figured out where some room for improvement was.
> You grow the vector pars <- c(pars, nd)
> I preallocated pars, but you could even just grow pars using pars[k] <- nd 
> without pre-allocation. This operation is now (with R >= 3.4) far less costly 
> as mentioned in the NEWS:
> "Assigning to an element of a vector beyond the current length now 
> over-allocates by a small fraction. The new vector is marked internally as 
> growable, and the true length of the new vector is stored in the truelength 
> field. This makes building up a vector result by assigning to the next 
> element beyond the current length more efficient, though pre-allocating is 
> still preferred. The implementation is subject to change and not intended to 
> be used in packages at this time." 
> 
> One of the many little improvements recently making R quite a bit faster and 
> easier.
> 
> So the worst case behavior was pretty bad 
> 
> tree <- stree(10, "right")
> system.time(get_mrca_fast(tree, c(1,2)))
>user  system elapsed 
>  13.424   0.012  13.440 
>  mrca
> [1] 19
> 
> with the changes it is much better:
> 
> system.time(mrca <- get_mrca_fast(tree, c(1,2)))
>user  system elapsed 
>   0.012   0.000   0.012 
> > mrca
> [1] 19
> 
> And here is the code:
> 
> get_mrca_fast <- function (phy, tips) {
> rootnd <- length(phy$tip.label) + 1;
> pars <- integer(phy$Nnode);  # worst case assignment, usually far too 
> long 
> mrcaind <- 1;  # index in pars of the mrca node. will return highest 
> one traversed by other lineages
> tnd <- NULL;   # node ids of tips
> if ("character" %in% class(tips)) {
> tnd <- match(tips, phy$tip.label);
> } else {
> tnd <- tips;
> }
> 
> # build a lookup table to get parents faster
> pvec <- integer(max(phy$edge));
> pvec[phy$edge[,2]] <- phy$edge[,1];
> 
> # get entire lineage for first tip
> nd <- tnd[1];
> for(k in 1:phy$Nnode){
> nd <- pvec[nd]
> pars[k] <- nd
> if (nd == rootnd) break
> }
> pars <- pars[1:k] #delete the rest
> 
> # traverse lineages for remaining tips, stop if hit common ancestor
> for (i in 2:length(tnd)) {
> done <- FALSE;
> cnd <- tnd[i];
> while (!done) {
> cpar <- pvec[cnd]; # get immediate parent
> if (cpar %in% pars) {
> if (cpar == rootnd) {
> # early exit! if the mrca of any set of taxa is the root 
> then we are done
> return(rootnd);
> }
> idx <- which(pars == cpar);
> if (idx > mrcaind) {
> mrcaind <- idx;
> }
> done <- TRUE;
> } else {
> # keep going!
> cnd <- cpar;
> }
> }
> }
> return(pars[mrcaind]);
> }
> 
> 
> Have a nice weekend, 
> Klaus
> 
> On Fri, Jun 9, 2017 at 3:59 PM, Joseph W. Brown  > wrote:
> Liam,
> 
> I put the (updated) code up as a gist 
>  >. Feel 
> free to use/improve/whatever. Emmanuel sees room for improvement but frankly 
> taking 0.3 seconds to find the MRCA of 5000 taxa on a million-tip tree is 
> good enough for my purposes.
> 
> JWB
> 
> Joseph W. Brown
> Post-doctoral Researcher, Smith Laboratory
> University of Michigan
> Department of Ecology & Evolutionary Biology
> Room 2071, Kraus Natural Sciences Building
> Ann Arbor MI 48109-1079
> josep...@umich.edu 
> 
> 
> 
> > On 9 Jun, 2017, at 15:20, Liam J. Revell  > > wrote:
> >
> > On the other hand, phytools does have a function - the somewhat imprecisely 
> > named fastMRCA - which can find the MRCA of just a pair of species much 
> > faster than getMRCA (however still slower than or only about as fast as 
> > Joseph & Klaus's solutions).
> >
> > Liam J. Revell, Associate Professor of Biology
> > University of Massachusetts Boston
> > web: http://faculty.umb.edu/liam.revell/ 
> > 
> > email: liam.rev...@umb.edu 
> > blog: http://blog.phytools.org 
> >
> > On 6/9/2017 5:22 AM, Liam J. Revell wrote:
> >>
> >> Juan. findMRCA was written before 

Re: [R-sig-phylo] A possible alternate MRCA function to APE\'s getMRCA

2017-06-09 Thread Klaus Schliep
Hi Joseph

I guess I figured out where some room for improvement was.
You grow the vector pars <- c(pars, nd)
I preallocated pars, but you could even just grow pars using pars[k] <- nd
without pre-allocation. This operation is now (with R >= 3.4) far less
costly as mentioned in the NEWS:
"Assigning to an element of a vector beyond the current length now
over-allocates by a small fraction. The new vector is marked internally as
growable, and the true length of the new vector is stored in the truelength
field. This makes building up a vector result by assigning to the next
element beyond the current length more efficient, though pre-allocating is
still preferred. The implementation is subject to change and not intended
to be used in packages at this time."

One of the many little improvements recently making R quite a bit faster
and easier.

So the worst case behavior was pretty bad

tree <- stree(10, "right")
system.time(get_mrca_fast(tree, c(1,2)))
   user  system elapsed
 13.424   0.012  13.440
 mrca
[1] 19

with the changes it is much better:

system.time(mrca <- get_mrca_fast(tree, c(1,2)))
   user  system elapsed
  0.012   0.000   0.012
> mrca
[1] 19

And here is the code:

get_mrca_fast <- function (phy, tips) {
rootnd <- length(phy$tip.label) + 1;
pars <- integer(phy$Nnode);  # worst case assignment, usually far too
long
mrcaind <- 1;  # index in pars of the mrca node. will return
highest one traversed by other lineages
tnd <- NULL;   # node ids of tips
if ("character" %in% class(tips)) {
tnd <- match(tips, phy$tip.label);
} else {
tnd <- tips;
}

# build a lookup table to get parents faster
pvec <- integer(max(phy$edge));
pvec[phy$edge[,2]] <- phy$edge[,1];

# get entire lineage for first tip
nd <- tnd[1];
for(k in 1:phy$Nnode){
nd <- pvec[nd]
pars[k] <- nd
if (nd == rootnd) break
}
pars <- pars[1:k] #delete the rest

# traverse lineages for remaining tips, stop if hit common ancestor
for (i in 2:length(tnd)) {
done <- FALSE;
cnd <- tnd[i];
while (!done) {
cpar <- pvec[cnd]; # get immediate parent
if (cpar %in% pars) {
if (cpar == rootnd) {
# early exit! if the mrca of any set of taxa is the
root then we are done
return(rootnd);
}
idx <- which(pars == cpar);
if (idx > mrcaind) {
mrcaind <- idx;
}
done <- TRUE;
} else {
# keep going!
cnd <- cpar;
}
}
}
return(pars[mrcaind]);
}


Have a nice weekend,
Klaus

On Fri, Jun 9, 2017 at 3:59 PM, Joseph W. Brown  wrote:

> Liam,
>
> I put the (updated) code up as a gist  ad61fd29ed4fb06e712e23d67422c813>. Feel free to use/improve/whatever.
> Emmanuel sees room for improvement but frankly taking 0.3 seconds to find
> the MRCA of 5000 taxa on a million-tip tree is good enough for my purposes.
>
> JWB
> 
> Joseph W. Brown
> Post-doctoral Researcher, Smith Laboratory
> University of Michigan
> Department of Ecology & Evolutionary Biology
> Room 2071, Kraus Natural Sciences Building
> Ann Arbor MI 48109-1079
> josep...@umich.edu
>
>
>
> > On 9 Jun, 2017, at 15:20, Liam J. Revell  wrote:
> >
> > On the other hand, phytools does have a function - the somewhat
> imprecisely named fastMRCA - which can find the MRCA of just a pair of
> species much faster than getMRCA (however still slower than or only about
> as fast as Joseph & Klaus's solutions).
> >
> > Liam J. Revell, Associate Professor of Biology
> > University of Massachusetts Boston
> > web: http://faculty.umb.edu/liam.revell/
> > email: liam.rev...@umb.edu
> > blog: http://blog.phytools.org
> >
> > On 6/9/2017 5:22 AM, Liam J. Revell wrote:
> >>
> >> Juan. findMRCA was written before getMRCA existed, but the latter was
> >> faster so now it just calls getMRCA internally. All the best, Liam
> >>
> >> Liam J. Revell, Associate Professor of Biology
> >> University of Massachusetts Boston
> >> web: http://faculty.umb.edu/liam.revell/
> >> email: liam.rev...@umb.edu
> >> blog: http://blog.phytools.org
> >>
> >> On 6/9/2017 1:57 AM, Juan Antonio Balbuena wrote:
> >>> Package phytools includes a function, findMRCA, that is supposed to
> work
> >>> very efficiently with large trees. you may wish to compare it with your
> >>> function.
> >>>
> >>> Cheers
> >>>
> >>> Juan
> >>>
> >>> --
> >>>
> >>> Dr. Juan A. Balbuena
> >>> Cavanilles Institute of Biodiversity and Evolutionary Biology
> >>> University of Valencia
> >>> http://www.uv.es/~balbuena 
> >>> P.O. Box 22085
> >>> http://www.uv.es/cophylpaco
> >>> 
> >>> 46071 Valencia, 

Re: [R-sig-phylo] A possible alternate MRCA function to APE\'s getMRCA

2017-06-09 Thread Joseph W. Brown
Liam,

I put the (updated) code up as a gist 
. Feel free 
to use/improve/whatever. Emmanuel sees room for improvement but frankly taking 
0.3 seconds to find the MRCA of 5000 taxa on a million-tip tree is good enough 
for my purposes.

JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu



> On 9 Jun, 2017, at 15:20, Liam J. Revell  wrote:
> 
> On the other hand, phytools does have a function - the somewhat imprecisely 
> named fastMRCA - which can find the MRCA of just a pair of species much 
> faster than getMRCA (however still slower than or only about as fast as 
> Joseph & Klaus's solutions).
> 
> Liam J. Revell, Associate Professor of Biology
> University of Massachusetts Boston
> web: http://faculty.umb.edu/liam.revell/
> email: liam.rev...@umb.edu
> blog: http://blog.phytools.org
> 
> On 6/9/2017 5:22 AM, Liam J. Revell wrote:
>> 
>> Juan. findMRCA was written before getMRCA existed, but the latter was
>> faster so now it just calls getMRCA internally. All the best, Liam
>> 
>> Liam J. Revell, Associate Professor of Biology
>> University of Massachusetts Boston
>> web: http://faculty.umb.edu/liam.revell/
>> email: liam.rev...@umb.edu
>> blog: http://blog.phytools.org
>> 
>> On 6/9/2017 1:57 AM, Juan Antonio Balbuena wrote:
>>> Package phytools includes a function, findMRCA, that is supposed to work
>>> very efficiently with large trees. you may wish to compare it with your
>>> function.
>>> 
>>> Cheers
>>> 
>>> Juan
>>> 
>>> --
>>> 
>>> Dr. Juan A. Balbuena
>>> Cavanilles Institute of Biodiversity and Evolutionary Biology
>>> University of Valencia
>>> http://www.uv.es/~balbuena 
>>> P.O. Box 22085
>>> http://www.uv.es/cophylpaco
>>> 
>>> 46071 Valencia, Spain
>>> e-mail: j.a.balbu...@uv.es tel. +34 963
>>> 543 658fax +34 963 543 733
>>> 
>>> *NOTE!*For shipments by EXPRESS COURIER use the following street address:
>>> C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
>>> 
>>> 
>>> 
>>> 
>>> 
>>>  Libre de virus. www.avast.com
>>> 
>>> 
>>> 
>>> 
>>> <#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>
>>> 
>>> 
>>> ___
>>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>>> Searchable archive at
>>> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>>> 
>> 
>> ___
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at
>> http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] A possible alternate MRCA function to APE\'s getMRCA

2017-06-09 Thread Liam J. Revell
On the other hand, phytools does have a function - the somewhat 
imprecisely named fastMRCA - which can find the MRCA of just a pair of 
species much faster than getMRCA (however still slower than or only 
about as fast as Joseph & Klaus's solutions).


Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 6/9/2017 5:22 AM, Liam J. Revell wrote:


Juan. findMRCA was written before getMRCA existed, but the latter was
faster so now it just calls getMRCA internally. All the best, Liam

Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 6/9/2017 1:57 AM, Juan Antonio Balbuena wrote:

Package phytools includes a function, findMRCA, that is supposed to work
very efficiently with large trees. you may wish to compare it with your
function.

Cheers

Juan

--

Dr. Juan A. Balbuena
Cavanilles Institute of Biodiversity and Evolutionary Biology
University of Valencia
http://www.uv.es/~balbuena 
P.O. Box 22085
http://www.uv.es/cophylpaco

46071 Valencia, Spain
e-mail: j.a.balbu...@uv.es tel. +34 963
543 658fax +34 963 543 733

*NOTE!*For shipments by EXPRESS COURIER use the following street address:
C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.





  Libre de virus. www.avast.com




<#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] A possible alternate MRCA function to APE\'s getMRCA

2017-06-09 Thread Liam J. Revell
Juan. findMRCA was written before getMRCA existed, but the latter was 
faster so now it just calls getMRCA internally. All the best, Liam


Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 6/9/2017 1:57 AM, Juan Antonio Balbuena wrote:

Package phytools includes a function, findMRCA, that is supposed to work
very efficiently with large trees. you may wish to compare it with your
function.

Cheers

Juan

--

Dr. Juan A. Balbuena
Cavanilles Institute of Biodiversity and Evolutionary Biology
University of Valencia
http://www.uv.es/~balbuena 
P.O. Box 22085
http://www.uv.es/cophylpaco 
46071 Valencia, Spain
e-mail: j.a.balbu...@uv.es tel. +34 963
543 658fax +34 963 543 733

*NOTE!*For shipments by EXPRESS COURIER use the following street address:
C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.




Libre de virus. www.avast.com



<#DAB4FAD8-2DD7-40BB-A1B8-4E2AA1F9FDF2>


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/