Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Russell Engelman
Dear Everyone,

So this is really strange. I tried what Dr. Revell suggested, and for the
two examples I gave it worked for *Cercopithecus albogularis*, but when I
applied it to *Cricetomys ansorgei *I got the message...

"Error in bind.tree(tree, tip, where = where, position = pp) :
  'position' is larger than the branch length"

I know that usually this means the new node position is usually further
back in time than the point where the branch diverges from its nearest
neighbor. However when I went to check how long the terminal branch leading
to *C. ansorgei*'s sister taxon that I am trying to bind it to is, *C.
emini*, it said that the branch length to this taxon is much, much greater
than the branch length of the new taxon I am trying to set (3.28 Ma), so it
is not clear why it is returning this error message. Here is some code that
shows this with the data set I attached.

```
nodes<-sapply(hbltree$tree_3084$tip.label,function(x,y)
which(y==x),y=hbltree$tree_3084$tip.label)
edge.lengths<-setNames(hbltree$tree_3084$edge.length[sapply(nodes,
   function(x,y)
which(y==x),y=hbltree$tree_3084$edge[,2])],names(nodes))
edge.lengths[c("Cricetomys_emini","Cercopithecus_mitis")]
```

When I do this with my own dataset I get an edge length of 8.588. I tried
this with another tree (tree_7180) to see if things were varying between
trees and I got a branch length of 7.760.

The only thing I can think of is this is a result of the divergence time
varying among the trees, but in that case I am not sure of how to bind a
tip to these trees given I have the divergence data from the literature but
the Upham et al. dataset may not be calibrated to account for it. I.e., the
Upham et al. supertree is estimating divergence time in *Cricetomys *spp.
based on the available data for the species included: *C. emini *and *C.
gambianus*, and when additional constraining data from *C. ansorgei *is
included it might produce a different estimate of branch lengths.

Additionally, is there a way to do this if there is no branch length known?
I have a number of taxa for which divergence times have not been published,
but for whom their phylogenetic position is known. I am trying to just
stick them in the tree as close as possible. I thought it should be
possible if I just delete the `position` and `edge.length` arguments, as I
thought it said in the `ape` documentation that doing so just create a
polytomy or something like that at the split, but when I tried that with my
data I got the message...

"Error in while (currnode != rt) { : missing value where TRUE/FALSE needed"

Which suggests I am missing a key parameter somewhere or misusing the
bind.tip function somehow.

Sincerely,
Russell

On Fri, Jun 4, 2021 at 11:39 AM Mike Collyer  wrote:

> Hello Everyone,
>
> Just to stack onto Liam’s “fun with apply family of functions”, I often
> use lapply, which works well by having a function that only applies to the
> objects in a list, but often — and for difficult to decipher reasons —
> breaks down if the function is not predefined, is complex, has a few
> arguments to vary, or might require several steps.  As a way to deal with
> this kind of issue, I often define a new list for lapply and avoid using
> the list I want to work as the target.  Here is an attempt to show how that
> can be done (but I think Liam’s suggestion is actually better)
>
> n <- length(tree)
>
> newTree <- lapply(1:n, function(j){
>   tree.j <- tree[[j]]
>   bind.tip(tree.j, bind.tip, tip.label="Cercopithecus_albogularis",
>   position=0.59, edge.length = 0.59,
>   where=mrca(tree.j)["Cercopithecus_mitis","Cercopithecus_mitis"])
> })
>
>
> Another way this could be done is to make multiple lists and use the Map
> function, which is basically a modification of the mapply example Liam
> used.  (This example also uses Liam’s suggestion to use fastMRCA.)
>
> tip.label <- lapply(1:n, function(.) "Cercopithecus_albogularis")
> MRCA <- lapply(1:n, function(j) fastMRCA(tree[[j]],
> "Cercopithecus_albogularis", "Cercopithecus_albogularis"))
>
> newTree <- Map(function(tr, tl, m) bind.tip(tr, tl,
>position=0.59, edge.length = 0.59,
>where = m),
>tree,
>tip.label,
>MRCA
>   )
>
> Finally, the do.call function can be helpful when repeating a function
> that has only one or few arguments among several changing over a list.
> Building on previous set up, one could do this
>
> bind.args <- list(tree = tree[[1]], tip.label =
> “Cercopithecus_albogularis”,
>   position=0.59, edge.length = 0.59, where = MRCA[[1]],
>   interactive = FALSE)
>
> newTree <- lapply(1:n, function(j){
>   bind.args$tree <- tree[[j]]
>   bind.args$where <- MRCA[[j]]
>   do.call(bind.tip, bind.args)
> })
>
>
> I also suggest these without verification with real data.
>
> Cheers!
> Mike
>
>
> > On Jun 2, 2021, at 9:11 PM, Liam J. Revell  wrote:
> >
> > Dear 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Liam J. Revell



Dear Russell.

For some reason, my emails don't seem to be getting through to the whole 
list -- so my apologies if this ends up being redundant with another reply.


In short, if you supply a vector of tip labels that are identical (e.g., 
c("Cricetomys_emini","Cricetomys_emini")), ape::getMRCA returns the 
parent node of the corresponding taxon -- rather than the tip index. I 
didn't know that, but now I do.


You can switch to using phytools::fastMRCA. fastMRCA takes two tip label 
arguments instead of one -- so in your case the function call would be 
fastMRCA(tree,"Cricetomys_emini","Cricetomys_emini").


You can also just use the base R function which. In this case, you would 
do which(tree$tip.label=="Cricetomys_emini"). This should have the same 
effect.


You do seem to be using the arguments 'where' and 'position' correctly 
in phytools::bind.tip. 'where' should point to the node immediately 
*above* (i.e., tipward) of the place you want to attach the new tip; 
while 'position' should be the depth *below* that node.


Here's an update to your mapply call using fastMRCA instead of getMRCA.

newtree<-mapply(bind.tip,tree=tree,where=lapply(tree,fastMRCA,
sp1="Cercopithecus_mitis",sp2="Cercopithecus_mitis"),
MoreArgs=list(tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59),SIMPLIFY=FALSE)
class(newtree)<-"multiPhylo"

Please let me (& the list) know if this solution works as expected.

All the best, Liam

Liam J. Revell
University of Massachusetts Boston [Assoc. Prof.]
Universidad Católica de la Ssma Concepción [Adj. Res.]

Web & phytools:
http://faculty.umb.edu/liam.revell/, http://www.phytools.org, 
http://blog.phytools.org


Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile

U.S. COVID-19 explorer web application:
https://covid19-explorer.org/

On 6/4/2021 1:17 AM, Russell Engelman wrote:

EXTERNAL SENDER
Dear Everyone,

I think these are all really good suggestions. I tried this with my data 
(specifically the mapply with getMRCA) and it sort of worked. Sort of in 
that the mapply did apply the transformation to all 100 trees, but I 
noticed that after I applied it I got some funky results. Here is 
another example with /Cricetomys ansorgei /and some associated code...


hbltree <-read.nexus(file=" hbltree .nex")
class(hbltree)<-"multiPhylo"
newtrees<-mapply(bind.tip,tree=hbltree,where=lapply(hbltree,getMRCA,
   
  tip=c("Cricetomys_emini","Cricetomys_emini")),

                  MoreArgs=list(tip.label="Cricetomys_ansorgei",
                      position=3.282,edge.length=3.282),SIMPLIFY=FALSE)

However I noticed that when I printed the phylogeny the function for 
some reason does not place /C. ansorgei /as the sister taxon to /C. 
emini/, instead it places it as sister to /C. emini/ + /C. gambianus. 
/That part I am not sure why this is because getMRCA should find the 
terminal tip of /C. emini/, given that it is finding the MRCA of the 
same taxon.


It also does not place /C. ansorgei /at the tip of the tree. This might 
be a side effect of the function selecting the wrong node to insert the 
taxon on, but I think part of it as well is that I'm not familiar with 
how to handle the position and edge.length arguments in this case. I 
know from consulting the primary literature that /C. ansorgei /is 
recoded to have diverged from its nearest sister taxon, /C. emini/, 
about 3.282 Ma. So I know the 'edge.length' argument should be set to 
3.282. I've often struggled a lot with how to efficiently set the 
'position' argument afterwards.


Is it possible I am calling the wrong node or something to insert the 
taxon at?


Sincerely,
Russell

On Thu, Jun 3, 2021 at 10:47 AM Liam J. Revell > wrote:


Dear Russell et al. (sorry if this is a re-send: my message yesterday
seems to have been blocked by the listserv)

Using a for loop is a great idea! Highly underrated in R, IMO.

However, for future reference, the reason that your code didn't work
with lapply is because the list you're 'applying' over (tree) also
appears among the arguments!

If you want to use apply-family functions instead of a for loop (just,
say, for fun) then you have two basic options: you can write a custom
function; or you can use mapply.

Here's some (untested) code to do it.

## first, using a custom function & lapply:
foo<-function(tree) bind.tip(tree,
      tip.label="Cercopithecus_albogularis",
      position=0.59,edge.length=0.59,
      where=getMRCA(tree,tip=c("Cercopithecus_mitis",
      "Cercopithecus_mitis")))
newtree<-lapply(tree,foo)
class(newtree)<-"multiPhylo"

## now, using mapply:
newtree<-mapply(bind.tip,tree=tree,where=lapply(tree,getMRCA,
      tip=c("Cercopithecus_mitis","Cercopithecus_mitis")),
      MoreArgs=list(tip.label="Cercopithecus_albogularis",
   

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Mike Collyer


Hello Everyone,

Just to stack onto Liam’s “fun with apply family of functions”, I often use 
lapply, which works well by having a function that only applies to the objects 
in a list, but often — and for difficult to decipher reasons — breaks down if 
the function is not predefined, is complex, has a few arguments to vary, or 
might require several steps.  As a way to deal with this kind of issue, I often 
define a new list for lapply and avoid using the list I want to work as the 
target.  Here is an attempt to show how that can be done (but I think Liam’s 
suggestion is actually better)

n <- length(tree)

newTree <- lapply(1:n, function(j){
  tree.j <- tree[[j]]
  bind.tip(tree.j, bind.tip, tip.label="Cercopithecus_albogularis",
  position=0.59, edge.length = 0.59,  
  where=mrca(tree.j)["Cercopithecus_mitis","Cercopithecus_mitis"])
})


Another way this could be done is to make multiple lists and use the Map 
function, which is basically a modification of the mapply example Liam used.  
(This example also uses Liam’s suggestion to use fastMRCA.)

tip.label <- lapply(1:n, function(.) "Cercopithecus_albogularis")
MRCA <- lapply(1:n, function(j) fastMRCA(tree[[j]], 
"Cercopithecus_albogularis", "Cercopithecus_albogularis"))

newTree <- Map(function(tr, tl, m) bind.tip(tr, tl, 
   position=0.59, edge.length = 0.59,  
   where = m),
   tree,
   tip.label,
   MRCA
  )

Finally, the do.call function can be helpful when repeating a function that has 
only one or few arguments among several changing over a list.  Building on 
previous set up, one could do this

bind.args <- list(tree = tree[[1]], tip.label = “Cercopithecus_albogularis”,
  position=0.59, edge.length = 0.59, where = MRCA[[1]],
  interactive = FALSE)

newTree <- lapply(1:n, function(j){
  bind.args$tree <- tree[[j]]
  bind.args$where <- MRCA[[j]]
  do.call(bind.tip, bind.args)
})


I also suggest these without verification with real data.

Cheers!
Mike


> On Jun 2, 2021, at 9:11 PM, Liam J. Revell  wrote:
> 
> Dear Russell et al.
> 
> Using a for loop is a great idea! Highly underrated in R, IMO. ;)
> 
> However, for future reference, the reason that your code didn't work with 
> lapply is because the list you're 'applying' over (tree) also appears among 
> the arguments!
> 
> If you want to use apply-family functions instead of a for loop (just, say, 
> for fun) then you have two basic options: you can write a custom function; or 
> you can use mapply.
> 
> Here's some (untested) code to do it.
> 
> ## first, using a custom function & lapply:
> foo<-function(tree) bind.tip(tree,
>   tip.label="Cercopithecus_albogularis",
>   position=0.59,edge.length=0.59,
>   where=getMRCA(tree,tip=c("Cercopithecus_mitis",
>   "Cercopithecus_mitis")))
> newtree<-lapply(tree,foo)
> class(newtree)<-"multiPhylo"
> 
> ## now, using mapply:
> newtree<-mapply(bind.tip,tree=tree,where=lapply(tree,getMRCA,
>   tip=c("Cercopithecus_mitis","Cercopithecus_mitis")),
>   MoreArgs=list(tip.label="Cercopithecus_albogularis",
>   position=0.59,edge.length=0.59),SIMPLIFY=FALSE)
> class(newtree)<-"multiPhylo"
> 
> (Code is not guaranteed! I don't have the data file, so I didn't actually 
> test it -- but something like this ought to work.)
> 
> Regardless, I recommend using ape::getMRCA (or phytools::fastMRCA) because 
> otherwise you're computing an N x N matrix in each iteration of your function 
> call just to get one node index.
> 
> Good luck! All the best, Liam
> 
> Liam J. Revell
> University of Massachusetts Boston [Assoc. Prof.]
> Universidad Católica de la Ssma Concepción [Adj. Res.]
> 
> Web & phytools:
> http://faculty.umb.edu/liam.revell/, http://www.phytools.org, 
> http://blog.phytools.org
> 
> Academic Director UMass Boston Chile Abroad:
> https://www.umb.edu/academics/caps/international/biology_chile
> 
> U.S. COVID-19 explorer web application:
> https://covid19-explorer.org/
> 
> On 6/2/2021 8:18 PM, Nathan Upham wrote:
>> EXTERNAL SENDER
>> Hi Russell:
>> Glad to hear you’re using the VertLife mammal trees — they are built on a 
>> taxonomy of 5,911 species of which only 4,098 are sampled for DNA, so there 
>> is already a ~30% chunk that is placed using taxonomic constraints and 
>> birth-death branch lengths as sampled during the estimation of 28 Bayesian 
>> patch clades.
>> Adding additional species described since the 2015 cutoff of that VertLife 
>> taxonomy makes sense (e.g., up to ~6,500 species on mammaldiversity.org).  
>> However, keep in mind that they will not have birth-death estimated branch 
>> lengths, but rather more likely be added as a polygamy to given clade and 
>> then randomly resolved.
>> Given the sample code you provided, the key thing you’ll want to do is run a 
>> *loop* rather than using lapply, so that you can specify a given tree each 
>> time, e.g.:
>> newtrees<-vector(“list”,length(trees))
>> for(j in 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Russell Engelman
Dear Everyone,

So this is really strange. I tried what Dr. Revell suggested, and for the
two examples I gave it worked for *Cercopithecus albogularis*, but when I
applied it to *Cricetomys ansorgei *I got the message...

"Error in bind.tree(tree, tip, where = where, position = pp) :
  'position' is larger than the branch length"

I know that usually this means the new node position is usually further
back in time than the point where the branch diverges from its nearest
neighbor. However when I went to check how long the terminal branch leading
to *C. ansorgei*'s sister taxon that I am trying to bind it to is, *C.
emini*, it said that the branch length to this taxon is much, much greater
than the branch length of the new taxon I am trying to set (3.28 Ma), so it
is not clear why it is returning this error message. Here is some code that
shows this with the data set I attached.

```
nodes<-sapply(hbltree$tree_3084$tip.label,function(x,y)
which(y==x),y=hbltree$tree_3084$tip.label)
edge.lengths<-setNames(hbltree$tree_3084$edge.length[sapply(nodes,
   function(x,y)
which(y==x),y=hbltree$tree_3084$edge[,2])],names(nodes))
edge.lengths[c("Cricetomys_emini","Cercopithecus_mitis")]
```

When I do this with my own dataset I get an edge length of 8.588. I tried
this with another tree (tree_7180) to see if things were varying between
trees and I got a branch length of 7.760.

The only thing I can think of is this is a result of the divergence time
varying among the trees, but in that case I am not sure of how to bind a
tip to these trees given I have the divergence data from the literature but
the Upham et al. dataset may not be calibrated to account for it. I.e., the
Upham et al. supertree is estimating divergence time in *Cricetomys *spp.
based on the available data for the species included: *C. emini *and *C.
gambianus*, and when additional constraining data from *C. ansorgei *is
included it might produce a different estimate of branch lengths.

Additionally, is there a way to do this if there is no branch length known?
I have a number of taxa for which divergence times have not been published,
but for whom their phylogenetic position is known. I am trying to just
stick them in the tree as close as possible. I thought it should be
possible if I just delete the `position` and `edge.length` arguments, as I
thought it said in the `ape` documentation that doing so just create a
polytomy or something like that at the split, but when I tried that with my
data I got the message...

"Error in while (currnode != rt) { : missing value where TRUE/FALSE needed"

Which suggests I am missing a key parameter somewhere or misusing the
bind.tip function somehow.

Sincerely,
Russell

On Fri, Jun 4, 2021 at 11:39 AM Mike Collyer  wrote:

> Hello Everyone,
>
> Just to stack onto Liam’s “fun with apply family of functions”, I often
> use lapply, which works well by having a function that only applies to the
> objects in a list, but often — and for difficult to decipher reasons —
> breaks down if the function is not predefined, is complex, has a few
> arguments to vary, or might require several steps.  As a way to deal with
> this kind of issue, I often define a new list for lapply and avoid using
> the list I want to work as the target.  Here is an attempt to show how that
> can be done (but I think Liam’s suggestion is actually better)
>
> n <- length(tree)
>
> newTree <- lapply(1:n, function(j){
>   tree.j <- tree[[j]]
>   bind.tip(tree.j, bind.tip, tip.label="Cercopithecus_albogularis",
>   position=0.59, edge.length = 0.59,
>   where=mrca(tree.j)["Cercopithecus_mitis","Cercopithecus_mitis"])
> })
>
>
> Another way this could be done is to make multiple lists and use the Map
> function, which is basically a modification of the mapply example Liam
> used.  (This example also uses Liam’s suggestion to use fastMRCA.)
>
> tip.label <- lapply(1:n, function(.) "Cercopithecus_albogularis")
> MRCA <- lapply(1:n, function(j) fastMRCA(tree[[j]],
> "Cercopithecus_albogularis", "Cercopithecus_albogularis"))
>
> newTree <- Map(function(tr, tl, m) bind.tip(tr, tl,
>position=0.59, edge.length = 0.59,
>where = m),
>tree,
>tip.label,
>MRCA
>   )
>
> Finally, the do.call function can be helpful when repeating a function
> that has only one or few arguments among several changing over a list.
> Building on previous set up, one could do this
>
> bind.args <- list(tree = tree[[1]], tip.label =
> “Cercopithecus_albogularis”,
>   position=0.59, edge.length = 0.59, where = MRCA[[1]],
>   interactive = FALSE)
>
> newTree <- lapply(1:n, function(j){
>   bind.args$tree <- tree[[j]]
>   bind.args$where <- MRCA[[j]]
>   do.call(bind.tip, bind.args)
> })
>
>
> I also suggest these without verification with real data.
>
> Cheers!
> Mike
>
>
> > On Jun 2, 2021, at 9:11 PM, Liam J. Revell  wrote:
> >
> > Dear 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Eliot Miller
Russell,

If you're trying to do this across a lot of species, you're going to bump
into a lot of complexities like this. I got the impression you were trying
to do this for many species. For example, there's no reason that the taxon
you're adding is, even if we assume monophyly of the clade (genus in this
example) in question, nested somewhere in the crown clade. It might be
sister to the clade. So you should theoretically be considering that node
as a possible place to add the taxon. Adding the taxon in a polytomy at the
MRCA and then randomly resolving that should do that (I assume, though do
those operations also consider moving stemwards?). You can also randomly
choose a random valid node that doesn't render the clade polyphyletic,
split on that node and use a birth-death model, or any number of other
rules (midpoint, polytomy, uniform distribution) to decide how to divvy up
branch lengths and keep the tree ultrametric. This is what's happening with
addTaxa. Also, if you hit a situation where you're trying to add a taxon to
a set of taxa that are not actually monophyletic, you'll get weird results
with your current approach. If that's not your situation, no problems
there, but you ought to check (or just use addTaxa which will make sure it
doesn't make the clade any "more" polyphyletic, whatever that means). FWIW,
I'm leery of using birth-death models to add lots of missing species
because I think it biases the resulting tree towards increasing
diversification in the present. At least it does with addTaxa. This might
be a programming error on my part, but I actually suspect it's a systematic
bias introduced by problems described in recent papers by Stilianos Louca
and Matthew Pennell, e.g.
https://www.sciencedirect.com/science/article/abs/pii/S0960982221006138?dgcid=coauthor
Figuring out if incorporating new birth-death estimators (that don't cap
extinction at 0) could resolve some of that bias was on my to-do list for
addTaxa, and is why I've never gotten around to more formally publishing
tests of it. Also, I'm fairly sure Jonathan Chang et al. wrote a relevant R
package here which also automates most of this, and used it for a big fish
phylogeny. That'd be worth a look if you're trying to do this across
hundreds or thousands of species.

Eliot

On Fri, Jun 4, 2021 at 2:59 PM Russell Engelman 
wrote:

> Dear Everyone,
>
> So this is really strange. I tried what Dr. Revell suggested, and for the
> two examples I gave it worked for *Cercopithecus albogularis*, but when I
> applied it to *Cricetomys ansorgei *I got the message...
>
> "Error in bind.tree(tree, tip, where = where, position = pp) :
>   'position' is larger than the branch length"
>
> I know that usually this means the new node position is usually further
> back in time than the point where the branch diverges from its nearest
> neighbor. However when I went to check how long the terminal branch leading
> to *C. ansorgei*'s sister taxon that I am trying to bind it to is, *C.
> emini*, it said that the branch length to this taxon is much, much
> greater than the branch length of the new taxon I am trying to set (3.28
> Ma), so it is not clear why it is returning this error message. Here is
> some code that shows this with the data set I attached.
>
> ```
> nodes<-sapply(hbltree$tree_3084$tip.label,function(x,y)
> which(y==x),y=hbltree$tree_3084$tip.label)
> edge.lengths<-setNames(hbltree$tree_3084$edge.length[sapply(nodes,
>function(x,y)
> which(y==x),y=hbltree$tree_3084$edge[,2])],names(nodes))
> edge.lengths[c("Cricetomys_emini","Cercopithecus_mitis")]
> ```
>
> When I do this with my own dataset I get an edge length of 8.588. I tried
> this with another tree (tree_7180) to see if things were varying between
> trees and I got a branch length of 7.760.
>
> The only thing I can think of is this is a result of the divergence time
> varying among the trees, but in that case I am not sure of how to bind a
> tip to these trees given I have the divergence data from the literature but
> the Upham et al. dataset may not be calibrated to account for it. I.e., the
> Upham et al. supertree is estimating divergence time in *Cricetomys *spp.
> based on the available data for the species included: *C. emini *and *C.
> gambianus*, and when additional constraining data from *C. ansorgei *is
> included it might produce a different estimate of branch lengths.
>
> Additionally, is there a way to do this if there is no branch length
> known? I have a number of taxa for which divergence times have not been
> published, but for whom their phylogenetic position is known. I am trying
> to just stick them in the tree as close as possible. I thought it should be
> possible if I just delete the `position` and `edge.length` arguments, as I
> thought it said in the `ape` documentation that doing so just create a
> polytomy or something like that at the split, but when I tried that with my
> data I got the message...
>
> 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Liam J. Revell



Dear Russell et al. (sorry if this is a re-send: my message yesterday 
seems to have been blocked by the listserv)


Using a for loop is a great idea! Highly underrated in R, IMO.

However, for future reference, the reason that your code didn't work 
with lapply is because the list you're 'applying' over (tree) also 
appears among the arguments!


If you want to use apply-family functions instead of a for loop (just, 
say, for fun) then you have two basic options: you can write a custom 
function; or you can use mapply.


Here's some (untested) code to do it.

## first, using a custom function & lapply:
foo<-function(tree) bind.tip(tree,
tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59,
where=getMRCA(tree,tip=c("Cercopithecus_mitis",
"Cercopithecus_mitis")))
newtree<-lapply(tree,foo)
class(newtree)<-"multiPhylo"

## now, using mapply:
newtree<-mapply(bind.tip,tree=tree,where=lapply(tree,getMRCA,
tip=c("Cercopithecus_mitis","Cercopithecus_mitis")),
MoreArgs=list(tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59),SIMPLIFY=FALSE)
class(newtree)<-"multiPhylo"

(Code is not guaranteed! I don't have the data file, so I didn't 
actually test it -- but something like this ought to work.)


Regardless, I recommend using ape::getMRCA (or phytools::fastMRCA) 
because otherwise you're computing an N x N matrix in each iteration of 
your function call just to get one node index.


Good luck! All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
Universidad Católica de la Ssma Concepción

Web & phytools:
http://faculty.umb.edu/liam.revell/, http://www.phytools.org, 
http://blog.phytools.org


Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile

U.S. COVID-19 explorer web application:
https://covid19-explorer.org/

On 6/2/2021 8:18 PM, Nathan Upham wrote:

Hi Russell:

Glad to hear you’re using the VertLife mammal trees — they are built on a 
taxonomy of 5,911 species of which only 4,098 are sampled for DNA, so there is 
already a ~30% chunk that is placed using taxonomic constraints and birth-death 
branch lengths as sampled during the estimation of 28 Bayesian patch clades.

Adding additional species described since the 2015 cutoff of that VertLife 
taxonomy makes sense (e.g., up to ~6,500 species on mammaldiversity.org).  
However, keep in mind that they will not have birth-death estimated branch 
lengths, but rather more likely be added as a polygamy to given clade and then 
randomly resolved.

Given the sample code you provided, the key thing you’ll want to do is run a 
*loop* rather than using lapply, so that you can specify a given tree each 
time, e.g.:

newtrees<-vector(“list”,length(trees))
for(j in 1:length(trees)){
newtrees[[j]] <- bind.tip(tree=trees[[j]], tip.label="Cercopithecus_albogularis”, 
position=0.59,edge.length = 0.59, 
where=mrca(tree[[j]])["Cercopithecus_mitis","Cercopithecus_mitis"])
}

I also wrote some code to prune mammal trees and add extinct Caribbean species, 
which uses a similar approach of making polytomies and randomly resolving them 
— here is the repo:
https://github.com/n8upham/CaribbeanExtinctions-WTWTW/tree/master/mamPhy_pruningCode
And here is the code file:
https://github.com/n8upham/CaribbeanExtinctions-WTWTW/blob/master/mamPhy_pruningCode/pruningCode_MamPhy-to-CaribbeanTaxa.R

Hope that helps,
—nate




Nathan S. Upham, Ph.D. (he/him)
Assistant Research Professor & Associate Curator of Mammals
Arizona State University, School of Life Sciences
  ~> Check out the new Mammal Tree of Life  
and the Mammal Diversity Database 

Research Associate, Yale University (Ecology and Evolutionary Biology)
Research Associate, Field Museum of Natural History (Negaunee Integrative 
Research Center)
Chair, Biodiversity Committee, American Society of Mammalogists
Taxonomy Advisor, IUCN/SSC Small Mammal Specialist Group

personal web: n8u.org | Google Scholar 

 | ASU profile 
e: nathan.up...@asu.edu | Skype: nate_upham | Twitter: @n8_upham 






On Jun 2, 2021, at 4:19 PM, Eliot Miller  wrote:

Hi Russell,

A package I wrote a while back should be able to do that fairly easily.
https://urldefense.com/v3/__https://github.com/eliotmiller/addTaxa__;!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQrzxl2aIw$
  The only paper it's described in
remains 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Liam J. Revell

Dear Russell et al.

Using a for loop is a great idea! Highly underrated in R, IMO. ;)

However, for future reference, the reason that your code didn't work 
with lapply is because the list you're 'applying' over (tree) also 
appears among the arguments!


If you want to use apply-family functions instead of a for loop (just, 
say, for fun) then you have two basic options: you can write a custom 
function; or you can use mapply.


Here's some (untested) code to do it.

## first, using a custom function & lapply:
foo<-function(tree) bind.tip(tree,
tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59,
where=getMRCA(tree,tip=c("Cercopithecus_mitis",
"Cercopithecus_mitis")))
newtree<-lapply(tree,foo)
class(newtree)<-"multiPhylo"

## now, using mapply:
newtree<-mapply(bind.tip,tree=tree,where=lapply(tree,getMRCA,
tip=c("Cercopithecus_mitis","Cercopithecus_mitis")),
MoreArgs=list(tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59),SIMPLIFY=FALSE)
class(newtree)<-"multiPhylo"

(Code is not guaranteed! I don't have the data file, so I didn't 
actually test it -- but something like this ought to work.)


Regardless, I recommend using ape::getMRCA (or phytools::fastMRCA) 
because otherwise you're computing an N x N matrix in each iteration of 
your function call just to get one node index.


Good luck! All the best, Liam

Liam J. Revell
University of Massachusetts Boston [Assoc. Prof.]
Universidad Católica de la Ssma Concepción [Adj. Res.]

Web & phytools:
http://faculty.umb.edu/liam.revell/, http://www.phytools.org, 
http://blog.phytools.org


Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile

U.S. COVID-19 explorer web application:
https://covid19-explorer.org/

On 6/2/2021 8:18 PM, Nathan Upham wrote:

EXTERNAL SENDER

Hi Russell:

Glad to hear you’re using the VertLife mammal trees — they are built on a 
taxonomy of 5,911 species of which only 4,098 are sampled for DNA, so there is 
already a ~30% chunk that is placed using taxonomic constraints and birth-death 
branch lengths as sampled during the estimation of 28 Bayesian patch clades.

Adding additional species described since the 2015 cutoff of that VertLife 
taxonomy makes sense (e.g., up to ~6,500 species on mammaldiversity.org).  
However, keep in mind that they will not have birth-death estimated branch 
lengths, but rather more likely be added as a polygamy to given clade and then 
randomly resolved.

Given the sample code you provided, the key thing you’ll want to do is run a 
*loop* rather than using lapply, so that you can specify a given tree each 
time, e.g.:

newtrees<-vector(“list”,length(trees))
for(j in 1:length(trees)){
newtrees[[j]] <- bind.tip(tree=trees[[j]], tip.label="Cercopithecus_albogularis”, 
position=0.59,edge.length = 0.59, 
where=mrca(tree[[j]])["Cercopithecus_mitis","Cercopithecus_mitis"])
}

I also wrote some code to prune mammal trees and add extinct Caribbean species, 
which uses a similar approach of making polytomies and randomly resolving them 
— here is the repo:
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fn8upham%2FCaribbeanExtinctions-WTWTW%2Ftree%2Fmaster%2FmamPhy_pruningCodedata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=KeZWNETOkPBekL3j5AIr2hygW49PbdSKImMV39QTXtE%3Dreserved=0
And here is the code file:
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fn8upham%2FCaribbeanExtinctions-WTWTW%2Fblob%2Fmaster%2FmamPhy_pruningCode%2FpruningCode_MamPhy-to-CaribbeanTaxa.Rdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=10u8o7U8dqC1pyK6GteNycoSACTOXAO2VaHNJmTeddk%3Dreserved=0

Hope that helps,
—nate




Nathan S. Upham, Ph.D. (he/him)
Assistant Research Professor & Associate Curator of Mammals
Arizona State University, School of Life Sciences
  ~> Check out the new Mammal Tree of Life 

 and the Mammal Diversity Database 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Liam J. Revell
Dear Russell et al. (sorry if this is a re-send: my message yesterday 
seems to have been blocked by the listserv)


Using a for loop is a great idea! Highly underrated in R, IMO.

However, for future reference, the reason that your code didn't work 
with lapply is because the list you're 'applying' over (tree) also 
appears among the arguments!


If you want to use apply-family functions instead of a for loop (just, 
say, for fun) then you have two basic options: you can write a custom 
function; or you can use mapply.


Here's some (untested) code to do it.

## first, using a custom function & lapply:
foo<-function(tree) bind.tip(tree,
tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59,
where=getMRCA(tree,tip=c("Cercopithecus_mitis",
"Cercopithecus_mitis")))
newtree<-lapply(tree,foo)
class(newtree)<-"multiPhylo"

## now, using mapply:
newtree<-mapply(bind.tip,tree=tree,where=lapply(tree,getMRCA,
tip=c("Cercopithecus_mitis","Cercopithecus_mitis")),
MoreArgs=list(tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59),SIMPLIFY=FALSE)
class(newtree)<-"multiPhylo"

(Code is not guaranteed! I don't have the data file, so I didn't 
actually test it -- but something like this ought to work.)


Regardless, I recommend using ape::getMRCA (or phytools::fastMRCA) 
because otherwise you're computing an N x N matrix in each iteration of 
your function call just to get one node index.


Good luck! All the best, Liam

Liam J. Revell
University of Massachusetts Boston [Assoc. Prof.]
Universidad Católica de la Ssma Concepción [Adj. Res.]

Web & phytools:
http://faculty.umb.edu/liam.revell/, http://www.phytools.org, 
http://blog.phytools.org


Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile

U.S. COVID-19 explorer web application:
https://covid19-explorer.org/

On 6/2/2021 8:18 PM, Nathan Upham wrote:

EXTERNAL SENDER

Hi Russell:

Glad to hear you’re using the VertLife mammal trees — they are built on a 
taxonomy of 5,911 species of which only 4,098 are sampled for DNA, so there is 
already a ~30% chunk that is placed using taxonomic constraints and birth-death 
branch lengths as sampled during the estimation of 28 Bayesian patch clades.

Adding additional species described since the 2015 cutoff of that VertLife 
taxonomy makes sense (e.g., up to ~6,500 species on mammaldiversity.org).  
However, keep in mind that they will not have birth-death estimated branch 
lengths, but rather more likely be added as a polygamy to given clade and then 
randomly resolved.

Given the sample code you provided, the key thing you’ll want to do is run a 
*loop* rather than using lapply, so that you can specify a given tree each 
time, e.g.:

newtrees<-vector(“list”,length(trees))
for(j in 1:length(trees)){
newtrees[[j]] <- bind.tip(tree=trees[[j]], tip.label="Cercopithecus_albogularis”, 
position=0.59,edge.length = 0.59, 
where=mrca(tree[[j]])["Cercopithecus_mitis","Cercopithecus_mitis"])
}

I also wrote some code to prune mammal trees and add extinct Caribbean species, 
which uses a similar approach of making polytomies and randomly resolving them 
— here is the repo:
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fn8upham%2FCaribbeanExtinctions-WTWTW%2Ftree%2Fmaster%2FmamPhy_pruningCodedata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=KeZWNETOkPBekL3j5AIr2hygW49PbdSKImMV39QTXtE%3Dreserved=0
And here is the code file:
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fn8upham%2FCaribbeanExtinctions-WTWTW%2Fblob%2Fmaster%2FmamPhy_pruningCode%2FpruningCode_MamPhy-to-CaribbeanTaxa.Rdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=10u8o7U8dqC1pyK6GteNycoSACTOXAO2VaHNJmTeddk%3Dreserved=0

Hope that helps,
—nate




Nathan S. Upham, Ph.D. (he/him)
Assistant Research Professor & Associate Curator of Mammals
Arizona State University, School of Life Sciences
  ~> Check out the new Mammal Tree of Life 

 and the Mammal Diversity Database