Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-27 Thread Russell Engelman
Dear Everyone,

Sorry for the late reply. I had a crunch with several manuscripts that had
a looming due date, and am coming back to trying to fix the issue I am
having with the phylogeny I emailed about before. I've been trying
several things but I've still been unable to resolve the issue with the
error message

"Error in bind.tree(tree, tip, where = where, position = pp) :
  'position' is larger than the branch length"

despite the fact that the length of the terminal sister taxon to its
nearest node is much greater than the new branch length (i.e., there should
be plenty of length to attach a taxon). In many of the cases where I am
getting an error I am attaching a sister taxon to a terminal taxon, so it
doesn't seem like a lack of monophyly across the trees is the problem.

The only thing I could think of is trying to manually adjust the next node
up to give it enough room to attach the new terminal taxon, but I am unsure
what the easiest way to do that in R would be. I've never tried to mess
with branch lengths in a tree before.

With regards to what Dr. Miller said, I agree that birth-death trees seem
to suggest much younger divergence times than other studies, however I am
more struggling with the practical side of things. What I am trying to do
with this dataset is include enough taxa to perform a PGLS analysis, and
all I need is a phylogenetic tree to use as a backbone. The Upham et al.
2019 trees are about the only easily accessible trees I have been able to
find for Mammalia, I know there was an older one that Dr. Upham suggested
that was published in 2014 but I don't recall if that one was easily
available.

Sincerely,
Russell


On Fri, Jun 4, 2021 at 3:27 PM Eliot Miller  wrote:

> Russell,
>
> If you're trying to do this across a lot of species, you're going to bump
> into a lot of complexities like this. I got the impression you were trying
> to do this for many species. For example, there's no reason that the taxon
> you're adding is, even if we assume monophyly of the clade (genus in this
> example) in question, nested somewhere in the crown clade. It might be
> sister to the clade. So you should theoretically be considering that node
> as a possible place to add the taxon. Adding the taxon in a polytomy at the
> MRCA and then randomly resolving that should do that (I assume, though do
> those operations also consider moving stemwards?). You can also randomly
> choose a random valid node that doesn't render the clade polyphyletic,
> split on that node and use a birth-death model, or any number of other
> rules (midpoint, polytomy, uniform distribution) to decide how to divvy up
> branch lengths and keep the tree ultrametric. This is what's happening with
> addTaxa. Also, if you hit a situation where you're trying to add a taxon to
> a set of taxa that are not actually monophyletic, you'll get weird results
> with your current approach. If that's not your situation, no problems
> there, but you ought to check (or just use addTaxa which will make sure it
> doesn't make the clade any "more" polyphyletic, whatever that means). FWIW,
> I'm leery of using birth-death models to add lots of missing species
> because I think it biases the resulting tree towards increasing
> diversification in the present. At least it does with addTaxa. This might
> be a programming error on my part, but I actually suspect it's a systematic
> bias introduced by problems described in recent papers by Stilianos Louca
> and Matthew Pennell, e.g.
> https://www.sciencedirect.com/science/article/abs/pii/S0960982221006138?dgcid=coauthor
> Figuring out if incorporating new birth-death estimators (that don't cap
> extinction at 0) could resolve some of that bias was on my to-do list for
> addTaxa, and is why I've never gotten around to more formally publishing
> tests of it. Also, I'm fairly sure Jonathan Chang et al. wrote a relevant R
> package here which also automates most of this, and used it for a big fish
> phylogeny. That'd be worth a look if you're trying to do this across
> hundreds or thousands of species.
>
> Eliot
>
> On Fri, Jun 4, 2021 at 2:59 PM Russell Engelman 
> wrote:
>
>> Dear Everyone,
>>
>> So this is really strange. I tried what Dr. Revell suggested, and for the
>> two examples I gave it worked for *Cercopithecus albogularis*, but when
>> I applied it to *Cricetomys ansorgei *I got the message...
>>
>> "Error in bind.tree(tree, tip, where = where, position = pp) :
>>   'position' is larger than the branch length"
>>
>> I know that usually this means the new node position is usually further
>> back in time than the point where the branch diverges from its nearest
>> neighbor. However when I went to check how long the terminal branch leading
>> to *C. ansorgei*'s sister taxon that I am trying to bind it to is, *C.
>> emini*, it said that the branch length to this taxon is much, much
>> greater than the branch length of the new taxon I am trying to set (3.28
>> Ma), so it is not clear why it is 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Russell Engelman
Dear Everyone,

So this is really strange. I tried what Dr. Revell suggested, and for the
two examples I gave it worked for *Cercopithecus albogularis*, but when I
applied it to *Cricetomys ansorgei *I got the message...

"Error in bind.tree(tree, tip, where = where, position = pp) :
  'position' is larger than the branch length"

I know that usually this means the new node position is usually further
back in time than the point where the branch diverges from its nearest
neighbor. However when I went to check how long the terminal branch leading
to *C. ansorgei*'s sister taxon that I am trying to bind it to is, *C.
emini*, it said that the branch length to this taxon is much, much greater
than the branch length of the new taxon I am trying to set (3.28 Ma), so it
is not clear why it is returning this error message. Here is some code that
shows this with the data set I attached.

```
nodes<-sapply(hbltree$tree_3084$tip.label,function(x,y)
which(y==x),y=hbltree$tree_3084$tip.label)
edge.lengths<-setNames(hbltree$tree_3084$edge.length[sapply(nodes,
   function(x,y)
which(y==x),y=hbltree$tree_3084$edge[,2])],names(nodes))
edge.lengths[c("Cricetomys_emini","Cercopithecus_mitis")]
```

When I do this with my own dataset I get an edge length of 8.588. I tried
this with another tree (tree_7180) to see if things were varying between
trees and I got a branch length of 7.760.

The only thing I can think of is this is a result of the divergence time
varying among the trees, but in that case I am not sure of how to bind a
tip to these trees given I have the divergence data from the literature but
the Upham et al. dataset may not be calibrated to account for it. I.e., the
Upham et al. supertree is estimating divergence time in *Cricetomys *spp.
based on the available data for the species included: *C. emini *and *C.
gambianus*, and when additional constraining data from *C. ansorgei *is
included it might produce a different estimate of branch lengths.

Additionally, is there a way to do this if there is no branch length known?
I have a number of taxa for which divergence times have not been published,
but for whom their phylogenetic position is known. I am trying to just
stick them in the tree as close as possible. I thought it should be
possible if I just delete the `position` and `edge.length` arguments, as I
thought it said in the `ape` documentation that doing so just create a
polytomy or something like that at the split, but when I tried that with my
data I got the message...

"Error in while (currnode != rt) { : missing value where TRUE/FALSE needed"

Which suggests I am missing a key parameter somewhere or misusing the
bind.tip function somehow.

Sincerely,
Russell

On Fri, Jun 4, 2021 at 11:39 AM Mike Collyer  wrote:

> Hello Everyone,
>
> Just to stack onto Liam’s “fun with apply family of functions”, I often
> use lapply, which works well by having a function that only applies to the
> objects in a list, but often — and for difficult to decipher reasons —
> breaks down if the function is not predefined, is complex, has a few
> arguments to vary, or might require several steps.  As a way to deal with
> this kind of issue, I often define a new list for lapply and avoid using
> the list I want to work as the target.  Here is an attempt to show how that
> can be done (but I think Liam’s suggestion is actually better)
>
> n <- length(tree)
>
> newTree <- lapply(1:n, function(j){
>   tree.j <- tree[[j]]
>   bind.tip(tree.j, bind.tip, tip.label="Cercopithecus_albogularis",
>   position=0.59, edge.length = 0.59,
>   where=mrca(tree.j)["Cercopithecus_mitis","Cercopithecus_mitis"])
> })
>
>
> Another way this could be done is to make multiple lists and use the Map
> function, which is basically a modification of the mapply example Liam
> used.  (This example also uses Liam’s suggestion to use fastMRCA.)
>
> tip.label <- lapply(1:n, function(.) "Cercopithecus_albogularis")
> MRCA <- lapply(1:n, function(j) fastMRCA(tree[[j]],
> "Cercopithecus_albogularis", "Cercopithecus_albogularis"))
>
> newTree <- Map(function(tr, tl, m) bind.tip(tr, tl,
>position=0.59, edge.length = 0.59,
>where = m),
>tree,
>tip.label,
>MRCA
>   )
>
> Finally, the do.call function can be helpful when repeating a function
> that has only one or few arguments among several changing over a list.
> Building on previous set up, one could do this
>
> bind.args <- list(tree = tree[[1]], tip.label =
> “Cercopithecus_albogularis”,
>   position=0.59, edge.length = 0.59, where = MRCA[[1]],
>   interactive = FALSE)
>
> newTree <- lapply(1:n, function(j){
>   bind.args$tree <- tree[[j]]
>   bind.args$where <- MRCA[[j]]
>   do.call(bind.tip, bind.args)
> })
>
>
> I also suggest these without verification with real data.
>
> Cheers!
> Mike
>
>
> > On Jun 2, 2021, at 9:11 PM, Liam J. Revell  wrote:
> >
> > Dear 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Liam J. Revell



Dear Russell.

For some reason, my emails don't seem to be getting through to the whole 
list -- so my apologies if this ends up being redundant with another reply.


In short, if you supply a vector of tip labels that are identical (e.g., 
c("Cricetomys_emini","Cricetomys_emini")), ape::getMRCA returns the 
parent node of the corresponding taxon -- rather than the tip index. I 
didn't know that, but now I do.


You can switch to using phytools::fastMRCA. fastMRCA takes two tip label 
arguments instead of one -- so in your case the function call would be 
fastMRCA(tree,"Cricetomys_emini","Cricetomys_emini").


You can also just use the base R function which. In this case, you would 
do which(tree$tip.label=="Cricetomys_emini"). This should have the same 
effect.


You do seem to be using the arguments 'where' and 'position' correctly 
in phytools::bind.tip. 'where' should point to the node immediately 
*above* (i.e., tipward) of the place you want to attach the new tip; 
while 'position' should be the depth *below* that node.


Here's an update to your mapply call using fastMRCA instead of getMRCA.

newtree<-mapply(bind.tip,tree=tree,where=lapply(tree,fastMRCA,
sp1="Cercopithecus_mitis",sp2="Cercopithecus_mitis"),
MoreArgs=list(tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59),SIMPLIFY=FALSE)
class(newtree)<-"multiPhylo"

Please let me (& the list) know if this solution works as expected.

All the best, Liam

Liam J. Revell
University of Massachusetts Boston [Assoc. Prof.]
Universidad Católica de la Ssma Concepción [Adj. Res.]

Web & phytools:
http://faculty.umb.edu/liam.revell/, http://www.phytools.org, 
http://blog.phytools.org


Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile

U.S. COVID-19 explorer web application:
https://covid19-explorer.org/

On 6/4/2021 1:17 AM, Russell Engelman wrote:

EXTERNAL SENDER
Dear Everyone,

I think these are all really good suggestions. I tried this with my data 
(specifically the mapply with getMRCA) and it sort of worked. Sort of in 
that the mapply did apply the transformation to all 100 trees, but I 
noticed that after I applied it I got some funky results. Here is 
another example with /Cricetomys ansorgei /and some associated code...


hbltree <-read.nexus(file=" hbltree .nex")
class(hbltree)<-"multiPhylo"
newtrees<-mapply(bind.tip,tree=hbltree,where=lapply(hbltree,getMRCA,
   
  tip=c("Cricetomys_emini","Cricetomys_emini")),

                  MoreArgs=list(tip.label="Cricetomys_ansorgei",
                      position=3.282,edge.length=3.282),SIMPLIFY=FALSE)

However I noticed that when I printed the phylogeny the function for 
some reason does not place /C. ansorgei /as the sister taxon to /C. 
emini/, instead it places it as sister to /C. emini/ + /C. gambianus. 
/That part I am not sure why this is because getMRCA should find the 
terminal tip of /C. emini/, given that it is finding the MRCA of the 
same taxon.


It also does not place /C. ansorgei /at the tip of the tree. This might 
be a side effect of the function selecting the wrong node to insert the 
taxon on, but I think part of it as well is that I'm not familiar with 
how to handle the position and edge.length arguments in this case. I 
know from consulting the primary literature that /C. ansorgei /is 
recoded to have diverged from its nearest sister taxon, /C. emini/, 
about 3.282 Ma. So I know the 'edge.length' argument should be set to 
3.282. I've often struggled a lot with how to efficiently set the 
'position' argument afterwards.


Is it possible I am calling the wrong node or something to insert the 
taxon at?


Sincerely,
Russell

On Thu, Jun 3, 2021 at 10:47 AM Liam J. Revell > wrote:


Dear Russell et al. (sorry if this is a re-send: my message yesterday
seems to have been blocked by the listserv)

Using a for loop is a great idea! Highly underrated in R, IMO.

However, for future reference, the reason that your code didn't work
with lapply is because the list you're 'applying' over (tree) also
appears among the arguments!

If you want to use apply-family functions instead of a for loop (just,
say, for fun) then you have two basic options: you can write a custom
function; or you can use mapply.

Here's some (untested) code to do it.

## first, using a custom function & lapply:
foo<-function(tree) bind.tip(tree,
      tip.label="Cercopithecus_albogularis",
      position=0.59,edge.length=0.59,
      where=getMRCA(tree,tip=c("Cercopithecus_mitis",
      "Cercopithecus_mitis")))
newtree<-lapply(tree,foo)
class(newtree)<-"multiPhylo"

## now, using mapply:
newtree<-mapply(bind.tip,tree=tree,where=lapply(tree,getMRCA,
      tip=c("Cercopithecus_mitis","Cercopithecus_mitis")),
      MoreArgs=list(tip.label="Cercopithecus_albogularis",
   

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Mike Collyer


Hello Everyone,

Just to stack onto Liam’s “fun with apply family of functions”, I often use 
lapply, which works well by having a function that only applies to the objects 
in a list, but often — and for difficult to decipher reasons — breaks down if 
the function is not predefined, is complex, has a few arguments to vary, or 
might require several steps.  As a way to deal with this kind of issue, I often 
define a new list for lapply and avoid using the list I want to work as the 
target.  Here is an attempt to show how that can be done (but I think Liam’s 
suggestion is actually better)

n <- length(tree)

newTree <- lapply(1:n, function(j){
  tree.j <- tree[[j]]
  bind.tip(tree.j, bind.tip, tip.label="Cercopithecus_albogularis",
  position=0.59, edge.length = 0.59,  
  where=mrca(tree.j)["Cercopithecus_mitis","Cercopithecus_mitis"])
})


Another way this could be done is to make multiple lists and use the Map 
function, which is basically a modification of the mapply example Liam used.  
(This example also uses Liam’s suggestion to use fastMRCA.)

tip.label <- lapply(1:n, function(.) "Cercopithecus_albogularis")
MRCA <- lapply(1:n, function(j) fastMRCA(tree[[j]], 
"Cercopithecus_albogularis", "Cercopithecus_albogularis"))

newTree <- Map(function(tr, tl, m) bind.tip(tr, tl, 
   position=0.59, edge.length = 0.59,  
   where = m),
   tree,
   tip.label,
   MRCA
  )

Finally, the do.call function can be helpful when repeating a function that has 
only one or few arguments among several changing over a list.  Building on 
previous set up, one could do this

bind.args <- list(tree = tree[[1]], tip.label = “Cercopithecus_albogularis”,
  position=0.59, edge.length = 0.59, where = MRCA[[1]],
  interactive = FALSE)

newTree <- lapply(1:n, function(j){
  bind.args$tree <- tree[[j]]
  bind.args$where <- MRCA[[j]]
  do.call(bind.tip, bind.args)
})


I also suggest these without verification with real data.

Cheers!
Mike


> On Jun 2, 2021, at 9:11 PM, Liam J. Revell  wrote:
> 
> Dear Russell et al.
> 
> Using a for loop is a great idea! Highly underrated in R, IMO. ;)
> 
> However, for future reference, the reason that your code didn't work with 
> lapply is because the list you're 'applying' over (tree) also appears among 
> the arguments!
> 
> If you want to use apply-family functions instead of a for loop (just, say, 
> for fun) then you have two basic options: you can write a custom function; or 
> you can use mapply.
> 
> Here's some (untested) code to do it.
> 
> ## first, using a custom function & lapply:
> foo<-function(tree) bind.tip(tree,
>   tip.label="Cercopithecus_albogularis",
>   position=0.59,edge.length=0.59,
>   where=getMRCA(tree,tip=c("Cercopithecus_mitis",
>   "Cercopithecus_mitis")))
> newtree<-lapply(tree,foo)
> class(newtree)<-"multiPhylo"
> 
> ## now, using mapply:
> newtree<-mapply(bind.tip,tree=tree,where=lapply(tree,getMRCA,
>   tip=c("Cercopithecus_mitis","Cercopithecus_mitis")),
>   MoreArgs=list(tip.label="Cercopithecus_albogularis",
>   position=0.59,edge.length=0.59),SIMPLIFY=FALSE)
> class(newtree)<-"multiPhylo"
> 
> (Code is not guaranteed! I don't have the data file, so I didn't actually 
> test it -- but something like this ought to work.)
> 
> Regardless, I recommend using ape::getMRCA (or phytools::fastMRCA) because 
> otherwise you're computing an N x N matrix in each iteration of your function 
> call just to get one node index.
> 
> Good luck! All the best, Liam
> 
> Liam J. Revell
> University of Massachusetts Boston [Assoc. Prof.]
> Universidad Católica de la Ssma Concepción [Adj. Res.]
> 
> Web & phytools:
> http://faculty.umb.edu/liam.revell/, http://www.phytools.org, 
> http://blog.phytools.org
> 
> Academic Director UMass Boston Chile Abroad:
> https://www.umb.edu/academics/caps/international/biology_chile
> 
> U.S. COVID-19 explorer web application:
> https://covid19-explorer.org/
> 
> On 6/2/2021 8:18 PM, Nathan Upham wrote:
>> EXTERNAL SENDER
>> Hi Russell:
>> Glad to hear you’re using the VertLife mammal trees — they are built on a 
>> taxonomy of 5,911 species of which only 4,098 are sampled for DNA, so there 
>> is already a ~30% chunk that is placed using taxonomic constraints and 
>> birth-death branch lengths as sampled during the estimation of 28 Bayesian 
>> patch clades.
>> Adding additional species described since the 2015 cutoff of that VertLife 
>> taxonomy makes sense (e.g., up to ~6,500 species on mammaldiversity.org).  
>> However, keep in mind that they will not have birth-death estimated branch 
>> lengths, but rather more likely be added as a polygamy to given clade and 
>> then randomly resolved.
>> Given the sample code you provided, the key thing you’ll want to do is run a 
>> *loop* rather than using lapply, so that you can specify a given tree each 
>> time, e.g.:
>> newtrees<-vector(“list”,length(trees))
>> for(j in 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Russell Engelman
Dear Everyone,

So this is really strange. I tried what Dr. Revell suggested, and for the
two examples I gave it worked for *Cercopithecus albogularis*, but when I
applied it to *Cricetomys ansorgei *I got the message...

"Error in bind.tree(tree, tip, where = where, position = pp) :
  'position' is larger than the branch length"

I know that usually this means the new node position is usually further
back in time than the point where the branch diverges from its nearest
neighbor. However when I went to check how long the terminal branch leading
to *C. ansorgei*'s sister taxon that I am trying to bind it to is, *C.
emini*, it said that the branch length to this taxon is much, much greater
than the branch length of the new taxon I am trying to set (3.28 Ma), so it
is not clear why it is returning this error message. Here is some code that
shows this with the data set I attached.

```
nodes<-sapply(hbltree$tree_3084$tip.label,function(x,y)
which(y==x),y=hbltree$tree_3084$tip.label)
edge.lengths<-setNames(hbltree$tree_3084$edge.length[sapply(nodes,
   function(x,y)
which(y==x),y=hbltree$tree_3084$edge[,2])],names(nodes))
edge.lengths[c("Cricetomys_emini","Cercopithecus_mitis")]
```

When I do this with my own dataset I get an edge length of 8.588. I tried
this with another tree (tree_7180) to see if things were varying between
trees and I got a branch length of 7.760.

The only thing I can think of is this is a result of the divergence time
varying among the trees, but in that case I am not sure of how to bind a
tip to these trees given I have the divergence data from the literature but
the Upham et al. dataset may not be calibrated to account for it. I.e., the
Upham et al. supertree is estimating divergence time in *Cricetomys *spp.
based on the available data for the species included: *C. emini *and *C.
gambianus*, and when additional constraining data from *C. ansorgei *is
included it might produce a different estimate of branch lengths.

Additionally, is there a way to do this if there is no branch length known?
I have a number of taxa for which divergence times have not been published,
but for whom their phylogenetic position is known. I am trying to just
stick them in the tree as close as possible. I thought it should be
possible if I just delete the `position` and `edge.length` arguments, as I
thought it said in the `ape` documentation that doing so just create a
polytomy or something like that at the split, but when I tried that with my
data I got the message...

"Error in while (currnode != rt) { : missing value where TRUE/FALSE needed"

Which suggests I am missing a key parameter somewhere or misusing the
bind.tip function somehow.

Sincerely,
Russell

On Fri, Jun 4, 2021 at 11:39 AM Mike Collyer  wrote:

> Hello Everyone,
>
> Just to stack onto Liam’s “fun with apply family of functions”, I often
> use lapply, which works well by having a function that only applies to the
> objects in a list, but often — and for difficult to decipher reasons —
> breaks down if the function is not predefined, is complex, has a few
> arguments to vary, or might require several steps.  As a way to deal with
> this kind of issue, I often define a new list for lapply and avoid using
> the list I want to work as the target.  Here is an attempt to show how that
> can be done (but I think Liam’s suggestion is actually better)
>
> n <- length(tree)
>
> newTree <- lapply(1:n, function(j){
>   tree.j <- tree[[j]]
>   bind.tip(tree.j, bind.tip, tip.label="Cercopithecus_albogularis",
>   position=0.59, edge.length = 0.59,
>   where=mrca(tree.j)["Cercopithecus_mitis","Cercopithecus_mitis"])
> })
>
>
> Another way this could be done is to make multiple lists and use the Map
> function, which is basically a modification of the mapply example Liam
> used.  (This example also uses Liam’s suggestion to use fastMRCA.)
>
> tip.label <- lapply(1:n, function(.) "Cercopithecus_albogularis")
> MRCA <- lapply(1:n, function(j) fastMRCA(tree[[j]],
> "Cercopithecus_albogularis", "Cercopithecus_albogularis"))
>
> newTree <- Map(function(tr, tl, m) bind.tip(tr, tl,
>position=0.59, edge.length = 0.59,
>where = m),
>tree,
>tip.label,
>MRCA
>   )
>
> Finally, the do.call function can be helpful when repeating a function
> that has only one or few arguments among several changing over a list.
> Building on previous set up, one could do this
>
> bind.args <- list(tree = tree[[1]], tip.label =
> “Cercopithecus_albogularis”,
>   position=0.59, edge.length = 0.59, where = MRCA[[1]],
>   interactive = FALSE)
>
> newTree <- lapply(1:n, function(j){
>   bind.args$tree <- tree[[j]]
>   bind.args$where <- MRCA[[j]]
>   do.call(bind.tip, bind.args)
> })
>
>
> I also suggest these without verification with real data.
>
> Cheers!
> Mike
>
>
> > On Jun 2, 2021, at 9:11 PM, Liam J. Revell  wrote:
> >
> > Dear 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Eliot Miller
Russell,

If you're trying to do this across a lot of species, you're going to bump
into a lot of complexities like this. I got the impression you were trying
to do this for many species. For example, there's no reason that the taxon
you're adding is, even if we assume monophyly of the clade (genus in this
example) in question, nested somewhere in the crown clade. It might be
sister to the clade. So you should theoretically be considering that node
as a possible place to add the taxon. Adding the taxon in a polytomy at the
MRCA and then randomly resolving that should do that (I assume, though do
those operations also consider moving stemwards?). You can also randomly
choose a random valid node that doesn't render the clade polyphyletic,
split on that node and use a birth-death model, or any number of other
rules (midpoint, polytomy, uniform distribution) to decide how to divvy up
branch lengths and keep the tree ultrametric. This is what's happening with
addTaxa. Also, if you hit a situation where you're trying to add a taxon to
a set of taxa that are not actually monophyletic, you'll get weird results
with your current approach. If that's not your situation, no problems
there, but you ought to check (or just use addTaxa which will make sure it
doesn't make the clade any "more" polyphyletic, whatever that means). FWIW,
I'm leery of using birth-death models to add lots of missing species
because I think it biases the resulting tree towards increasing
diversification in the present. At least it does with addTaxa. This might
be a programming error on my part, but I actually suspect it's a systematic
bias introduced by problems described in recent papers by Stilianos Louca
and Matthew Pennell, e.g.
https://www.sciencedirect.com/science/article/abs/pii/S0960982221006138?dgcid=coauthor
Figuring out if incorporating new birth-death estimators (that don't cap
extinction at 0) could resolve some of that bias was on my to-do list for
addTaxa, and is why I've never gotten around to more formally publishing
tests of it. Also, I'm fairly sure Jonathan Chang et al. wrote a relevant R
package here which also automates most of this, and used it for a big fish
phylogeny. That'd be worth a look if you're trying to do this across
hundreds or thousands of species.

Eliot

On Fri, Jun 4, 2021 at 2:59 PM Russell Engelman 
wrote:

> Dear Everyone,
>
> So this is really strange. I tried what Dr. Revell suggested, and for the
> two examples I gave it worked for *Cercopithecus albogularis*, but when I
> applied it to *Cricetomys ansorgei *I got the message...
>
> "Error in bind.tree(tree, tip, where = where, position = pp) :
>   'position' is larger than the branch length"
>
> I know that usually this means the new node position is usually further
> back in time than the point where the branch diverges from its nearest
> neighbor. However when I went to check how long the terminal branch leading
> to *C. ansorgei*'s sister taxon that I am trying to bind it to is, *C.
> emini*, it said that the branch length to this taxon is much, much
> greater than the branch length of the new taxon I am trying to set (3.28
> Ma), so it is not clear why it is returning this error message. Here is
> some code that shows this with the data set I attached.
>
> ```
> nodes<-sapply(hbltree$tree_3084$tip.label,function(x,y)
> which(y==x),y=hbltree$tree_3084$tip.label)
> edge.lengths<-setNames(hbltree$tree_3084$edge.length[sapply(nodes,
>function(x,y)
> which(y==x),y=hbltree$tree_3084$edge[,2])],names(nodes))
> edge.lengths[c("Cricetomys_emini","Cercopithecus_mitis")]
> ```
>
> When I do this with my own dataset I get an edge length of 8.588. I tried
> this with another tree (tree_7180) to see if things were varying between
> trees and I got a branch length of 7.760.
>
> The only thing I can think of is this is a result of the divergence time
> varying among the trees, but in that case I am not sure of how to bind a
> tip to these trees given I have the divergence data from the literature but
> the Upham et al. dataset may not be calibrated to account for it. I.e., the
> Upham et al. supertree is estimating divergence time in *Cricetomys *spp.
> based on the available data for the species included: *C. emini *and *C.
> gambianus*, and when additional constraining data from *C. ansorgei *is
> included it might produce a different estimate of branch lengths.
>
> Additionally, is there a way to do this if there is no branch length
> known? I have a number of taxa for which divergence times have not been
> published, but for whom their phylogenetic position is known. I am trying
> to just stick them in the tree as close as possible. I thought it should be
> possible if I just delete the `position` and `edge.length` arguments, as I
> thought it said in the `ape` documentation that doing so just create a
> polytomy or something like that at the split, but when I tried that with my
> data I got the message...
>
> 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Liam J. Revell



Dear Russell et al. (sorry if this is a re-send: my message yesterday 
seems to have been blocked by the listserv)


Using a for loop is a great idea! Highly underrated in R, IMO.

However, for future reference, the reason that your code didn't work 
with lapply is because the list you're 'applying' over (tree) also 
appears among the arguments!


If you want to use apply-family functions instead of a for loop (just, 
say, for fun) then you have two basic options: you can write a custom 
function; or you can use mapply.


Here's some (untested) code to do it.

## first, using a custom function & lapply:
foo<-function(tree) bind.tip(tree,
tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59,
where=getMRCA(tree,tip=c("Cercopithecus_mitis",
"Cercopithecus_mitis")))
newtree<-lapply(tree,foo)
class(newtree)<-"multiPhylo"

## now, using mapply:
newtree<-mapply(bind.tip,tree=tree,where=lapply(tree,getMRCA,
tip=c("Cercopithecus_mitis","Cercopithecus_mitis")),
MoreArgs=list(tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59),SIMPLIFY=FALSE)
class(newtree)<-"multiPhylo"

(Code is not guaranteed! I don't have the data file, so I didn't 
actually test it -- but something like this ought to work.)


Regardless, I recommend using ape::getMRCA (or phytools::fastMRCA) 
because otherwise you're computing an N x N matrix in each iteration of 
your function call just to get one node index.


Good luck! All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
Universidad Católica de la Ssma Concepción

Web & phytools:
http://faculty.umb.edu/liam.revell/, http://www.phytools.org, 
http://blog.phytools.org


Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile

U.S. COVID-19 explorer web application:
https://covid19-explorer.org/

On 6/2/2021 8:18 PM, Nathan Upham wrote:

Hi Russell:

Glad to hear you’re using the VertLife mammal trees — they are built on a 
taxonomy of 5,911 species of which only 4,098 are sampled for DNA, so there is 
already a ~30% chunk that is placed using taxonomic constraints and birth-death 
branch lengths as sampled during the estimation of 28 Bayesian patch clades.

Adding additional species described since the 2015 cutoff of that VertLife 
taxonomy makes sense (e.g., up to ~6,500 species on mammaldiversity.org).  
However, keep in mind that they will not have birth-death estimated branch 
lengths, but rather more likely be added as a polygamy to given clade and then 
randomly resolved.

Given the sample code you provided, the key thing you’ll want to do is run a 
*loop* rather than using lapply, so that you can specify a given tree each 
time, e.g.:

newtrees<-vector(“list”,length(trees))
for(j in 1:length(trees)){
newtrees[[j]] <- bind.tip(tree=trees[[j]], tip.label="Cercopithecus_albogularis”, 
position=0.59,edge.length = 0.59, 
where=mrca(tree[[j]])["Cercopithecus_mitis","Cercopithecus_mitis"])
}

I also wrote some code to prune mammal trees and add extinct Caribbean species, 
which uses a similar approach of making polytomies and randomly resolving them 
— here is the repo:
https://github.com/n8upham/CaribbeanExtinctions-WTWTW/tree/master/mamPhy_pruningCode
And here is the code file:
https://github.com/n8upham/CaribbeanExtinctions-WTWTW/blob/master/mamPhy_pruningCode/pruningCode_MamPhy-to-CaribbeanTaxa.R

Hope that helps,
—nate




Nathan S. Upham, Ph.D. (he/him)
Assistant Research Professor & Associate Curator of Mammals
Arizona State University, School of Life Sciences
  ~> Check out the new Mammal Tree of Life  
and the Mammal Diversity Database 

Research Associate, Yale University (Ecology and Evolutionary Biology)
Research Associate, Field Museum of Natural History (Negaunee Integrative 
Research Center)
Chair, Biodiversity Committee, American Society of Mammalogists
Taxonomy Advisor, IUCN/SSC Small Mammal Specialist Group

personal web: n8u.org | Google Scholar 

 | ASU profile 
e: nathan.up...@asu.edu | Skype: nate_upham | Twitter: @n8_upham 






On Jun 2, 2021, at 4:19 PM, Eliot Miller  wrote:

Hi Russell,

A package I wrote a while back should be able to do that fairly easily.
https://urldefense.com/v3/__https://github.com/eliotmiller/addTaxa__;!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQrzxl2aIw$
  The only paper it's described in
remains 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Liam J. Revell

Dear Russell et al.

Using a for loop is a great idea! Highly underrated in R, IMO. ;)

However, for future reference, the reason that your code didn't work 
with lapply is because the list you're 'applying' over (tree) also 
appears among the arguments!


If you want to use apply-family functions instead of a for loop (just, 
say, for fun) then you have two basic options: you can write a custom 
function; or you can use mapply.


Here's some (untested) code to do it.

## first, using a custom function & lapply:
foo<-function(tree) bind.tip(tree,
tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59,
where=getMRCA(tree,tip=c("Cercopithecus_mitis",
"Cercopithecus_mitis")))
newtree<-lapply(tree,foo)
class(newtree)<-"multiPhylo"

## now, using mapply:
newtree<-mapply(bind.tip,tree=tree,where=lapply(tree,getMRCA,
tip=c("Cercopithecus_mitis","Cercopithecus_mitis")),
MoreArgs=list(tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59),SIMPLIFY=FALSE)
class(newtree)<-"multiPhylo"

(Code is not guaranteed! I don't have the data file, so I didn't 
actually test it -- but something like this ought to work.)


Regardless, I recommend using ape::getMRCA (or phytools::fastMRCA) 
because otherwise you're computing an N x N matrix in each iteration of 
your function call just to get one node index.


Good luck! All the best, Liam

Liam J. Revell
University of Massachusetts Boston [Assoc. Prof.]
Universidad Católica de la Ssma Concepción [Adj. Res.]

Web & phytools:
http://faculty.umb.edu/liam.revell/, http://www.phytools.org, 
http://blog.phytools.org


Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile

U.S. COVID-19 explorer web application:
https://covid19-explorer.org/

On 6/2/2021 8:18 PM, Nathan Upham wrote:

EXTERNAL SENDER

Hi Russell:

Glad to hear you’re using the VertLife mammal trees — they are built on a 
taxonomy of 5,911 species of which only 4,098 are sampled for DNA, so there is 
already a ~30% chunk that is placed using taxonomic constraints and birth-death 
branch lengths as sampled during the estimation of 28 Bayesian patch clades.

Adding additional species described since the 2015 cutoff of that VertLife 
taxonomy makes sense (e.g., up to ~6,500 species on mammaldiversity.org).  
However, keep in mind that they will not have birth-death estimated branch 
lengths, but rather more likely be added as a polygamy to given clade and then 
randomly resolved.

Given the sample code you provided, the key thing you’ll want to do is run a 
*loop* rather than using lapply, so that you can specify a given tree each 
time, e.g.:

newtrees<-vector(“list”,length(trees))
for(j in 1:length(trees)){
newtrees[[j]] <- bind.tip(tree=trees[[j]], tip.label="Cercopithecus_albogularis”, 
position=0.59,edge.length = 0.59, 
where=mrca(tree[[j]])["Cercopithecus_mitis","Cercopithecus_mitis"])
}

I also wrote some code to prune mammal trees and add extinct Caribbean species, 
which uses a similar approach of making polytomies and randomly resolving them 
— here is the repo:
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fn8upham%2FCaribbeanExtinctions-WTWTW%2Ftree%2Fmaster%2FmamPhy_pruningCodedata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=KeZWNETOkPBekL3j5AIr2hygW49PbdSKImMV39QTXtE%3Dreserved=0
And here is the code file:
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fn8upham%2FCaribbeanExtinctions-WTWTW%2Fblob%2Fmaster%2FmamPhy_pruningCode%2FpruningCode_MamPhy-to-CaribbeanTaxa.Rdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=10u8o7U8dqC1pyK6GteNycoSACTOXAO2VaHNJmTeddk%3Dreserved=0

Hope that helps,
—nate




Nathan S. Upham, Ph.D. (he/him)
Assistant Research Professor & Associate Curator of Mammals
Arizona State University, School of Life Sciences
  ~> Check out the new Mammal Tree of Life 

 and the Mammal Diversity Database 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-04 Thread Liam J. Revell
Dear Russell et al. (sorry if this is a re-send: my message yesterday 
seems to have been blocked by the listserv)


Using a for loop is a great idea! Highly underrated in R, IMO.

However, for future reference, the reason that your code didn't work 
with lapply is because the list you're 'applying' over (tree) also 
appears among the arguments!


If you want to use apply-family functions instead of a for loop (just, 
say, for fun) then you have two basic options: you can write a custom 
function; or you can use mapply.


Here's some (untested) code to do it.

## first, using a custom function & lapply:
foo<-function(tree) bind.tip(tree,
tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59,
where=getMRCA(tree,tip=c("Cercopithecus_mitis",
"Cercopithecus_mitis")))
newtree<-lapply(tree,foo)
class(newtree)<-"multiPhylo"

## now, using mapply:
newtree<-mapply(bind.tip,tree=tree,where=lapply(tree,getMRCA,
tip=c("Cercopithecus_mitis","Cercopithecus_mitis")),
MoreArgs=list(tip.label="Cercopithecus_albogularis",
position=0.59,edge.length=0.59),SIMPLIFY=FALSE)
class(newtree)<-"multiPhylo"

(Code is not guaranteed! I don't have the data file, so I didn't 
actually test it -- but something like this ought to work.)


Regardless, I recommend using ape::getMRCA (or phytools::fastMRCA) 
because otherwise you're computing an N x N matrix in each iteration of 
your function call just to get one node index.


Good luck! All the best, Liam

Liam J. Revell
University of Massachusetts Boston [Assoc. Prof.]
Universidad Católica de la Ssma Concepción [Adj. Res.]

Web & phytools:
http://faculty.umb.edu/liam.revell/, http://www.phytools.org, 
http://blog.phytools.org


Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile

U.S. COVID-19 explorer web application:
https://covid19-explorer.org/

On 6/2/2021 8:18 PM, Nathan Upham wrote:

EXTERNAL SENDER

Hi Russell:

Glad to hear you’re using the VertLife mammal trees — they are built on a 
taxonomy of 5,911 species of which only 4,098 are sampled for DNA, so there is 
already a ~30% chunk that is placed using taxonomic constraints and birth-death 
branch lengths as sampled during the estimation of 28 Bayesian patch clades.

Adding additional species described since the 2015 cutoff of that VertLife 
taxonomy makes sense (e.g., up to ~6,500 species on mammaldiversity.org).  
However, keep in mind that they will not have birth-death estimated branch 
lengths, but rather more likely be added as a polygamy to given clade and then 
randomly resolved.

Given the sample code you provided, the key thing you’ll want to do is run a 
*loop* rather than using lapply, so that you can specify a given tree each 
time, e.g.:

newtrees<-vector(“list”,length(trees))
for(j in 1:length(trees)){
newtrees[[j]] <- bind.tip(tree=trees[[j]], tip.label="Cercopithecus_albogularis”, 
position=0.59,edge.length = 0.59, 
where=mrca(tree[[j]])["Cercopithecus_mitis","Cercopithecus_mitis"])
}

I also wrote some code to prune mammal trees and add extinct Caribbean species, 
which uses a similar approach of making polytomies and randomly resolving them 
— here is the repo:
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fn8upham%2FCaribbeanExtinctions-WTWTW%2Ftree%2Fmaster%2FmamPhy_pruningCodedata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=KeZWNETOkPBekL3j5AIr2hygW49PbdSKImMV39QTXtE%3Dreserved=0
And here is the code file:
https://nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fn8upham%2FCaribbeanExtinctions-WTWTW%2Fblob%2Fmaster%2FmamPhy_pruningCode%2FpruningCode_MamPhy-to-CaribbeanTaxa.Rdata=04%7C01%7Cliam.revell%40umb.edu%7C1174c283b881427e2df508d926253e2a%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637582763674838188%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000sdata=10u8o7U8dqC1pyK6GteNycoSACTOXAO2VaHNJmTeddk%3Dreserved=0

Hope that helps,
—nate




Nathan S. Upham, Ph.D. (he/him)
Assistant Research Professor & Associate Curator of Mammals
Arizona State University, School of Life Sciences
  ~> Check out the new Mammal Tree of Life 

 and the Mammal Diversity Database 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-03 Thread Klaus Schliep
Dear Russell,
there is also a function getMRCA() in ape, which is more appropriate to use
for your problem, as it only computes the MRCA for a set of tips. This
should be much more efficient than using mrca().
Kind regards,
Klaus



On Thu, Jun 3, 2021 at 1:05 AM Russell Engelman 
wrote:

> Dear R-sig-phylo,
>
> I have been working with a mammalian phylogeny I recently downloaded from
> VertLife (http://vertlife.org/phylosubsets/). Unfortunately, the phylogeny
> is missing a large number of species, so I am trying to manually add these
> taxa to the phylogeny. I have a series of 100 trees that I am using to do
> things such as test for phylogenetic signal. I know how to use bind.tip to
> add new taxa to a single tree, but I am having more trouble with a
> multiPhylo object. I am primarily adding these taxa by placing them as
> sister to their nearest included relative (since most of them are elevated
> former subspecies), but the issue here is that in the 100 trees in the
> multiPhylo object the node representing the taxon to bind these taxa to is
> not the same across all trees due to shifting topologies.
>
> This is an example of the code I have been using, in which "tree" is the
> tree object. This works for a single 'phylo' tree but not 'multiphylo'.
>
> ```
> newtree<-lapply(tree,bind.tip,tip.label="Cercopithecus_albogularis",
> position=0.59,edge.length = 0.59,
>
> where=mrca(tree)["Cercopithecus_mitis","Cercopithecus_mitis"])
> ```
>
> Now, this code will not work, but I know exactly why: 'tree' is a
> multiPhylo object and so the 'where' argument cannot find the node for the
> terminal taxon. However, the issue is how can I tell R to repeat this
> 'where' argument for each of the 100 trees, since the node in question is
> not identical across these trees? Is there an easier way to do this than
> using the 'mrca' call for each terminal taxon? I've noticed adding a 'mrca'
> argument also increases computation time and if I am reinventing the wheel
> it would be nice to know if I am overthinking things.
>
> Sincerely,
> Russell
>
> [[alternative HTML version deleted]]
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>


-- 
Klaus Schliep

Senior Scientist
Institute of Computational Biotechnology
TU Graz
https://icbt.tugraz.at
https://www.phangorn.org/ 

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-02 Thread Nathan Upham
Hi Russell:

Glad to hear you’re using the VertLife mammal trees — they are built on a 
taxonomy of 5,911 species of which only 4,098 are sampled for DNA, so there is 
already a ~30% chunk that is placed using taxonomic constraints and birth-death 
branch lengths as sampled during the estimation of 28 Bayesian patch clades.  

Adding additional species described since the 2015 cutoff of that VertLife 
taxonomy makes sense (e.g., up to ~6,500 species on mammaldiversity.org).  
However, keep in mind that they will not have birth-death estimated branch 
lengths, but rather more likely be added as a polygamy to given clade and then 
randomly resolved.

Given the sample code you provided, the key thing you’ll want to do is run a 
*loop* rather than using lapply, so that you can specify a given tree each 
time, e.g.:

newtrees<-vector(“list”,length(trees))
for(j in 1:length(trees)){
newtrees[[j]] <- bind.tip(tree=trees[[j]], 
tip.label="Cercopithecus_albogularis”, position=0.59,edge.length = 0.59, 
where=mrca(tree[[j]])["Cercopithecus_mitis","Cercopithecus_mitis"])
}

I also wrote some code to prune mammal trees and add extinct Caribbean species, 
which uses a similar approach of making polytomies and randomly resolving them 
— here is the repo:
https://github.com/n8upham/CaribbeanExtinctions-WTWTW/tree/master/mamPhy_pruningCode
And here is the code file:
https://github.com/n8upham/CaribbeanExtinctions-WTWTW/blob/master/mamPhy_pruningCode/pruningCode_MamPhy-to-CaribbeanTaxa.R

Hope that helps,
—nate




Nathan S. Upham, Ph.D. (he/him)
Assistant Research Professor & Associate Curator of Mammals
Arizona State University, School of Life Sciences
 ~> Check out the new Mammal Tree of Life 
 and the Mammal Diversity Database 


Research Associate, Yale University (Ecology and Evolutionary Biology)
Research Associate, Field Museum of Natural History (Negaunee Integrative 
Research Center)
Chair, Biodiversity Committee, American Society of Mammalogists
Taxonomy Advisor, IUCN/SSC Small Mammal Specialist Group

personal web: n8u.org | Google Scholar 

 | ASU profile 
e: nathan.up...@asu.edu | Skype: nate_upham | Twitter: @n8_upham 
 




> On Jun 2, 2021, at 4:19 PM, Eliot Miller  wrote:
> 
> Hi Russell,
> 
> A package I wrote a while back should be able to do that fairly easily.
> https://urldefense.com/v3/__https://github.com/eliotmiller/addTaxa__;!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQrzxl2aIw$
>   The only paper it's described in
> remains 
> https://urldefense.com/v3/__https://bsapubs.onlinelibrary.wiley.com/doi/full/10.3732/ajb.1500195__;!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQp7PnRRHg$
>  
> It's a wrapper for bind.tip, with some additional stuff. You basically
> would give it a taxonomic file where you identify the clades you're
> interested in (e.g. both of those Cercopithecus species could be named some
> unique clade name and off you go, it'd add the missing one to the other),
> then lapply that whole addTaxa command over the list of trees in
> multiPhylo. At some point I made laser a dependency, and it's possible I
> left it in that state. If that's the case, you can still get laser from old
> CRAN mirrors I believe. Let me know if you want more help.
> 
> Best,
> Eliot
> 
> On Wed, Jun 2, 2021 at 7:05 PM Russell Engelman 
> wrote:
> 
>> Dear R-sig-phylo,
>> 
>> I have been working with a mammalian phylogeny I recently downloaded from
>> VertLife 
>> (https://urldefense.com/v3/__http://vertlife.org/phylosubsets/__;!!IKRxdwAv5BmarQ!OZj7-dFRbxvUothKjSj6hr9B0eXscAO6LVWi1-a0w3J_PxlDqvsFDNb0lQoEdEnMAg$
>>  ). Unfortunately, the phylogeny
>> is missing a large number of species, so I am trying to manually add these
>> taxa to the phylogeny. I have a series of 100 trees that I am using to do
>> things such as test for phylogenetic signal. I know how to use bind.tip to
>> add new taxa to a single tree, but I am having more trouble with a
>> multiPhylo object. I am primarily adding these taxa by placing them as
>> sister to their nearest included relative (since most of them are elevated
>> former subspecies), but the issue here is that in the 100 trees in the
>> multiPhylo object the node representing the taxon to bind these taxa to is
>> not the same across all trees due to shifting topologies.
>> 
>> This is an example of the code I have been using, in which "tree" is the
>> tree object. This works for a single 'phylo' tree but not 'multiphylo'.
>> 
>> ```
>> 

Re: [R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-02 Thread Eliot Miller
Hi Russell,

A package I wrote a while back should be able to do that fairly easily.
https://github.com/eliotmiller/addTaxa The only paper it's described in
remains https://bsapubs.onlinelibrary.wiley.com/doi/full/10.3732/ajb.1500195
It's a wrapper for bind.tip, with some additional stuff. You basically
would give it a taxonomic file where you identify the clades you're
interested in (e.g. both of those Cercopithecus species could be named some
unique clade name and off you go, it'd add the missing one to the other),
then lapply that whole addTaxa command over the list of trees in
multiPhylo. At some point I made laser a dependency, and it's possible I
left it in that state. If that's the case, you can still get laser from old
CRAN mirrors I believe. Let me know if you want more help.

Best,
Eliot

On Wed, Jun 2, 2021 at 7:05 PM Russell Engelman 
wrote:

> Dear R-sig-phylo,
>
> I have been working with a mammalian phylogeny I recently downloaded from
> VertLife (http://vertlife.org/phylosubsets/). Unfortunately, the phylogeny
> is missing a large number of species, so I am trying to manually add these
> taxa to the phylogeny. I have a series of 100 trees that I am using to do
> things such as test for phylogenetic signal. I know how to use bind.tip to
> add new taxa to a single tree, but I am having more trouble with a
> multiPhylo object. I am primarily adding these taxa by placing them as
> sister to their nearest included relative (since most of them are elevated
> former subspecies), but the issue here is that in the 100 trees in the
> multiPhylo object the node representing the taxon to bind these taxa to is
> not the same across all trees due to shifting topologies.
>
> This is an example of the code I have been using, in which "tree" is the
> tree object. This works for a single 'phylo' tree but not 'multiphylo'.
>
> ```
> newtree<-lapply(tree,bind.tip,tip.label="Cercopithecus_albogularis",
> position=0.59,edge.length = 0.59,
>
> where=mrca(tree)["Cercopithecus_mitis","Cercopithecus_mitis"])
> ```
>
> Now, this code will not work, but I know exactly why: 'tree' is a
> multiPhylo object and so the 'where' argument cannot find the node for the
> terminal taxon. However, the issue is how can I tell R to repeat this
> 'where' argument for each of the 100 trees, since the node in question is
> not identical across these trees? Is there an easier way to do this than
> using the 'mrca' call for each terminal taxon? I've noticed adding a 'mrca'
> argument also increases computation time and if I am reinventing the wheel
> it would be nice to know if I am overthinking things.
>
> Sincerely,
> Russell
>
> [[alternative HTML version deleted]]
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[R-sig-phylo] Using bind.tip and lapply on a multiPhylo object

2021-06-02 Thread Russell Engelman
Dear R-sig-phylo,

I have been working with a mammalian phylogeny I recently downloaded from
VertLife (http://vertlife.org/phylosubsets/). Unfortunately, the phylogeny
is missing a large number of species, so I am trying to manually add these
taxa to the phylogeny. I have a series of 100 trees that I am using to do
things such as test for phylogenetic signal. I know how to use bind.tip to
add new taxa to a single tree, but I am having more trouble with a
multiPhylo object. I am primarily adding these taxa by placing them as
sister to their nearest included relative (since most of them are elevated
former subspecies), but the issue here is that in the 100 trees in the
multiPhylo object the node representing the taxon to bind these taxa to is
not the same across all trees due to shifting topologies.

This is an example of the code I have been using, in which "tree" is the
tree object. This works for a single 'phylo' tree but not 'multiphylo'.

```
newtree<-lapply(tree,bind.tip,tip.label="Cercopithecus_albogularis",
position=0.59,edge.length = 0.59,

where=mrca(tree)["Cercopithecus_mitis","Cercopithecus_mitis"])
```

Now, this code will not work, but I know exactly why: 'tree' is a
multiPhylo object and so the 'where' argument cannot find the node for the
terminal taxon. However, the issue is how can I tell R to repeat this
'where' argument for each of the 100 trees, since the node in question is
not identical across these trees? Is there an easier way to do this than
using the 'mrca' call for each terminal taxon? I've noticed adding a 'mrca'
argument also increases computation time and if I am reinventing the wheel
it would be nice to know if I am overthinking things.

Sincerely,
Russell

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/