from:"Liam J. Revell"

Re: [R-sig-phylo] size-free morphometrics in phylogenetic framework

2009-04-06 Thread Liam J. Revell


Hi Dan,

Even though this is well-trodden ground in some respects (by the likes of 
TG and others), I have often-enough fielded questions about 
size-correction and principal components analysis on species data (or 
encountered those who see nothing wrong with ignoring phylogeny in said 
procedures) that I recently wrote and submitted a Brief Communication to 
'Evolution' on the topic entitled Statistical transformations and trees: 
Phylogenetic size-correction and principal components.  In the article, I 
provide the mathematical details of these procedures, as well as simple R 
and Matlab code in an appendix.  I also analyzed the variance and type I 
error associated with ignoring phylogeny in these preliminary corrections 
or data reduction procedures.  Although the effect is not incredibly 
severe, it is large enough to be of significant concern.  I'm happy to 
send this manuscript to anyone who is interested.  As I said, it is 
presently in review, but I'm encouraged by this discussion thread that 
there seems to be considerable interest in this area.


I also performed my type I error analyses using stochastic pure-birth 
trees.  Obviously, the effect on type I error will be increased for more 
realistic birth-death trees (in which there are more shorter branches 
towards to tips of the tree).


Thanks! - Liam

Liam J. Revell
Department of Organismic and Evolutionary Biology
Harvard University
web: http://anolis.oeb.harvard.edu/~liam/
email: lrev...@fas.harvard.edu

On Mon, 6 Apr 2009, Luke Harmon wrote:


You should reply to the list.

Sent from my iPod

Begin forwarded message:


From: Brian O'Meara bcome...@nescent.org
Date: April 6, 2009 9:04:00 AM PDT (CA)
To: Dan Rabosky dl...@cornell.edu
Cc: r-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] size-free morphometrics in phylogenetic 
framework




Hi, Dan et al. I've been worried about this for PCA. Mesquite has an 
evolutionary PCA approach, but as far as I know, this hasn't been 
published anywhere. Assuming the method implemented in Mesquite is done 
properly (which I think rather likely), it'd be relatively simple to 
compare the effect of the evolutionary PCA vs conventional approach on real 
and simulated datasets.


Brian

On Apr 6, 2009, at 11:53 AM, Dan Rabosky wrote:



Howdy folks-

I think I'm opening this up for discussion, rather than pointing to a
specific problem.

Suppose I'm conducting an analysis of phenotypic evolution and need
size-free species measurements. A standard approach might be to take
species mean values, then regress those values on some index of size
(e.g., snout-vent length), and treat the residuals as size-free
measurements for further analysis. However, I've also seen explicit
incorporation of phylogenetic info into the estimation of size-free
variables. For example, Collar et al (Evolution, online early, DOI:
10./j.1558-5646.2009.00626.x) used PIC to estimate regression
slopes of traits on size, then forced these slopes on regressions of
raw species trait values and used the residuals as size-free variables.

Do we have strong feelings about why/whether the latter is the
approach that should be used? My intuition, probably incorrect, is
that the slopes should be relatively robust regardless of whether
phylogeny is used, but not the degrees of freedom.

And if this is a serious problem, wouldn't PCA and geometric
morphometric approaches also be affected by the assumption that
species values represent independent data points?

~Dan

Cornell University
http://www.eeb.cornell.edu/Rabosky/dan/main.html





  [[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Trait simulations

2009-05-19 Thread Liam J. Revell


Hi Jeremy,

Perhaps you've already figured this out.  There may be a function 
available in R to do this, but if not, in principle for BM it should be 
fairly easy to do this manually.


First, read your tree in:


tree-read.tree(FILE, etc.) % put filename or paste in tree


then compute the phylogenetic VCV matrix (let's call it T):


T-vcv.phylo(tree)


then specify your desired correlation matrix, R.  If you wanted an 
evolutionary correlation of 0.95 this would be:



R-matrix(c(1,0.95,0.95,1),nrow=2,ncol=2)


then generate a matrix of random, uncorrelated values (U), e.g.:


U-matrix(,nrow=N,ncol=2) % where N is the number of taxa
U[,1]-rnorm(N) % (am I using this function correctly?)
U[,2]-rnorm(N)


now give them a correlation based on R (using the Cholesky decomposite):


V-U%*%chol(R)


now give the data correlation due to the tree:


X-t(V)%*%chol(T)


I think that should be it.  The data in X should be as data evolved by BM 
on your tree with the correlation specified in your matrix, R (0.95 in 
this case).


More experience users, please let me know if this is incorrect (it is 
untested, so it may include errors, but I think the logic is sound).


Sincerely, Liam

Liam J. Revell
Department of Organismic and Evolutionary Biology
Harvard University
web: http://anolis.oeb.harvard.edu/~liam/
email: lrev...@fas.harvard.edu

On Mon, 18 May 2009, jeremy.beaul...@yale.edu wrote:


Hi all~

I was just wondering if there is a package or function in R that can simulate
two continuous traits with a user-specified correlation coefficient using a
known tree topology, branch lengths, and a model of Brownian motion (or even
OU, if possible)?

All the best,

Jeremy Beaulieu

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Standardising independent contrasts

2009-09-15 Thread Liam J. Revell


The contrasts returned by pic() should already be standardized.

Standardizing phylogenetic contrasts is performed by dividing each 
contrast (calculated between internal or terminal nodes) by a sum 
proportional to the square root of its expected variance.  Assuming 
Brownian motion, for two sister terminal nodes, this is just the square 
root of the sum of their branch lengths to a common ancestor.  (This is 
because the variance of a difference between two random variables is just 
the sum of their separate variances, and variance is accumulates in 
direct proportion to elapsed time under BM.)  For contrasts that are 
deeper in the phylogeny, this sum has to be adjusted to account for 
uncertainty in the reconstructed values at internal nodes, which adds 
extra variance to the computed difference.  This is described in great 
detail in Felsenstein (1985; Am Nat 125:1-15).  However, as noted above - 
the values returned by pic() should have already been standardized in this 
way!


You should also not subtract the mean value of the contrasts.  This is 
stated explicitly in Garland et al. (1992; Syst Biol 41:18-32): The 
expected mean of any set of contrasts is zero because the direction of 
subtraction is arbitrary..., so only standard deviations are needed.


- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org

On Tue, 15 Sep 2009, Manabu Sakamoto wrote:


Dear list,

I am trying to standardise independent contrasts following Garland et
al. (1992), but I am unsure just how you would extract the sum of it's
[contrast's] branch lenghts in R.

I computed independent contrasts using the pic function in ape. I
then took the contrasts and subtracted the mean from each one and now I
need to divide them by their standard deviations or the square root of
the sum of their branch lengths (Garland et al., 1992). This is where I
am stuck. I assume the sum of the branch lengths here refer to the sum
of the two branch lengths that go into the computation of each contrast?

I would greatly appreciate any kind support on this matter.

Many thanks in advance,
Manabu Sakamoto

--
--
M. Sakamoto, PhD
Department of Earth Sciences,
University of Bristol,
Wills Memorial Building, Queen's Road,
Bristol BS8 1RJ, UK
m.sakam...@bristol.ac.uk

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Standardising independent contrasts

2009-09-16 Thread Liam J. Revell


Hi Manabu,

Yes, positivized is the same as taking the absolute value.

For some of Garland et al.'s diagnostic tests on contrasts you might 
need the expected variances of the contrasts (actually, proportional 
to their expected variance prior to standardization), or the 
unstandardized contrasts.  It is possible to obtain these using specific 
settings in the pic() function.  For example, to obtain the expected 
variances you just run pic(var.contrasts=TRUE), e.g.,

 cont.X-pic(x,tree,var.contrasts=TRUE)
where x is a vector of your phenotypic trait and tree is your phylogeny. 
cont.X is a matrix with the standardized contrasts in the first column 
and variances in the second column.  Similarly:

 cont.x-pic(x,tree,scaled=FALSE)
will return a vector (cont.x) containing the unstandardized contrasts 
(although in most cases you don't want these).


If you compute:
 cont.X-pic(x,tree,scaled=FALSE,var.contrasts=TRUE)
 cont.x-cont.X[,1]/sqrt(cont.X[,2])
you will get the same vector of contrasts as you would obtain with:
 cont.x-pic(x,tree)

- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org



Manabu Sakamoto wrote:

Hi Liam,

Thanks very much. Now, that makes sense that pic() contrasts are 
already standardised, because I was getting strange results compared 
to Mesquite when I tried to further standardise the contrasts...


As I am at it, I thought perhaps you could help me interpret what 
Garland et al. (1992) referred to when they discuss positivizing 
stardised contrasts. I thought perhaps it was just simply taking the 
absolute values of the contrasts because the signs are arbitrary but I 
was not sure because of the specific wording in that paper...


Many thanks,
Manabu

--
M. Sakamoto, PhD
Department of Earth Sciences,
University of Bristol,
Wills Memorial Building, Queen's Road,
Bristol BS8 1RJ, UK
m.sakam...@bristol.ac.uk


Liam J. Revell さんは書きました:

The contrasts returned by pic() should already be standardized.

Standardizing phylogenetic contrasts is performed by dividing each 
contrast (calculated between internal or terminal nodes) by a sum 
proportional to the square root of its expected variance. Assuming 
Brownian motion, for two sister terminal nodes, this is just the 
square root of the sum of their branch lengths to a common ancestor. 
(This is because the variance of a difference between two random 
variables is just the sum of their separate variances, and variance 
is accumulates in direct proportion to elapsed time under BM.) For 
contrasts that are deeper in the phylogeny, this sum has to be 
adjusted to account for uncertainty in the reconstructed values at 
internal nodes, which adds extra variance to the computed difference. 
This is described in great detail in Felsenstein (1985; Am Nat 
125:1-15). However, as noted above - the values returned by pic() 
should have already been standardized in this way!


You should also not subtract the mean value of the contrasts. This is 
stated explicitly in Garland et al. (1992; Syst Biol 41:18-32): The 
expected mean of any set of contrasts is zero because the direction 
of subtraction is arbitrary..., so only standard deviations are needed.


- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org

On Tue, 15 Sep 2009, Manabu Sakamoto wrote:


Dear list,

I am trying to standardise independent contrasts following Garland et
al. (1992), but I am unsure just how you would extract the sum of it's
[contrast's] branch lenghts in R.

I computed independent contrasts using the pic function in ape. I
then took the contrasts and subtracted the mean from each one and now I
need to divide them by their standard deviations or the square root of
the sum of their branch lengths (Garland et al., 1992). This is where I
am stuck. I assume the sum of the branch lengths here refer to the sum
of the two branch lengths that go into the computation of each 
contrast?


I would greatly appreciate any kind support on this matter.

Many thanks in advance,
Manabu Sakamoto

--
--
M. Sakamoto, PhD
Department of Earth Sciences,
University of Bristol,
Wills Memorial Building, Queen's Road,
Bristol BS8 1RJ, UK
m.sakam...@bristol.ac.uk

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Standardising independent contrasts

2009-09-16 Thread Liam J. Revell

Ted Garland made the very good point to me that positivization is not 
the same as taking the absolute value of contrasts when several traits are 
being considered.  If, for example, one is inclined to positivize the 
contrasts for two traits with respect to trait x, then a pair of contrasts 
(x,y) = (-1,2) will become (1,-2) - not (1,2).  The same effect as 
positivization can be achieved by rotating the descendant branches of a 
node so that all contrasts in a given trait are positive (obviously, in 
this case some or many contrasts for another trait can be negative).


To my knowledge, however, diagnostic tests on contrasts usually use 
absolute values (or squares) of contrasts.  In general, these should not 
be used in statistical analyses such as correlation (although positivized 
contrasts may be used with no ill effect).


- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org

On Wed, 16 Sep 2009, Liam J. Revell wrote:


Hi Manabu,

Yes, positivized is the same as taking the absolute value.

For some of Garland et al.'s diagnostic tests on contrasts you might 
need the expected variances of the contrasts (actually, proportional 
to their expected variance prior to standardization), or the 
unstandardized contrasts.  It is possible to obtain these using specific 
settings in the pic() function.  For example, to obtain the expected 
variances you just run pic(var.contrasts=TRUE), e.g.,

 cont.X-pic(x,tree,var.contrasts=TRUE)
where x is a vector of your phenotypic trait and tree is your phylogeny. 
cont.X is a matrix with the standardized contrasts in the first column 
and variances in the second column.  Similarly:

 cont.x-pic(x,tree,scaled=FALSE)
will return a vector (cont.x) containing the unstandardized contrasts 
(although in most cases you don't want these).


If you compute:
 cont.X-pic(x,tree,scaled=FALSE,var.contrasts=TRUE)
 cont.x-cont.X[,1]/sqrt(cont.X[,2])
you will get the same vector of contrasts as you would obtain with:
 cont.x-pic(x,tree)

- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org



Manabu Sakamoto wrote:

Hi Liam,

Thanks very much. Now, that makes sense that pic() contrasts are 
already standardised, because I was getting strange results compared 
to Mesquite when I tried to further standardise the contrasts...


As I am at it, I thought perhaps you could help me interpret what 
Garland et al. (1992) referred to when they discuss positivizing 
stardised contrasts. I thought perhaps it was just simply taking the 
absolute values of the contrasts because the signs are arbitrary but I 
was not sure because of the specific wording in that paper...


Many thanks,
Manabu

--
M. Sakamoto, PhD
Department of Earth Sciences,
University of Bristol,
Wills Memorial Building, Queen's Road,
Bristol BS8 1RJ, UK
m.sakam...@bristol.ac.uk


Liam J. Revell さんは書きました:

The contrasts returned by pic() should already be standardized.

Standardizing phylogenetic contrasts is performed by dividing each 
contrast (calculated between internal or terminal nodes) by a sum 
proportional to the square root of its expected variance. Assuming 
Brownian motion, for two sister terminal nodes, this is just the 
square root of the sum of their branch lengths to a common ancestor. 
(This is because the variance of a difference between two random 
variables is just the sum of their separate variances, and variance 
is accumulates in direct proportion to elapsed time under BM.) For 
contrasts that are deeper in the phylogeny, this sum has to be 
adjusted to account for uncertainty in the reconstructed values at 
internal nodes, which adds extra variance to the computed difference. 
This is described in great detail in Felsenstein (1985; Am Nat 
125:1-15). However, as noted above - the values returned by pic() 
should have already been standardized in this way!


You should also not subtract the mean value of the contrasts. This is 
stated explicitly in Garland et al. (1992; Syst Biol 41:18-32): The 
expected mean of any set of contrasts is zero because the direction 
of subtraction is arbitrary..., so only standard deviations are needed.


- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org

On Tue, 15 Sep 2009, Manabu Sakamoto wrote:


Dear list,

I am trying to standardise independent contrasts following Garland et
al. (1992), but I am unsure just how you would extract the sum of it's
[contrast's] branch lenghts in R.

I computed independent contrasts using the pic function in ape. I
then took the contrasts and subtracted the mean from each one and now I
need to divide them by their standard deviations or the square root of
the sum of their branch lengths (Garland et al., 1992). This is where I
am stuck. I assume the sum of the branch lengths here refer to the sum

Re: [R-sig-phylo] inverting C matrices from Yule trees

2009-11-04 Thread Liam J. Revell


Hi Santiago,

The problem is that this function, birth.death(,,taxa.stop=X), simulates 
speciation and extinction under a Yule process until it gets to taxa=X 
and then stops, without adding any length to the branches leading to the 
last two taxa.  This means that the distances from the root of the tree 
to taxa 9  10 and their common ancestor are all equal and the matrix 
returned by vcv.phylo() is exactly singular.


A solution is to simulate a tree with taxa=X+1 and then drop the last 
tip using the function drop.tip().  In your example this would be:


 solve(vcv.phylo(drop.tip(birthdeath.tree(1,0, taxa.stop=11),11)));

This seems to work.

- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org



Santiago Claramunt wrote:

Hi all,

I'm having problems computing the inverse of the expected 
variance-covariance evolutionary matrix (C) on simulated trees 
generated with birthdeath.tree().


 solve(vcv.phylo(birthdeath.tree(1,0, taxa.stop=10)))

These matrices seem to be singular!
Matrices generated using rtree() and rcoal() are fine.

Any insights on this?

Cheers,

Santiago


Santiago Claramunt
Museum of Natural Science,
119 Foster Hall,
Louisiana State University,
Baton Rouge, LA70803
scla...@tigers.lsu.edu
http://www.museum.lsu.edu/Claramunt/Home.html

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] correlation between discrete and continuous variables?

2009-12-01 Thread Liam J. Revell


Hi Christoph, Stacey,

This might be a problem for which a logistic regression 
(http://en.wikipedia.org/wiki/Logistic_regression) would be 
appropriate.  If so, I encourage you to check out the following article, 
which I enjoyed:


Ives, A. R., and T. Garland Jr. 2009. Phylogenetic logistic regression 
for binary dependent variables. Systematic Biology (Advance Access). 
URL: http://sysbio.oxfordjournals.org/cgi/content/abstract/syp074v1.


Others on this list-serve might also be well suited to comment on this 
paper  method.


Sincerely, Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org



sd...@duke.edu wrote:

Hi Christoph,
There was some discussion following a similar question a year ago:

The question:
https://stat.ethz.ch/pipermail/r-sig-phylo/attachments/20080402/2e2b995e/attachment.pl 



The threads:
https://stat.ethz.ch/pipermail/r-sig-phylo/2008-April/thread.html#35

I'll be interested to hear what approach you take, so let us (or me 
anyway)

know.

Stacey

Quoting Christoph Heibl christoph.he...@gmx.net:


A question to the comparative method gurus:

I have a phylogeny for a group of plants that repeatedly evolved  
annuality from perennial ancestors. For each tip in the tree I have  
characterized the climatic tolerances (e.g. temperature,  
precipitation, ...)

.
I would like to assess if there is a significant correlation between  
life cycle (discrete) and climate variables (continuous).


I am not sure how to start with this and would appreciate your opinion!

Regards,
Christoph

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Estimate correlation coefficient of a linear GLS model

2010-01-04 Thread Liam J. Revell

Thanks.  That is very helpful information, actually!
- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu
NEW email: lrev...@nescent.org

-Original Message-
From: Emmanuel Paradis emmanuel.para...@mpl.ird.fr
Sent: Monday, January 04, 2010 7:51 AM
To: Liam J. Revell lrev...@nescent.org
Cc: Christian Arnold chrarn...@web.de; r-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] Estimate correlation coefficient of a linear GLS 
model

Liam J. Revell wrote on 28/12/2009 04:06:
 Hi Christian,
 
 There was a previous discussion on the coefficient of determination 
 (R^2) for PGLS.  It can be found in the Oct-2009 R-sig-phylo archive, 
 here: https://stat.ethz.ch/pipermail/r-sig-phylo/2009-October/thread.html
___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Transforming a data.frame into a phylogenetic tree

2010-01-27 Thread Liam J. Revell


Hi Timothee,

Assuming that I figured out your tree format correctly, I have written 
you a function to translate your tree-as-data-frame into an object of 
class phylo in R.  I attach the disclaimer that this is programmed 
very inelegantly and may contain errors.  It also contains no method for 
checking your tree file for errors, thus will probably crash if your 
tree doesn't make any sense (say, for example, contains floating 
branches not connected to any other nodes).  It assumes that you have 
installed APE and that, for example (taking your table, below), node 
 is the ancestor to node 1000 and 0001 (as well as 
several other nodes) with the branch length of 27 leading to node 
1000 and 83 leading to 0001.  If this is the correct 
interpretation of your format, the original data file, below, yields a 
Newick tree of ((5:78,9:140):27,11:185,12:196,10:176,7:159); where the 
taxon labels are determined by the row labels of the data frame.  (Note 
that this tree is highly polytomous at the root, because a number of 
nodes in the data frame, e.g., node 1001, leave one and only one 
descendant.)


The input is just the table given below, with the lines above Anc 
OffSpr. . . etc. excluded, which you can read in as follows:

 tree.data-read.table(filename,colClasses=character)

Then you can load the function from source, as follows:
 source(tree.read.R) # this name is just to distinguish it from 
read.tree(), an ape function.


And run it as follows:
 tree-tree.read(tree.data)

Trees can be printed to file using, for example:
 write.tree(tree,filename)

The function is as follows:

tree.read-function(tree.data){

   n.tips=0; k=1; tip.label-NA;
   edge-matrix(0,nrow(tree.data)-1,2); edge.length-NA;
   for (i in 1:nrow(tree.data)){
   external=1;
   for (j in 1:nrow(tree.data)){
   if(tree.data[i,2]==tree.data[j,1]){
   external=0;
   }
   }
   n.tips-n.tips+external;
   if(external==1){   
   edge[i-1,2]-n.tips;

   tip.label[k]-row.names(tree.data[i,]);
   k=k+1;
   }
   }
   node.number-n.tips;
   for (i in 2:nrow(tree.data)){
   for(j in 1:2){
   if(edge[i-1,j]==0){
   internal-tree.data[i,j];
   node.number-node.number+1;
   for(k in i:nrow(tree.data)){
   for(l in 1:2){
   if(tree.data[k,l]==internal){
   edge[k-1,l]=node.number;
   }
   }
   }
   }   
   # find branch length

   if(j==2){
   anc.height=0;
   for(k in 1:nrow(tree.data)){
   if(tree.data[i,1]==tree.data[k,2]){
   anc.height-as.numeric(tree.data[k,3]);
   }
   }
   edge.length[i-1]-as.numeric(tree.data[i,3])-anc.height;
   }
   }
   }

   Nnode-node.number-n.tips;
  
   
obj-list(edge=edge,tip.label=tip.label,edge.length=edge.length,Nnode=Nnode);

   class(obj) - phylo;
   obj-collapse.singles(obj);

   obj;
}

I hope this works!

- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org



Timothee POISOT wrote:

Dear users,

I am currently doing a model of community assembly, and I would like to
integrate phylogenetic information. For each organism, I know its ancestor
and the time at which speciation occured. These informations are stored in
a data.frame, an example of which is here :

  

PhyloTree


Anc   OffSpr TimeApp
1   Anc    1
2   1000  28
3   0001  84
4   0010 101
5  1000 1010 106
6   0010 139
7   0100 160
8  0001 1001 161
9  1000 1100 168
10 0010 0011 177
11 1001 10010001 186
12 0010 0110 197

Do you have any idea about how I can proceed to transform this object into
a tree object, such as the one used by APE?

Regards,

Timothée

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] transition matrix in geiger (sim.char)

2010-02-03 Thread Liam J. Revell


Hi Alex,
The columns in your matrix should sum to 0.0, so if you want the 
transitions to any of the four states to be equiprobable, try:


 q-list(rbind(c(-1, 1/3, 1/3, 1/3),
   c(1/3, -1, 1/3, 1/3),
   c(1/3, 1/3, -1, 1/3),
   c(1/3, 1/3, 1/3, -1)));

Then type:

 results-sim.char(tree,q,nsims=1,model=discrete,root.state=1);

Note that you don't have to set the root.state, but if not assigned it 
will assume a root.state of 1.


If you want to simulate many characters at once under the same model, 
you can do that in one of two ways, depending on how you want the output 
to be stored.  You can increase nsims, e.g.:


 results-sim.char(tree,q,nsims=10,model=discrete,root.state=1);

for 10 simulations.  Or, you can add additional matrices to the list of 
transition matrices, q, e.g.:


 for (i in 2:10){
   q[[i]]=q[[1]];
}

The latter gives your results in a more convenient form, from which:

 results-as.matrix(results[,,1]);

puts all the results from all 10 simulations into one numspecies x 10 
matrix.


Good luck!

- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org



Alexandre Antonelli wrote:

Hi,

   I am trying to use the function sim.char in geiger without success.  
I would like to let a character (in this case 'species distribution',  
with four states) evolve along a tree, and try different transition  
costs in the matrix.


  However, I don't manage to get more than one character with 2  
states. Any ideas what I am doing wrong? It is very unclear to me how  
the transition matrix should be compiled (if columns/rows have to sum  
up to one, etc). The manual indicated that the number of states per  
character would be dependent on the size of the matrix, but that does  
not seem to be the case.


  These are the commands I used, and the transition matrix I intended  
to apply:


## 0: Area A, 1:Area B, 2:Area C, 3:Area D
##
##  From0   1   2   3
## To   0   -.5 .5  .5  .5
##  1   .5  -.5 .5  .5
##  2   .5  .5  -.5 .5  
##  3   .5  .5  .5  -.5


require (geiger)

tree - read.nexus(file=input.tre)

q-list (rbind(  c(-.5, .5, .5, .5),
c(.5, -.5, .5, .5),
c(.5, .5, -.5, .5),
c(.5, .5, .5, -.5)))

results-sim.char(tree, q, model=discrete,n=2)

--
Thanks for any help!

Best wishes,

Alex


Dr Alexandre Antonelli
Institute of Systematic Botany
Zollikerstrasse 107
CH 8008  Zürich
Switzerland
Phone: +41 (0)44 634 8416
Mobile: + 41 (0) 76 7345116
alexandre.antone...@systbot.uzh.ch
http://www.systbot.uzh.ch/Personen/PostdoktorandInnen/AlexandreAntonelli.html
**


[[alternative HTML version deleted]]

  



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] selecting terminal branch lengths?

2010-02-08 Thread Liam J. Revell


Try this:

# given an object of class phylo called tree

# 1. create vector for terminal edge lengths
terminal.edges-matrix(NA,length(tree$tip.label));

# 2. copy terminal branches into vector
terminal.edges-tree$edge.length[tree$edge[1:(2*length(tree$tip.label)-2),2]=length(tree$tip.label)]; 



# 3. label terminal branches by tip label
names(terminal.edges)-tree$tip.label;

I think that should work.

- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org



mgavil2 wrote:

Dear All

I am trying to find a way to get a vector of only the terminal branch length
for species in a phylogeny, but cant seem to find a way

any suggestions?

Best,

Maria Mercedes



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] estimate likelihood of data for given model and parameters

2010-03-19 Thread Liam J. Revell


Hi Carl (also Brian),

This can effectively be accomplished in the geiger function 
fitContinuous() by setting the upper  lower bounds for the parameter to 
both be equal to the value for which you want to calculate the 
likelihood.  Unfortunately, these are not allowed to be exactly the 
same; however, they can be arbitrarily close.  So, for example, given 
tree=tree, data=x, and a desired BM rate parameter of 10, you can compute:


 
results-fitContinuous(tree,x,model=BM,bounds=list(beta=c(10.,10.0001))); 
# for example


You can also just calculate the log-likelihood directly as follows:

 sig2-desired.value; # set your desired value for the BM rate, say 
sig2-10; as before.
 C-vcv.phylo(tree); C-C[names(x),names(x)]; # compute VCV for tree, 
and assure that the rows  columns are in the same order as x

 n-nrow(C); ones-matrix(1,n,1); # compute some preliminaries
 
a-solve(t(ones)%*%solve(sig2*C)%*%ones)%*%t(ones)%*%solve(sig2*C)%*%x; 
# estimate ancestor
 
logL--(1/2)*(t(x-ones%*%a)%*%solve(sig2*C)%*%(x-ones%*%a))-(n/2)*log(2*pi)-(1/2)*log(det(sig2g*C)); 
# compute the log-likelihood


Hope this is helpful!

- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org



Carl Boettiger wrote:

Dear r-sig-phylo,

I'd like to be able to estimate the likelihood of a model with a given
Brownian rate parameter under a given data set.  For OU models, this can be
accomplished through the ouch package with the flag fit=FALSE.  However
there does not seem to be a corresponding option for Brownian motion in
ouch/geiger/ape algorithms for fitting a Brownian motion model?  Perhaps
I've overlooked something?

-Carl




___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Phylogenetic ANOVA

2010-07-26 Thread Liam J. Revell

Just to clarify the point - the null distribution for the test-statistic 
(F) in this method is generated by Brownian motion simulation on the 
phylogeny.  The P-value of the ANOVA is thus obtained by comparing the 
observed test-statistic to simulated test-statistics (for an arbitrarily 
large number of simulations).  The reference to this is:


Garland, T. Jr., A. W. Dickerman, C. M. Janis,  J. A. Jones. 1993. 
Phylogenetic analysis of covariance by computer simulation. Syst. Biol. 
42: 265-292. (http://www.jstor.org/stable/2992464)


Thus, if you have a test-statistic (F) more extreme then that obtained 
for every last one of your simulated datasets, then the P-value will be 
entirely determined by the number of simulations that are used (as Luke 
says).  This seems to be case for your data (not surprising given the 
very large values for F that were obtained).


- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org



Luke Harmon wrote:

Yes that's a direct result of the number of simulations - if all of the 
simulated F statistics are smaller than the test statistics, then you will get:

p = 1/(n+1) where n is the number of simulated data sets.

lh
On Jul 26, 2010, at 8:44 AM, Alejandro Gonzalez V wrote:

  

Hello,

Some colleagues and I are running some phylogenetic ANOVAS using the geiger 
package. In some of the analyses we get the same phylogentic p-value (very 
small p-value) even though the F-statistic differs between the two analyses, 
albeit it being relatively high in both instances. We were wondering why this 
arises, to get better grip on how the analysis works. We thought it may have to 
do with the randomizations to calculate the phylogenetic p-value. Or that the 
F-statistics are quite high...
Below are two examples :

m11-phy.anova(tree1,tmax,biozone,data.names=X,nsim=1000)
Standard ANOVA:
Analysis of Variance Table

Response: td$data
 Df Sum Sq Mean Sq F valuePr(F)
group  1 967.96  967.96  155.88 3.057e-12 ***
Residuals 25 155.246.21  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 



Phylogenetic p-value:   0.000999001


m12-phy.anova(tree1,wt,biozone,data.names=X,nsim=1000)
Standard ANOVA:
Analysis of Variance Table

Response: td$data
 Df Sum Sq Mean Sq F valuePr(F)
group  1 602.88  602.88  109.01 1.333e-10 ***
Residuals 25 138.265.53  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 



Phylogenetic p-value:   0.000999001


Cheers,

Alejandro
__

Alejandro Gonzalez Voyer
Post-doc

NEW ADDRESS  NEW E-MAIL

Estación Biológica de Doñana (CSIC)
Avenida Américo Vespucio s/n
41092 Sevilla 
Spain


E-mail: alejandro.gonza...@ebd.csic.es

Tel: +34- 954 466700, ext 1749

Website (From my previous position): 
http://www.iee.uu.se/zooekol/default.php?type=personalpagelang=enid=146









[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



Luke Harmon
Assistant Professor
Biological Sciences
University of Idaho
208-885-0346
lu...@uidaho.edu

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Question regarding simulate character option in geiger

2010-08-04 Thread Liam J. Revell


Hi Alejandro.

Your code won't work because model.matrix in sim.char() is the 
variance-covariance matrix for the traits, not the species.


What you can do instead is the following:

 cp-corMartins(2.5,tree,fixed=T)
 C-corMatrix(Initialize(cp,data))
 rownames(C)-tree$tip.label # naming the rows and columns 
ensures that the data vector will have labeled rows

 colnames(C)-rownames(C)
 x-chol(C)%*%rnorm(length(tree$tip.label))

Or alternatively you can do the following using sim.char():

 s-matrix(1,1,1)
 x-sim.char(ouTree(tree,alpha=2.5),model.matrix=s,model=brownian)   
 # for instance


There are also several other related transformations, such as 
lambdaTree() and deltaTree.  These can be reviewed by querying:


 ?ouTree

Hope this is of some help.

Sincerely, Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org



Alejandro Gonzalez V wrote:

Hello,

I have a question about the simulate character command in geiger. I want to 
generate simulations of trait evolution under different models to estimate 
variance around the estimate of evolutionary parameters for that trait. I've 
seen in previous posts in the list that characters can be simulated under 
distinct evolutionary models by transforming the phylogenetic tree prior to the 
simulation. I was wondering if the same could be done by instead adjusting the 
model matrix. Specifically I was wondering if ape's corPagel or corMartins 
could be used to specify different matrices adjusted to distinct values of 
lambda or alpha instead of transforming the tree. Would this be an appropriate 
alternative to tree transformation prior to simulations?
For example I was thinking of doing something of this sort for a model with 
alpha of 2.5 with a phylo object called tree and table with a trait called data:

cp-corMartins(2.5, tree, fixed=TRUE)
C-corMatrix(Initialize(cp, data))
sim.char(tree, model.matrix=C, nsims=100, model=brownian)

Cheers,


Alejandro

__

Alejandro Gonzalez Voyer
Post-doc

NEW ADDRESS  NEW E-MAIL

Estación Biológica de Doñana (CSIC)
Avenida Américo Vespucio s/n
41092 Sevilla 
Spain


E-mail: alejandro.gonza...@ebd.csic.es

Tel: +34- 954 466700, ext 1749

Website (From my previous position): 
http://www.iee.uu.se/zooekol/default.php?type=personalpagelang=enid=146









[[alternative HTML version deleted]]

  



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] retrieving simulated ancestral states from sim.char{geiger}

2010-08-04 Thread Liam J. Revell

Hi all. 

Here is some code that I think will do what Dr. Felsenstein has 
suggested for a single character.  It returns a vector containing the 
simulated states at all internal or terminal nodes, numbered according 
to the numbers used in tree$edge.  Happy to hear if this works.  See below:


# conducts BM simulation by ascending through the tree from the root 
node to the tips

# by Liam J. Revell
sim.bm-function(tree,bm.rate,root.state){

   # generate random normal deviates
   dev-rnorm(n=length(tree$edge.length))

   # give them appropriate variances
   dev-dev*sqrt(bm.rate*tree$edge.length)

   # sort the tree edges, edge.lengths, and devs
   ranks-order(tree$edge[,1])
   tree$edge-tree$edge[ranks,]
   tree$edge.length-tree$edge.length[ranks]
   dev-dev[ranks]

   # add them up
   state-matrix(0,length(tree$edge.length),1);
   for(j in 1:length(tree$edge.length)){
   if(tree$edge[j,1]==length(tree$tip)+1)
   state[j]-root.state+dev[j]
   else
   state[j]-state[match(tree$edge[j,1],tree$edge[,2])]+dev[j]
   }
  
   # bookkeeping

   rownames(state)-tree$edge[,2]
   state-state[order(as.numeric(rownames(state))),]
   sim.bm-state

}

Sincerely, Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org



Joe Felsenstein wrote:

Luke wrote:

  
The simulations in Geiger do draw from a mvn distribution but - in an 
unnecessary step - draw random numbers for every branch and use matrix 
multiplication to get the tip values. This is kind of dumb and needlessly 
slow but does give you the right answer. Carl is correct that simply drawing 
from a mvn distribution is faster and equivalent. If I had time I'd rewrite 
the function - maybe I will!



Without knowing exactly what your R code is doing, I just want to add to
these (correct) ideas that there is one simple way of getting tip and
interior node values without much more effort.   Work up the tree,
drawing a set of normals (one for each character) to get the net changes in
that branch.   Keep going until you reach the tips.   (If the characters
are not independent, you also need to have on hand a matrix which
contains the square root of the covariances among characters, and multiply
the values at each node by that.)

This way you don't need a nodes x nodes covariance matrix.

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Plotting trees - 'main' does not seem to work

2010-12-29 Thread Liam J. Revell

Hi Steve,

I have found the same thing with plot.phylo().  The same effect can be 
accomplished by the following two commands:

  plot(tree) # tree is your phylo object
  title(desired title) # desired title is what you would normally 
give to main

You may have already figured this out, but in case not - I thought I 
would pass it along.

- Liam

Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
email: lrev...@nescent.org, (new) liam.rev...@umb.edu
(new) blog: http://phytools.blogspot.com


On 12/29/2010 12:42 PM, Steve L wrote:
 I've tried plotting the example trees in the ape package as well as my own
 using the 'main' parameter to set the title of my plots, but no matter what
 I set the 'main' parameter to, it doesn't print any text above the plot.  Is
 this a bug?

 Steve

   [[alternative HTML version deleted]]

 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Cant read PHYLIP tree

2011-01-21 Thread Liam J. Revell

This makes sense as I was also able to read in the file using 
read.tree() with no difficulty running in Windows. - Liam

On 1/21/2011 10:46 AM, Leonor Palmeira wrote:
 On 21/01/11 16:39, Chris Mcowen wrote:
 Thanks for this, it works now. I wonder how that got there? It isn't 
 in the nexus format? Very odd!

 Also,

 Enrico Rezende

 Chris,
 FYI, I opened your file in R with the read.tree command and had no 
 problems to visualise your tree.


 tree- read.tree(tree.txt)
 tree

 Phylogenetic tree with 1325 tips and 1248 internal nodes.

 Tip labels:
 Acanthochlamys_bracteata, Barbacenia_albiflora, 
 Barbacenia_pallida, Barbacenia_riedeliana, Barbacenia_trigona, 
 Freycinetia_awaiarensis, ...

 Rooted; includes branch lengths.


 Enrico

 How could he open it in this format?

 It might be related to the system under which you're running R? I am 
 running it under Linux and have the same error as you. Enrico might be 
 running it under Windows, which might ignore this character.

 Sorry that I don't have any more insight on this matter.

 Leo.

 Thanks

 On 21 Jan 2011, at 15:36, Leonor Palmeira wrote:

 Hi,

 this is just due to a unallowed character in your file in 
 Ptychosperma_harannii. If you change this, it should run fine.

 Leo.

 On 21/01/11 16:00, Chris Mcowen wrote:
 Hi,

 I am using a function that requires a tree in PHYLIP format. I had 
 it in nexus ( which read in fine using the read.nexus function) i 
 then opened it in figtree and output it as phylip.

 However when i try and open it i get this error:

 read.tree(/Users/chrismcowen/Desktop/tree.txt)
 NULL
 Warning message:
 In strsplit(tree, NULL) : input string 1 is invalid in this locale


 I can open it in figtree fine, so i am unsure why i cant in R?

 any help would be greatly appreciated.

 p.s - the tree is attached.

 Chris



 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



-- 
Liam J. Revell
NESCent, Duke University
web: http://anolis.oeb.harvard.edu/~liam/
NEW email: lrev...@nescent.org


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] R: Re: R: Re: Simulation of Continuous Trait with Active Trend

2011-02-17 Thread Liam J. Revell


Hi Pasquale,

I think this may be possible if all tips are not contemporaneous.

As Gene points out, the values at the tips of the tree given a trend 
should be distributed as ~MVN(mu=diag(vcv(tree))*tr+ancestor, 
V=vcv(tree)*sig2).  Thus, we should be able to compute the likelihood of 
any data pattern at the tips and internal nodes of the tree, given tr, 
ancestor, and sig2.  If we start with only the values at the tips and 
the phylogeny, we can estimate the states at internal nodes and our 
model by maximizing the likelihood.  This would be our solution.


I have written down some code for this using dmnorm() (from the mnormt 
package) to compute the multivariate normal density; a home-made 
function to compute vcv(tree) but for all the nodes in the tree; and 
optim() to maximize the likelihood - but it doesn't seem to work very 
well yet.  I will pass it along if I can figure it out.


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://www.bio.umb.edu/facstaff/faculty_Revell.html
(new) email: liam.rev...@umb.edu
(new) blog: http://phytools.blogspot.com

On 2/17/2011 12:22 PM, pasquale.r...@libero.it wrote:



Hi all,


to feed Dave's questions; I have body size data (showing high
phylogenetic signal) and a trend verified independently (Cope's).
Although all species in the tree are in fact fossil species, I derived
from it a number of trees which are time-interval phylogenies,
restricted to species living in a particular temporal interval. As such,
interval phylogenies are ultrametric. I'm trying ape's yule.cov function
to test whether body size did influence diversification, yet yule.cov
requires aces which are calculated by assuming brownian motion, which is
wrong given I know there is a real trend in the data. This is why I
asked if it is possible to calculate aces by assuming BM with a positive
(mu0) trend.

Any hint?

Pas



Messaggio originale
Da: dwba...@uchicago.edu
Data: 17/02/2011 18.04
A: pasquale.r...@libero.itpasquale.r...@libero.it, R Sig Phylo
Listservr-sig-phylo@r-project.org, Liam J.
Revellliam.rev...@umb.edu, Gene Hunthu...@si.edu
Ogg: Re: [R-sig-phylo] R: Re: Simulation of Continuous Trait with
Active Trend

All-
Thanks for all the alternative ways to simulate trends!

I'd say estimating ancestral traits always depends on the question
you are after. If we are dealing with traits that may be evolving
under an active trend, than we know reconstruction could be
inaccurate because ancestral states are simply a weighted average of
the observed values. Whether that invalidates ones approach depends
on the question, how one deals with uncertainty in the ancestral
trait estimates and the data itself (i.e. How many fossil taxa do
you have? Do you have good a priori evidence for a particular root
state? Does the trait have a high or low level of phylogenetic
signal, as this influences the uncertainty?).

I think Finarelli and Flynn have a good point that including a large
sample of fossil taxa can help quite a bit in constraining our
understanding of active trend patterns, but I think we (as
paleontologists) need to give a great deal of consideration to how
we want to treat lineages in the fossil record (as internal nodes or
as terminal taxa) and how we time-scale our trees.
-Dave


On Thu, Feb 17, 2011 at 6:19 AM, pasquale.r...@libero.it
mailto:pasquale.r...@libero.it pasquale.r...@libero.it
mailto:pasquale.r...@libero.it wrote:


Hi all,

I have a simple question. Is it possible, or even feasible, to
run ancestral
state estimation under Brownian Motion with a trend, as modelled
by Emmanuel,
Gene and Liam? I wonder whether this is feasible with real data.
Thanks all
Pas




 Messaggio originale
 Da: emmanuel.para...@ird.fr mailto:emmanuel.para...@ird.fr
 Data: 17/02/2011 6.13
 A: Hunt, Genehu...@si.edu mailto:hu...@si.edu
 Cc: R Sig Phylo Listservr-sig-phylo@r-project.org
mailto:r-sig-phylo@r-project.org
 Ogg: Re: [R-sig-phylo] Simulation of Continuous Trait with
Active Trend
 
 Hi all,
 
 In rTraitCont, the argument model can be a function, eg:
 
 f - function(x, l) x + rnorm(1, 1, sqrt(l * 0.1))
 
 which is a way to model a trend. Here's an example of what you
can do:
 
 tr - rbdtree(.1, .05)
 tr - makeNodeLabel(tr)
 x - rTraitCont(tr, model = f, ancestor = TRUE)
 bt - branching.times(tr)
 plot(-bt, x[names(bt)])
 
 You may add the tip values with:
 
 n - Ntip(tr)
 points(rep(0, n), x[1:n], pch = 2)
 
 You can also draw lines linking the ancestors with their
descendants in
 this morpho-time-space since they can

Re: [R-sig-phylo] R: Re: R: Re: Simulation of Continuous Trait with Active Trend

2011-02-21 Thread Liam J. Revell


Hi Pasquale,

To follow up on my earlier message, I have written a preliminary version 
of a simple function to simultaneously estimate the evolutionary 
parameters and ancestral states for Brownian evolution with a trend 
using likelihood.  I will paste it at the end of this email.  I have 
confirmed that the function returns the same trend parameter as 
fitContinuous(model=trend) and the same ancestral character estimates 
as ace() - when mu (the trend parameter) is forced to zero [note that 
this can only be done by modifying the code].  The parameter sig2, the 
Brownian rate, is biased by a factor n/(2n-2) for n species relative to 
fitContinuous(model=trend)$Trait1$beta.


Note that this model can generally only be fit for a tree with tips that 
are not contemporaneous - for instance, for a phylogeny containing 
fossil species.


Best, Liam

Function code (also 
http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/anc.trend/v0.1/anc.trend.R):


# written by Liam J. Revell 2011
anc.trend-function(phy,x,maxit=2000){

# preliminaries
# require dependencies
require(ape)
# set global
tol-1e-8
# compute C
D-dist.nodes(phy)
ntips-length(phy$tip.label)
Cii-D[ntips+1,]
C-D; C[,]-0
counts-vector()
for(i in 1:nrow(D)) for(j in 1:ncol(D))
C[i,j]-(Cii[i]+Cii[j]-D[i,j])/2
dimnames(C)[[1]][1:length(phy$tip)]-phy$tip.label
dimnames(C)[[2]][1:length(phy$tip)]-phy$tip.label
C-C[c(1:ntips,(ntips+2):nrow(C)),c(1:ntips,(ntips+2):ncol(C))]
# sort x by phy$tip.label
x-x[phy$tip.label]

# function returns the negative log-likelihood
likelihood-function(theta,x,C){
a-theta[1]
u-theta[2]
sig2-theta[3]
y-theta[4:length(theta)]

logLik-dmnorm(x=c(x,y),mean=(a+diag(C)*u),varcov=sig2*C,log=TRUE)
return(-logLik)
}

# get reasonable starting values for the optimizer
a-mean(x)
sig2-var(x)/max(C)

# perform ML optimization

result-optim(par=c(a,0,sig2,rep(a,phy$Nnode-1)),likelihood,x=x,C=C,method=L-BFGS-B,lower=c(-Inf,-Inf,tol,rep(-Inf,phy$Nnode-1)),control=list(maxit=maxit))

# return the result
	ace-c(result$par[c(1,4:length(result$par))]); 
names(ace)-c(as.character(phy$edge[1,1]),rownames(C)[(length(phy$tip.label)+1):nrow(C)])


return(list(ace=ace,mu=result$par[2],sig2=result$par[3],logL=-result$value,convergence=result$convergence,message=result$message))

}

--
Liam J. Revell
University of Massachusetts Boston
web: http://www.bio.umb.edu/facstaff/faculty_Revell.html
(new) email: liam.rev...@umb.edu
(new) blog: http://phytools.blogspot.com

On 2/17/2011 12:45 PM, Liam J. Revell wrote:

Hi Pasquale,

I think this may be possible if all tips are not contemporaneous.

As Gene points out, the values at the tips of the tree given a trend
should be distributed as ~MVN(mu=diag(vcv(tree))*tr+ancestor,
V=vcv(tree)*sig2). Thus, we should be able to compute the likelihood of
any data pattern at the tips and internal nodes of the tree, given tr,
ancestor, and sig2. If we start with only the values at the tips and the
phylogeny, we can estimate the states at internal nodes and our model by
maximizing the likelihood. This would be our solution.

I have written down some code for this using dmnorm() (from the mnormt
package) to compute the multivariate normal density; a home-made
function to compute vcv(tree) but for all the nodes in the tree; and
optim() to maximize the likelihood - but it doesn't seem to work very
well yet. I will pass it along if I can figure it out.

- Liam



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] PGLS, polytomies, degrees of freedom and multiple specimens

2011-02-22 Thread Liam J. Revell


Hi Popko,

My sense is that if the polytomy is hard, it is never necessary to 
subtract a degree of freedom.  If the polytomy is firm (sensu Purvis  
Garland 1993) - in that the true tree is fully bifurcating but we cannot 
resolve a multifurcating node due primarily to rapid divergence of the 
descendant taxa - then subtracting a degree of freedom will probably be 
excessively conservative.  If the polytomy is truly soft (sensu Maddison 
1989; Garland  Diaz-Uriarte 1999), then subtracting a degree of freedom 
for each extra furc will probably also be conservative.  [As Rohlf 
2001 points out, this would imply zero degrees of freedom if the 
phylogeny was completely unresolved (p. 2154), which seems excessive.] 
Some of these issues are discussed in Garland  Diaz-Uriarte (1999; 
Syst. Biol.) and elsewhere.


I'm sure that other subscribers to the list could add significant 
insight to this issue beyond what I am able to offer.


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://www.bio.umb.edu/facstaff/faculty_Revell.html
(new) email: liam.rev...@umb.edu
(new) blog: http://phytools.blogspot.com

On 2/22/2011 5:31 AM, Popko Wiersma wrote:

Dear all,

Can somebody tell whether one should subtract degrees of freedom when
applying PGLS with a phylogeny containing soft polytomies? And does it
make a difference if polytomies originate from data from multiple
specimens of species?

cheers, Popko Wiersma

- I reposted this message because the first attempt did not show the
text directly in the message body. Apologies for any confusion -

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Problems with transferring trees back into ape from Mesquite

2011-03-03 Thread Liam J. Revell

To collapse nodes you could use the function di2multi() in ape which 
collapses all internodes with a length below tol (specified by the 
user).  If your tree doesn't have branch lengths, you could just set all 
the edge lengths of the edges you want to keep to 1, and all the other 
ones to 0, and then use di2multi().


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
(new) email: liam.rev...@umb.edu
(new) blog: http://phytools.blogspot.com

On 3/3/2011 3:12 PM, sd...@duke.edu wrote:

I have had similar challenges with going back and forth between
Macclade/Mesquite and R and FigTree. I got similar messages about the
root and
I only remember that I ended up using PAUP and TreeEdit to solve the
issues. I
wish I could remember more specifics but I did root and unroot the thing
and try
exporting as different file types. Eventually one of the programs
reformatted
the newick file and fixed the root issue so that it worked for FigTree and
other downstream programs (I think the tree was destined for raxML to be
used
as a constraint tree).
But I'm surprised R doesn't have functions for collapsing nodes..
Stacey

Quoting David Bapst dwba...@uchicago.edu:


Hello all,

I recently decided to manually edit a very large supertree (1700
taxa) in Mesquite (I had to collapse a few nodes into polytomies,
something I can only seem to do in Mesquite). I read it out of R as a
Newick file, than into Mesquite, where I did my little edits and then
saved the tree as a Nexus file, which I then tried to read into ape.
But then I get this.


spec_tree-read.nexus(file.choose())

Error in if (sum(tr$edge[, 1] == ROOT) == 1  dim(tr$edge)[1]  1) { :
missing value where TRUE/FALSE needed

I also tried reading the Nexus file into TreeFig, which didn't work.
If I cut out a lot of the internal structure of the Nexus file (who
needs Mesquite and Taxon blocks anyway?), I can get TreeFig to read
it, export it into a Newick file or a new Nexus file and then try
opening it in R with read.nexus(). That works. But then i can't write
it in a new file in R with write.tree() nor write.nexus(), and also
plot(tree) causes R to freeze.


write.tree(spec_id_tree,file.choose())

Error in kids[[parent[i]]] : attempt to select less than one element


write.nexus(spec_id_tree,file.choose())

#NEXUS
[R-package APE, Mon Feb 28 10:03:59 2011]

Read 1756 items
Error in 1:(start - 1) : argument of length 0

After saving and opening the file in Mesquite a bunch of times, I
repeatedly found new un-collapsed double node branches on the tree. No
idea where they came from (maybe a by-product of my node collapsing?).
I removed the new ones that popped up, deassigned branch lengths to
the tree (it's a super-cladogram) and then saved and... finally, R
read the nexus file and write.tree() worked. write.nexus() didn't work
(same error as above), but I can read my tree and write it, at least,
so my issue is solved but I'd still like to know what was causing the
problems to begin with. I have no idea where the un-collapsed double
branches came from or why they were causing this issue.

I am using R v2.12.2 and ape v2.6-3. So that the error is completely
reproducible, I have placed two Nexus files on my website which
produce both the read.nexus() error (trash_a.nex) and the
write.nexus() and write.tree() errors (trash_b). These are small,
partially collapsed portions of my supertree with the species names
deleted. They can be found at:

http://home.uchicago.edu/~dwbapst/trash_a.nex

http://home.uchicago.edu/~dwbapst/trash_b.nex

-Dave


--
David Bapst
Dept of Geophysical Sciences
University of Chicago
5734 S. Ellis
Chicago, IL 60637
http://home.uchicago.edu/~dwbapst/

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Pruning a tree

2011-03-04 Thread Liam J. Revell


Hi Eugen,

Did you try my suggestion?

In your case, if the species you want to keep are in a row separated 
text file, with a header (species), first read them in:


 species.to.keep-read.table(file=species.list.file,header=T)

Now, for phylo object tree, type:

 pruned.tree-drop.tip(tree,tree$tip.label[-match(species.to.keep[,1], 
tree$tip.label)])


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
(new) email: liam.rev...@umb.edu
(new) blog: http://phytools.blogspot.com

On 3/4/2011 6:52 AM, Eugen Egorov wrote:

First of all, thank you for helping :)

Well, this didn't work out, although it should work perfectly. When I
type in nc - name.check(tree, data) and then nc I get all tree
species (over 3000 species) in the $Tree.not.data part. In the $Data.not
tree part I see 281 numbers from 1 to 281, this is the number of
species in my data file. Seems like R doesnt recognize the species names
in the data file.

The data file is a .csv file and looks like this:

species
Species_1
Speceis_2
Species_3

Species_281

so its one column and nothing else. What should I do?

Thanks a lot


- Original Message - From: Graham Slater gsla...@ucla.edu
To: Eugen Egorov eugen...@online.de
Cc: r-sig-phylo@r-project.org
Sent: Thursday, March 03, 2011 7:54 PM
Subject: Re: [R-sig-phylo] Pruning a tree


drop.tip assumes you have identified the tips that you want to remove,
which you could do using

nc - name.check(tree, data).
newtree - drop.tip(tree, nc[[1]])

or

newtree - drop.tip(tree, nc$Tree.not.data) # note that Tree has a caps
and not using this could cause a weird tree with no tips to be output.


But if your data are simply a list of names (and I assume here you mean
you have a vector of names rather than an actual list) and you don't
know exactly which species are missing then the following might be easier:


missing - tree$tip.label[is.na(match(tree$tip.label, listofnames))] ##
will use the match() function to identify the tips that are not present
in your list - it essentially is trying to match the tip names from the
tree to your list and if there is no match it reports NA for that tip.
we get the tip names corresponding to those NAs here

newtree - drop.tip(tree, missing) # this will remove those tips


graham
On Mar 3, 2011, at 9:15 AM, Eugen Egorov wrote:


Hi all,

I have a huge tree and a list with species. Now I want to prune the
tree, so only species appearing in the list are left in the tree. I
tried the geiger package to compare tree species with those in the
list, but that didn't work out, because I recieved a tree with 0 tips
and 1 node after using drop.tip$tree.not.data. I guess I have to
format the list, but I don't know in which way. Any idea how I do that?

greets

Eugen
[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



Graham Slater Ph. D.
Department of Ecology and Evolutionary Biology
University of California, Los Angeles
610 Charles E. Young Drive South
Los Angeles, CA 90095-1606

(424) 442-4348
gsla...@ucla.edu
www.eeb.ucla.edu/gslater

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Pruning a tree

2011-03-04 Thread Liam J. Revell


Hi Eugen.

name.check() uses the row names of your data matrix, so that's why it 
works now.


The error message that was returned by the -match() method suggests that 
you have species in your to include file that are not actually in the 
tree (thus match() returns one or multiple NAs).  I did not anticipate 
this.  To get around it, just do:


 
pruned.tree-drop.tip(tree,tree$tip.label[-na.omit(match(species.to.keep[,1],tree$tip.label))])


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
(new) email: liam.rev...@umb.edu
(new) blog: http://phytools.blogspot.com

On 3/4/2011 1:08 PM, Eugen Egorov wrote:


Well, Liams suggestion produced an error message only zeros are allowed
to be mixed with negative indices. I just tried a bunch of command
lines from various internet sites and came up with what seems to be a
solution. Here the skript:

tree-read.tree(tree.tre)

data-read.csv(data.csv)

data.frame(data[,1])-data1

rownames(data1)-data[,1]

attach(data1)

name.check(tree, data1) - overlap

comptree-drop.tip(tree, overlap$Tree.not.data)


...and this works, but I don't know why exactly...seems like rownames is
an imprtant factor, cause it doesn't work without that command
line...for whatever reason...

Does that make any sense?


- Original Message - From: Graham Slater gsla...@ucla.edu
To: Liam J. Revell liam.rev...@umb.edu
Cc: Eugen Egorov eugen...@online.de; r-sig-phylo@r-project.org
Sent: Friday, March 04, 2011 6:33 PM
Subject: Re: [R-sig-phylo] Pruning a tree


Yes, name.check() will only work with a named vector of data or data
frame, so Liam's code should work for you. it also looks like your names
in the vector species are different from those of the tip labels, as
you say that nc$Tree.not.data gives you a bunch of numbers but your
vector is made of Species underscore number, e.g. Species_1. If that's
the case, either change your file of species names to be just numbers,
or use the following to change your tree tip labels before using liam's
code.


tree$tip.label - paste(Species, tree$tip.label, sep = _)



Graham Slater
Department of Ecology and Evolutionary Biology
University of California, Los Angeles
621 Charles E Young Drive South
Los Angeles
CA 90095-1606

(310) 825-4669
gsla...@ucla.edu
www.eeb.ucla.edu/gslater






On Mar 4, 2011, at 7:45 AM, Liam J. Revell wrote:


Hi Eugen,

Did you try my suggestion?

In your case, if the species you want to keep are in a row separated
text file, with a header (species), first read them in:

 species.to.keep-read.table(file=species.list.file,header=T)

Now, for phylo object tree, type:


pruned.tree-drop.tip(tree,tree$tip.label[-match(species.to.keep[,1],
 tree$tip.label)])

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
(new) email: liam.rev...@umb.edu
(new) blog: http://phytools.blogspot.com

On 3/4/2011 6:52 AM, Eugen Egorov wrote:

First of all, thank you for helping :)

Well, this didn't work out, although it should work perfectly. When I
type in nc - name.check(tree, data) and then nc I get all tree
species (over 3000 species) in the $Tree.not.data part. In the $Data.not
tree part I see 281 numbers from 1 to 281, this is the number of
species in my data file. Seems like R doesnt recognize the species names
in the data file.

The data file is a .csv file and looks like this:

species
Species_1
Speceis_2
Species_3

Species_281

so its one column and nothing else. What should I do?

Thanks a lot


- Original Message - From: Graham Slater gsla...@ucla.edu
To: Eugen Egorov eugen...@online.de
Cc: r-sig-phylo@r-project.org
Sent: Thursday, March 03, 2011 7:54 PM
Subject: Re: [R-sig-phylo] Pruning a tree


drop.tip assumes you have identified the tips that you want to remove,
which you could do using

nc - name.check(tree, data).
newtree - drop.tip(tree, nc[[1]])

or

newtree - drop.tip(tree, nc$Tree.not.data) # note that Tree has a caps
and not using this could cause a weird tree with no tips to be output.


But if your data are simply a list of names (and I assume here you mean
you have a vector of names rather than an actual list) and you don't
know exactly which species are missing then the following might be
easier:


missing - tree$tip.label[is.na(match(tree$tip.label, listofnames))] ##
will use the match() function to identify the tips that are not present
in your list - it essentially is trying to match the tip names from the
tree to your list and if there is no match it reports NA for that tip.
we get the tip names corresponding to those NAs here

newtree - drop.tip(tree, missing) # this will remove those tips


graham
On Mar 3, 2011, at 9:15 AM, Eugen Egorov wrote:


Hi all,

I have a huge tree and a list with species. Now I want to prune the
tree, so only species appearing in the list are left in the tree. I
tried the geiger package to compare tree species

Re: [R-sig-phylo] Dealing with Bounded Trait Measures

2011-03-05 Thread Liam J. Revell

I'm not sure that this is what Dave has in mind, but if anyone is 
interested in simulating bounded evolution in R, I just added it to my 
fastBM() function (code here: 
http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/fastBM/v0.3/fastBM.R).


In the process of evolving traits up the tree, I just bounce back any 
phenotypes that exceed the lower or upper boundary conditions specified 
by the user (by default they are -Inf and Inf).  I think I did this 
properly.  Feedback welcome though.


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
(new) email: liam.rev...@umb.edu
(new) blog: http://phytools.blogspot.com

On 3/5/2011 12:55 PM, tgarl...@ucr.edu wrote:

Hello David, Enrico, et al.,

I may have lost track of what Dave was originally trying to do, and I am not 
familiar with all of the options presently available in r for simulating 
continuously valued traits along a specified phylogenetic tree.  However, I 
wanted to point out that MANY possibilities, including trends, the OU process, 
and actual limits to trait evolution implemented in several ways, are available 
in our original DOS program PDSIMUL.EXE that accompanies this paper:

Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. 
Phylogenetic analysis of covariance by computer simulation. Systematic Biology 
42:265-292.

It has been used many times to look at trends, limits, etc., e.g., in these 
papers:

Díaz-Uriarte, R., and T. Garland, Jr. 1996. Testing hypotheses of correlated 
evolution using phylogenetically independent contrasts: sensitivity to 
deviations from Brownian motion. Systematic Biology 45:27-47.
Laurin, M. 2010. Assessment of the relative merits of a few methods to detect 
evolutionary trends. Syst. Biol. 59:689-704.

Cheers,
Ted



Theodore Garland, Jr.
Professor
Department of Biology
University of California, Riverside
Riverside, CA 92521
Office Phone:  (951) 827-3524
Lab Phone:  (951) 827-5724
Home Phone:  (951) 328-0820
Facsimile:  (951) 827-4286 = Dept. office (not confidential)
Email:  tgarl...@ucr.edu

Main Departmental page:
http://www.biology.ucr.edu/people/faculty/Garland.html

List of all Publications:
http://www.biology.ucr.edu/people/faculty/Garland/GarlandPublications.html

Garland and Rose, 2009
http://www.ucpress.edu/books/pages/10604.php


    Original message 

 Date: Sat, 05 Mar 2011 15:36:13 +0100
 From: Enrico Rezendeenrico.reze...@uab.cat
 Subject: Re: [R-sig-phylo] Dealing with Bounded Trait Measures
 To: David Bapstdwba...@uchicago.edu
 Cc: R Sig Phylo Listservr-sig-phylo@r-project.org

 David,
 on the top of my head, if no species measurement strictly
 corresponds to
 zero, you may log-transform the data. You may then simulate
 Brownian
 motion in log-transformed values, which will correspond to a
 boundary of
 zero in a linear scale (i.e., the more negative the log number, the
 closer the trait value is to zero - but never zero - in a linear
 scale).
 This also explains why you can simulate the evolution of body mass
 employing Brownian motion in log-transformed units and no species
 will
 ever be assigned a body mass of zero. On more speculative grounds,
 this
 may simply reflect the fact that many biological processes and
 their
 regulation occur in a multiplicative, not additive, scale.
 
 The problem with regards to this approach is that you cannot really
 have
 any species with a trait = 0 given that the log-transformation is
 impossible in this case, so you might add some constant in case
 this
 occurs (caution because the constant would be arbitrary and might
 have
 an impact on the outcome of analyses). Did not think about this for
 too
 long, though.
 
 Hope this helps,
 Enrico
 
 
 
 
 
 El 4/3/11 9:14 p.m., David Bapst escribió:
   All-
   As far as I understand it, the vast majority of continuous
 character
   analyses assume that the trait is distributed normally and
 without
   bounds. Is there an appropriate transformation to for
 measurements of
   a trait that does have one or more bounds and where some taxa
 actually
   are at that bound? I have several traits where the bound is zero,
 and
   some taxa are actually at zero for this trait. (A practical
 example is
   'spine length', where some taxa have virtually no spine.) And if
 there
   is no transformation applicable, is it analytically appropriate
 to
   remove taxa that have 'zero units' for that trait? Must we
 convert
   these traits to discrete categories to deal with them at all?
 
   As always, I appreciate your advice.
   -Dave Bapst, UChicago
 
 
 
 --
 
 Enrico L. Rezende

Re: [R-sig-phylo] The Speed of Different Methods For Obtaining the Descendant Tip Taxa per Node

2011-03-06 Thread Liam J. Revell


Hi Dave.

It seems like one way to get these values faster would be to count the 
number of descendants as the tree is read in to R.  This is possible 
because when the ) character is reached by the Newick parser, all 
descendant nodes (and tips) have already been created in memory.


This was simple to add to my tree reading function read.newick() [code 
here: 
http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/read.newick/v0.3/read.newick.R].  
Now v0.3 of the function returns a modified phylo object with the 
vector $Ndesc containing the number of descendants from each internal 
and terminal node above the root (in the order of $edge[,2]).  This is 
exactly the same vector as would be obtained by the following commands:


 desc-sapply(tree$edge[,2],function(x) node.leaves(tree,x))
 Ndesc-sapply(desc,function(x) length(x))

Note that read.newick() only reads one tree at a time in phylip format 
(and without node labels) - but reading multiple trees or node labels, 
or reading from nexus format, would quite easy to add if you find that 
this is indeed faster.


In addition, I believe that newick2phylog() creates this list in reading 
a Newick tree into memory as a phylog object.  This is stored as 
phy$Aparam$nlea for phylog object phy.  Unfortunately, it seems like 
newick2phylog() is very slow for large trees.


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com


On 3/6/2011 12:47 PM, David Bapst wrote:

Hello all,
I'm currently trying to measure a parameter over a large number of
large trees (1700 tips), and part of this calculation requires
knowing the tip taxa descended from each node (I must compare the
difference in tip values among taxa descended from a node). Because I
must do this many times,  I decided to compare the efficiency of
several methods for doing this in various R libraries with
system.time() (as Liam did recently with some BM simulation functions
in some recent blog posts). As I feel that others may benefit from
this comparison of methods, I am posting the results to this list.

Geiger's node.leaves() is much faster than the alternatives, although
at ~13s a run, it is still not a particularly speedy process. I didn't
need the actual tip.labels, so I took node.leaves() and made it as
lean as possible, to produce node.tips(), below. That cut the run time
down to ~9 sec.

node.tips-function (phy, node){
n- length(phy$tip.label)
if (node= n){node}else{
l- numeric()
d- phy$edge[which(phy$edge[,1]==node),2]
for(j in d){if(j= n){l- c(l, j)}else{l-c(l, 
node.tips(phy,j))}}
l}}

If anyone knows of another alternative that might be faster, I would
greatly appreciate any suggestions.
-Dave Bapst, UChicago Geosci

Using the modified node.leaves() function from geiger, node.tips()


system.time(desc-sapply(edge_end,function(x) node.tips(res_tree,x)))

user  system elapsed
9.390.009.44

Using node.leaves() in geiger


system.time(desc-sapply(edge_end,function(x) node.leaves(res_tree,x)))

user  system elapsed
   13.290.02   13.34

Using Descendants() in phangorn


system.time(desc-sapply(edge_end,function(x) Descendants(res_tree,x)))

user  system elapsed
   75.600.10   76.83

Using listTips() in adephylo


system.time(desc_list-c(as.list(1:Ntip(res_tree)),listTips(res_tree)))

user  system elapsed
   75.272.93   87.28

Using descendants() in phylobase


system.time(desc-sapply(edge_end,function(x) 
descendants(res_tree4,x,type=tips)))

user  system elapsed
  149.560.67  155.15

Using nodeDecendants() in maticce  (note that translating the tree
into ouchtree format was prohibitively very lengthy)


system.time(res_tree_ou-ape2ouch(res_tree))

user  system elapsed
   84.010.13   85.34

system.time(desc-sapply(edge_end,function(x) nodeDescendents(res_tree_ou,x)))

user  system elapsed
   65.910.23   68.47



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] reverse order plotting of newick tree/phylo object

2011-03-17 Thread Liam J. Revell


Hi Thierry,

There might be a more elegant way to do this, but you can just apply the 
ape function rotate() to each node number of the tree (excluding tips).


I.e.

 tr2-tree
 for(i in length(tr2$tip)+1:tr2$Nnode) tr2-rotate(tr2,i)
 plot(tr2)

[rotate() may also be able to take a vector of nodes, but I was not able 
to get this to work.]


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 3/17/2011 10:51 AM, Thierry Janssens - TNW wrote:

Dear R-sig-phylo,



I am looking for a method to plot an unrooted tre/phylo object e in the
reverse order (of the tip labels). Like all the nodes would have
rotated.



Any of you has an idea?



Kind regards,



Thierry



Thierry Janssens

Postdoctoral researcher

Delft University of Technology

Bionanoscience

Kavli Institute of Nanoscience

Lorentzweg 1

2628LJ Delft

the Netherlands

Tel: +31 15 2781175

Fax:+31 15 2781202

e-mail: t.k.s.janss...@tudelft.nlmailto:t.k.s.janss...@tudelft.nl




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] reverse order plotting of newick tree/phylo object

2011-03-17 Thread Liam J. Revell

Dan is right on - and I also suspect that the issue of the non-rotating 
node is probably due to a polytomy at that node.


Thanks for the insight Dan.

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 3/17/2011 12:26 PM, Dan Rabosky wrote:



But can you explain to me what is the rationale behind this? There are
only 47 nodes and 54 tips. How can the nodes from 55 to 47 than be
rotated?



Liam's code is rotating nodes 55+(1:47), rather than 55 to 47. The internal node indexing 
in ape's phylo class starts with number of tips plus 1. Thus, using rotate on 
nodes starting with length(phy$tip.label) + 1 means it is starting with node 56 (the root 
node) and going through length(phy$tip.label) + phy$Nnode, e.g.,, all internal nodes. You 
can see that

55+(1:47)

gives a vector of node numbers that is not the same as 55 to 47.

If it isn't working right - maybe it is that you have polytomies? I think you 
should have n-1 internal nodes (53 in your case). Try multi2di (from ape) for 
polytomy resolution and see if it works as desired.

~Dan Rabosky





Kind regards,

Thierry

Thierry Janssens
Postdoctoral researcher
Delft University of Technology
Bionanoscience
Kavli Institute of Nanoscience
Lorentzweg 1
2628LJ Delft
the Netherlands
Tel: +31 15 2781175
Fax:+31 15 2781202
e-mail: t.k.s.janss...@tudelft.nl

-Original Message-
From: Liam J. Revell [mailto:liam.rev...@umb.edu]
Sent: donderdag 17 maart 2011 16:11
To: Thierry Janssens - TNW
Cc: r-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] reverse order plotting of newick tree/phylo
object

Hi Thierry,

There might be a more elegant way to do this, but you can just apply the
ape function rotate() to each node number of the tree (excluding
tips).

I.e.


tr2-tree
for(i in length(tr2$tip)+1:tr2$Nnode) tr2-rotate(tr2,i)  plot(tr2)


[rotate() may also be able to take a vector of nodes, but I was not able
to get this to work.]

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 3/17/2011 10:51 AM, Thierry Janssens - TNW wrote:

Dear R-sig-phylo,



I am looking for a method to plot an unrooted tre/phylo object e in
the reverse order (of the tip labels). Like all the nodes would have
rotated.



Any of you has an idea?



Kind regards,



Thierry



Thierry Janssens

Postdoctoral researcher

Delft University of Technology

Bionanoscience

Kavli Institute of Nanoscience

Lorentzweg 1

2628LJ Delft

the Netherlands

Tel: +31 15 2781175

Fax:+31 15 2781202

e-mail: t.k.s.janss...@tudelft.nlmailto:t.k.s.janss...@tudelft.nl




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo







[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Branch and Bound Maximum Parsimony (Matthew Vavrek)

2011-03-28 Thread Liam J. Revell


Matthew.

The only thing that I would add is that if you *really* want to do an 
exhaustive search in R, and your species number is small enough to 
permit an exhaustive search (i.e., =10), then it is straightforward 
enough to do so:


 require(phangorn)
 data-read.phyDat(file=filename,type=USER) # for binary data
 all.trees-allTrees(n=length(data),tip.label=names(data),rooted=FALSE)
 pscores-vector()
 for(i in 1:length(all.trees))
pscores[i]-parsimony(all.trees[[i]],data)
 minscore-min(pscores); mp.tree-all.trees[pscores==minscore]

mp.tree will be a single MP tree or a list if there are several MP trees.

Of course this will take a long time for more than a 7 or 8 species, and 
will not work at all for more than 10 species.  It's also possible that 
an exhaustive search in a traditional phylogeny inference program like 
PAUP* might be faster if, for instance, PAUP* retains the score for 
parts of the tree that don't change among a set of trees - instead of 
recalculating it each time anew.  That I don't know.


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 3/28/2011 10:58 AM, Ross Mounce wrote:

Dear Matthew,


Branch and Bound searching is often unneccessary, are you sure you need to do 
this? With enough random addition sequences (relative to dataset size) you 
*will* find all MPTs.

PAUP* despite being familiar to many of us and very fully-featured, is 
definitely ungainly for modern analyses with it's lack of parallelisation.

Have you considered using TNT? http://www.cladistics.com/aboutTNT.html
Wiki Manual here: http://tnt.insectmuseum.org/index.php/Main_Page

Many paleontologists are starting to use this - computationally it's far more 
efficient.
Just remember, as with any and all phylogenetic analyses it's garbage in, garbage 
out: you must know what you're doing and why *before* you start your analysis.


Kind Regards and good luck,


Ross Mounce




___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Adding a taxon to a pre-exisiting tree

2011-04-01 Thread Liam J. Revell


Hi Matthew.

I don't doubt that other members of the list have better suggestions, 
but it is possible to add a tip in all possible places using bind.tree() 
in ape.


For instance, starting with a random unrooted tree with, say, 4 taxa:

tree-rtree(n=4,rooted=FALSE,br=rep(1,5))
# create a 5th species, here t5, to add as a phylo object
# [I don't think this can be avoided with bind.tree()]
new.tip-list(edge=matrix(c(2,1),1,2),tip.label=t5,edge.length=1,Nnode=1)
class(new.tip)-phylo
# add the new tip to all edges of the tree
trees-list(); class(trees)-multiPhylo
for(i in 1:nrow(tree$edge))
trees[[i]]-bind.tree(tree,new.tip,where=tree$edge[i,2],position=0.5)
# now plot them to see what we have done
plot(trees,type=unrooted,use.edge.length=F)

Of course I would be happy to see a more elegant solution!

Sincerely, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 4/1/2011 11:02 PM, Matthew Vavrek wrote:

Hello again all,
I got a number of great solutions to my last question about branch and
bound maximum parsimony searches, several of which had definitely not
crossed my mind. However, being stubborn (or is that stupid? Hard to
tell sometimes) I went ahead and have started putting together a branch
and bound style search function for R, just because (I'll try to get it
up on CRAN shortly). It works, however it's not as efficient as I think
it probably could be, probably because the method I use to add new taxa
to the tree involves text searches with grep to create Newick trees. All
that being said, is there any way to take an existing tree (in any
format, such as Newick, but also the edge lists like in an ape 'phylo'
object) and add a taxon at all the possible positions? I know allTrees()
exists, but that would give me all the possibilities, rather than a
restricted set (as I would need for branch and bound).

Thanks again
Matthew

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Adding a taxon to a pre-exisiting tree

2011-04-01 Thread Liam J. Revell

Of course, we could generalize my preceding suggestion with the 
following function:


add.everywhere-function(tree,tip.name){
   tree-unroot(tree)
   tree$edge.length-rep(1,nrow(tree$edge))
   new.tip-list(edge=matrix(c(2,1),1,2),tip.label=tip.name,
 edge.length=1,Nnode=1)
   class(new.tip)-phylo
   # add the new tip to all edges of the tree
   trees-list(); class(trees)-multiPhylo
   for(i in 1:nrow(tree$edge)){
  trees[[i]]-bind.tree(tree,new.tip,where=tree$edge[i,2],
position=0.5)
  trees[[i]]$edge.length-NULL
   }
   return(trees)
}

Try it:

tree-read.tree(text=((George,Paul),(Ringo,John));)
trees-add.everywhere(tree,Pete_Best)
plot(trees,type=unrooted)


--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 4/1/2011 11:33 PM, Liam J. Revell wrote:

Hi Matthew.

I don't doubt that other members of the list have better suggestions,
but it is possible to add a tip in all possible places using bind.tree()
in ape.

For instance, starting with a random unrooted tree with, say, 4 taxa:

tree-rtree(n=4,rooted=FALSE,br=rep(1,5))
# create a 5th species, here t5, to add as a phylo object
# [I don't think this can be avoided with bind.tree()]
new.tip-list(edge=matrix(c(2,1),1,2),tip.label=t5,edge.length=1,Nnode=1)
class(new.tip)-phylo
# add the new tip to all edges of the tree
trees-list(); class(trees)-multiPhylo
for(i in 1:nrow(tree$edge))
trees[[i]]-bind.tree(tree,new.tip,where=tree$edge[i,2],position=0.5)
# now plot them to see what we have done
plot(trees,type=unrooted,use.edge.length=F)

Of course I would be happy to see a more elegant solution!

Sincerely, Liam



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Converting from DNAbin to a matrix

2011-04-07 Thread Liam J. Revell


Hi Josh,

Try doing this when you read in the data:

 ex.dna-read.dna(exdna.txt,format=sequential,as.character=T)

This will return an object of class matrix instead of a DNAbin object. 
The elements in the matrix should just be the letters from your input file.


There may be a better way to do this, but if you already have the object 
in memory, you can write it to file and then read it back in using 
read.dna(...,as.character=T), as above.


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 4/7/2011 11:16 AM, Josh B wrote:

Dear Listserve,

I have a seemingly simple problem that I cannot figure out. I would like to
convert an element of class DNAbin to a matrix.

Consider the example from the read.dna help file:

cat(3 40,
No305 NTTCGACACACCCACTACTNTTATCAGTCACT,
No304 ATTCGACACACCCACTACTATTATCAACCACT,
No306 ATTCGACACACCCACTACTATTATCAATCACT,
file = exdna.txt, sep = \n)
ex.dna- read.dna(exdna.txt, format = sequential)

This is where I am stuck: how do I create a matrix from ex.dna, where the
rownames are No305, No304, and No306, and the columns are the aligned
single-nucleotide sites. For example, Column 1 should have the value of N for
row No305, the value of A for row No304, and the value of A for row
No306, corresponding to the values of the first site in the sequence.

Thank you very much for your help!
---
Josh Banta, Ph.D
Center for Genomics and Systems Biology
New York University
100 Washington Square East
New York, NY 10003
Tel: (212) 998-8465
http://plantevolutionaryecology.org

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

[R-sig-phylo] Possible bug in bind.tree()

2011-04-19 Thread Liam J. Revell

This is mainly for Emmanuel, but I send to the entire list just in case 
someone else can identify my error first!


I believe I have identified a small bug in bind.tree().  For some 
reason, when I bind a tree with a root.edge to a tip, the length of the 
edge leading to that tip is attached not only to the root.edge of the 
bound subtree, but also to another tip of the tree.


If I have made some mistake here, my apologies(!), but any help in 
correcting it is also welcome of course.


The easiest way to illustrate this problem is via example:

# preliminaries

text=(1:3.785,(8:0.13,9:0.13):0.205,6:0.335):0.154,(10:0.278,7:0.278):0.211):0.1,(4:0.386,5:0.386):0.203):2.655,(2:1.08,3:1.08):2.164):0.54);
tree-read.tree(text=text)
node-19 # this is the node we will extract
position-1.0 # this is the length of the root.edge
# extract clade and attach root edge
tr1-extract.clade(tree,node=node); tr1$root.edge-1.0
# now remove tips in tr1 from tree
tr2-drop.tip(tree,tr1$tip.label,trim.internal=FALSE)
# subtract root.edge of tr1 from tip ending in NA
tr2$edge.length[match(which(tr2$tip.label==NA),tr2$edge[,2])]-tr2$edge.length[match(which(tr2$tip.label==NA),tr2$edge[,2])]-position
# now bind tr1 to tr2, where it was removed
tr.bound-bind.tree(tr2,tr1,where=which(tr2$tip.label==NA))
# now plot, to visualize the result
plot(tree); x11(); plot(tr.bound)

Thanks.  - Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] phylogenetically-informed Reduced Major Axis regression in R?

2011-04-20 Thread Liam J. Revell


Hi Arne,

Just calculating the slope is straightforward.  For tree and column 
vectors x  y (in order tree$tip.label):


 C-vcv.phylo(tree)
 ax-sum(solve(C,x))/sum(solve(C))
 ay-sum(solve(C,y))/sum(solve(C))
 beta1-sqrt(t(y-ay)%*%solve(C,y-ay)/(t(x-ax)%*%solve(C,x-ax)))

The model intercept can be obtained by plotting the slope through the 
phylogenetic mean of x  y.


 beta0-ay-beta1*ax

I wrote a function to do this; but also to (optionally) fit Pagel's 
lambda jointly for x  y and to return the residuals in y.  I did this 
last year, and then fixed a few bugs/errors when I saw your query this 
morning.  I have put it online here: 
http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/phy.RMA/v0.1/phyl.RMA.R.


Note, that this *does not* test hypotheses about the RMA regression nor 
does it incorporate individual variation (as in the Ives et al. 2007 
article).  Adding the former would be fairly simple using equations 15 
of Ives et al., I think (I will do this when I get a chance); adding the 
latter not so much.  I believe Tony Ives may have MATLAB code that does 
this already.  Maybe he can comment.


I hope this is somewhat helpful.

Sincerely, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 4/20/2011 2:27 AM, Arne Mooers wrote:

Dear members:

Does anyone know of scripts to both estimate and test 
phylogenetically-corrected RMA regression slopes (perhaps using the relevant 
equations from Ives et al. (Syst. Biol. 2007))?

Cheers,

Arne Mooers

__

Dr. Arne Mooers
Biological Sciences,
Simon Fraser University
Burnaby, BC Canada V5A 1S6
tel. +1 778 782 3979
skype: arnemooers
http://www.sfu.ca/~amooers
http://www.sfu.ca/fabstar
http://www.scientists-4-species.org

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Convert filogenetic tree to binary matrix

2011-04-28 Thread Liam J. Revell


Hi Vanderlei,

Try this which uses ape function prop.part():

temp-prop.part(tree)
X-matrix(0,nrow=length(tree$tip),ncol=length(temp),dimnames=list(tree$tip.label,tree$node.label))
for(i in 1:ncol(X)) X[temp[[i]],i]-1

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 4/28/2011 2:21 PM, Vanderlei Debastiani wrote:

Good morning!

I need to create a binary matrix with all node of a phylogenetic tree and the 
presence of each taxo in their respective node.

Example:

require(ape)
y-read.tree(text=(E,((H,I)D,(F,G)C)B)A;)
y
plot(y, show.node=TRUE)

I need to create a binary matrix as follows:

AB  CD
G   1   1   1   0
F   1   1   1   0
I   1   1   0   1
H   1   1   0   1
E   1   0   0   0

Somebody could help me to solve this problem.

Thanks,

Vanderlei Debastiani
Laboratório de Ecologia Filogenética e Funcional
Programa de Pós-Graduação em Ecologia
Universidade Federal do Rio Grande do Sul
www.ufrgs.br/leff

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Generating a tree based on a distance matrix?

2011-05-12 Thread Liam J. Revell

You can get the least squares branch lengths. I believe that if the
distance matrix is a patristic distance matrix from your tree, then the
least squares branch lengths are guaranteed to be the same as your
original branches (and you should have a sum of squares error, Q, of
zero - to numerical precision).

To test this, you can just use my optim.phylo.ls() function, and set the
input tree to your target tree. You will need to have phangorn and
ape to run this:

source(http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/optim.phylo.ls/v0.3/optim.phylo.ls.R;)
# load the source

LStree-optim.phylo.ls(D,stree=tree)

Note that LStree will be unrooted even if tree is rooted.

To test this, try the following:

tree-rtree(5,rooted=F) # random unrooted tree
tree$edge.length # show edge lengths
[1] 0.005683385 0.516492136 0.761613776 0.714302898 0.071461088
0.217680734 0.393926894

D-cophenetic(tree) # compute distance matrix
tree$edge.length-NULL # delete branch lengths
test-optim.phylo.ls(D,tree) # compute LS tree with true tree as input
best Q score of 4.19082355898663e-30 found after 0 nearest neighbor
interchange(s).

test$edge.length
[,1]
6,7 0.005683385
7,1 0.516492136
7,8 0.761613776
8,2 0.714302898
8,3 0.071461088
6,4 0.217680734
6,5 0.393926894

You can easily see that they are the same branch lengths as in our
original tree.

I hope this is helpful.

Sincerely, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 5/12/2011 5:18 PM, mgavil2 wrote:

All,
I have a tree topology (tree_name.tre), and a distance matrix, based
on that tree topology. However I cant not seem to find the nexus file
from which the matrix was generated. Is there a way to use that
distance matrix to incorporate branch lengths into my topology?
I have looked into all the threads of questions posted in the list,
but still can not find an answer. my final objective is just to
generate a tree with branch lengths proportional to the distances on
the matrix (reviewers requirement for a publication).

Any suggestions would be greatly appreciated!

Thanks.

Maria Mercedes

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] ACE - ML

2011-05-17 Thread Liam J. Revell


Sorry, I didn't see that this had already been addressed.

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 5/17/2011 12:39 PM, David Bapst wrote:

Alanna-
It's because when multi2di() resolves the polytomies, it puts in zero-length
edges. This causes the phylogenetic variance-covariance matrix calculated
from your tree to be singular.

One solution would be to add a small arbitrary constant to your zero-length
edges, although I would caution you to try several different values and see
how your results differ.
-Dave Bapst, UChicago


On Tue, May 17, 2011 at 10:23 AM, Alanna Maltbyalanna.mal...@ioz.ac.ukwrote:


Dear all



I've been trying to use Maximum Likelihood to estimate ancestral
characters using ace in APE, but I get an error:




   ?Error in solve.default(out$hessian) :



   Lapack routine dgesv: system is exactly singular?




My code looks like this:



tree-read.nexus(tree.nex)

data-read.csv(data.csv)

BW-data.frame(data[,c(2,3)])

rownames(BW)-data[,2]

na.omit(BW)-BW

name.check(tree,BW)-overlap

drop.tip(tree,overlap$Tree.not.data)-tree2

tree3-multi2di(tree2)

BW2-BW[tree3$tip.label,]

BW3-data.frame(BW2[,2])

as.matrix(BW3)-BWM

as.vector(BWM)-BWV

names(BWV)-BW2[,1]

asrBW-ace(BWV, tree3, type=continuous)

asrBW$ace



which is possibly not super-elegant, but works when I use pic instead of
ML.



I notice that someone brought up the same problem last year but there
were no solutions. Any ideas?



Many thanks,



Alanna

-

Alanna Maltby

PhD Student - Evolution of Echolocation in Bats

University College London and Institute of Zoology

Tel: +44 (0) 20 7449 6322

Website: www.zsl.org/alannamaltbyhttp://www.zsl.org/alannamaltby





The Zoological Society of London is incorporated by Royal Charter
Principal Office England. Company Number RC000749
Registered address:
Regent's Park, London, England NW1 4RY
Registered Charity in England and Wales no. 208728

_
This e-mail has been sent in confidence to the named a...{{dropped:21}}


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] ACE - ML

2011-05-17 Thread Liam J. Revell


Hi Alanna,

You can get this kind of error if your tree has tip edges with zero 
length.  Then C-vcv.phylo(tree) will be exactly singular.  Could this 
be true of your tree?


Best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 5/17/2011 11:23 AM, Alanna Maltby wrote:

Dear all



I've been trying to use Maximum Likelihood to estimate ancestral
characters using ace in APE, but I get an error:




   ?Error in solve.default(out$hessian) :



   Lapack routine dgesv: system is exactly singular?




My code looks like this:



tree-read.nexus(tree.nex)

data-read.csv(data.csv)

BW-data.frame(data[,c(2,3)])

rownames(BW)-data[,2]

na.omit(BW)-BW

name.check(tree,BW)-overlap

drop.tip(tree,overlap$Tree.not.data)-tree2

tree3-multi2di(tree2)

BW2-BW[tree3$tip.label,]

BW3-data.frame(BW2[,2])

as.matrix(BW3)-BWM

as.vector(BWM)-BWV

names(BWV)-BW2[,1]

asrBW-ace(BWV, tree3, type=continuous)

asrBW$ace



which is possibly not super-elegant, but works when I use pic instead of
ML.



I notice that someone brought up the same problem last year but there
were no solutions. Any ideas?



Many thanks,



Alanna

-

Alanna Maltby

PhD Student - Evolution of Echolocation in Bats

University College London and Institute of Zoology

Tel: +44 (0) 20 7449 6322

Website: www.zsl.org/alannamaltbyhttp://www.zsl.org/alannamaltby





The Zoological Society of London is incorporated by Royal Charter
Principal Office England. Company Number RC000749
Registered address:
Regent's Park, London, England NW1 4RY
Registered Charity in England and Wales no. 208728

_
This e-mail has been sent in confidence to the named add...{{dropped:20}}

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] fitContinuous in geiger

2011-05-18 Thread Liam J. Revell


Hi Annemarie,

Positive log-likelihoods are not a problem.  The log-likelihood is 
calculated by summing the log probability densities, which come from a 
function that integrates to 1.0.  Thus, if the variance of this 
distribution is small, the value of the function will be large (i.e., 
greater than 1.0).


The phenomenon of decreasing mean lambda when you increase the scale 
(i.e., multiply by 10 or 100) is probably due to bounds on beta (in the 
lambda model, sigma^2) in fitContinuous().  The default upper bound is 
20.  You can change this by executing:


 fitContinuous(...,bounds=list(beta=c(0,1000)) # or something

once you fix this issue, the mean lambda for any scale of your data 
vector should be the same.


You can also try my function for estimating phylogenetic signal: 
phylosig(...,method=lambda) at URL: 
http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/ which does not have 
this issue.


Good luck.

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 5/18/2011 3:50 AM, Annemarie Verkerk wrote:

Hi all,

I’m having some trouble with the function fitContinuous in the geiger
library. I'm using fitContinuous to estimate a lambda score as an
indication for the presence of phylogenetic signal. As a sidenote, I'm
doing this with language data - so language trees based on shared
vocabulary and it is a linguistic typological trait that I'm trying to
get estimates of lambda of. Another sidenote is that I have similar
problems in BayesTraits but no problems using phylosignal in picante for
estimating lambda.

At the moment, there are 14 taxa in my sample. I have a tree set of 1000
trees. The first data set + trees are attached. My data values are all
values between 0 and 1, basically things like '0.326547'. (This is
because they come from a principal components analysis; they are scores
on the first principal component that explains about 80% of the
variation.) I've been using capped values with two numbers after the
period just for easy usage, so '0.33'. However, the results that I get
are strange.

My first dataset looks like this:

language value
t1 0.32
t4 0.52
t6 0.95
t9 0.75
t10 0.77
t12 0.46
t14 0.61
t2 0.35
t3 0.29
t5 0.25
t7 0.89
t8 0.88
t11 0.79
t13 0.35

Then I do the fitContinuous analysis over my sample of trees (1000
trees) and these are my scores:

median of lambda:
[1] 1
mean of lambda:
[1] 0.985
sd of lambda
[1] 4.60849e-05

So: almost all values of lambda are 1.

median of log-likelyhood
[1] 5.206887
mean of log-likelyhood
[1] 5.210839
sd of log-likelyhood
[1] 0.4215943

The log-likelyhood is positive? That is very strange…? These results
basically make it seem as if the algorithm has crashed.

Then, I multiply my values with 100:

language value
t1 32
t4 52
t6 95
t9 75
t10 77
t12 46
t14 61
t2 35
t3 29
t5 25
t7 89
t8 88
t11 79
t13 35

results:

median lambda:
[1] 0.9874361
mean lambda:
[1] 0.9839095
sd lambda:
[1] 0.01622255

median log-likelihood:
[1] -65.66331
mean log-likelihood:
[1] -65.73675
sd log-likelihood:
[1] 1.716778

Now the number of lambda scores of '1' is lower, although it is not
really gone yet, there are still around a 200-300 instances of '1'. The
log-likelyhood is now -65, so at least it's negative.

When I multiply my original data points with 1000, this is my data set:

value
language value
t1 320
t4 520
t6 950
t9 750
t10 770
t12 460
t14 610
t2 350
t3 290
t5 250
t7 890
t8 880
t11 790
t13 350

results:

median lambda:
[1] 0.8640076
mean lambda:
[1] 0.8561964
sd lambda:
[1] 0.05001523

median log-likelihood:
[1] -2055.763
mean log-likelihood
[1] -2067.052
sd log-likelihood
[1] 213.44

There are no no more lambda scores of ‘1’ in the data, but the log
likelood is a really big number, and I'm not sure what that would mean
in this context?

So, even though the range of variation stays exactly the same with these
multiplications, there are quite important differences between the
results these three sets of data give me. It was suggested to me that
the algorithm might be doing something to my data values, for instance
cap them, round them off or not taking into account certain decimals,
and that might be the reason for these different results. Would anyone
have any idea about why this happens and how I can deal with it in an
informative way?

Thanks so much for any help that you might be able to offer,
Annemarie Verkerk




___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] fitContinuous in geiger

2011-05-23 Thread Liam J. Revell


Hi Annemarie,

The only thing I would add to Carl's comment is that the theoretical 
limit of lambda is not 1.0, but can be found (for an ultrametric tree) 
by computing:


 C-vcv.phylo(tree)
 maxLambda-max(C)/max(C[upper.tri(C)])

You can then change the boundary condition for fitContinuous():

 fit-fitContinuous(phy=tree,x,model=lambda,bounds= 
list(alpha=c(0,maxLambda)))


My function phylosig() automatically finds the upper boundary condition 
and uses this for the optimization:


 
source(http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/phylosig/v0.3/phylosig.R;)

 fit-phylosig(tree,x,method=lambda,test=TRUE)

Best of luck.

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 5/23/2011 2:46 PM, Carl Boettiger wrote:

Hi Annemarie,

No problem, tried to give some answers below.


On Mon, May 23, 2011 at 8:05 AM, Annemarie Verkerk
annemarie.verk...@mpi.nl mailto:annemarie.verk...@mpi.nl wrote:

Dear Carl, Liam, and others,

thanks for your explanation of what went wrong in the fitContinuous
calculations. I set beta to a large number (100) in order to
stop it from reading the maximum value. Then, I got exactly the same
results for lambda with the non-multiplied and the multiplied data.

Okay, I still have two more problems - probably they will sound
stupid to you, but I am still very much a newbie to this and if it
suffices just to point me to other sources where this has already
been discussed please do so!

the log likelyhood: between the non-multiplied and the multiplied
data, there is still a difference. Liam, you write 'if the variance
of this distribution is small, the value of the function will be
large (i.e., greater than 1.0)'. However, the variance between the
non-multiplied and the multiplied data is exactly the same. So why
should log likelyhood values change when data is multiplied? Why do
I get a normal looking value (around -200) (with the beta scale set
for a very large scale) for the multiplied data? And why does the
log likelyhood become so big (-2000) when the initial maximum beta
value of 20 is reached if the scale is (0,20)?


Liam is referring to the variance of the likelihood distribution, not
the variance of the traits.  If the diversification rate is high, then
any particular outcome is very improbable, so its probability density
has high variance and corresponding low value for the prob density at
any particular value.  Hence the large negative log-likelihood for large
beta.  (Recall log lik around -2000 means a density of exp(-2000), which
is very near zero, illustrating why we need to use logs!)


the lambda scores: now, my lambda scores for the non-multiplied and
the multiplied data are the same. However, most of them (999 out of
1000) are '1'. To my understanding, '0' and '1' are theoretical
limits for lambda that one normally does not reach. So I'm afraid
I'm still unsure why this happens, and what it means.


You raise an important point here.  Just because they are the boundary
values, does not imply that these theoretical limits are uncommon -- in
fact one may expect to hit the limits more often than any other value.
Numerical optimization will often push estimates of a parameter all the
way to its boundary.  This just means that increasing (deceasing) the
parameter value increases the likelihood, so it keeps doing so until it
can go no further.  This can result from, but does not necessarily
imply, that the model is inappropriate (it is also particularly common
for small numbers of taxa, for instance), and can be more common on
relatively flat likelihood surfaces.

So I guess my short answer is this isn't a problem and my longer
answer is be suspicious whenever you get back boundary estimates, and
consider bootstrapping.


HTH,

Carl

I hope you don't mind this new response to the thread.

With kind regards,
Annemarie


Liam J. Revell wrote:

Hi Annemarie,

Positive log-likelihoods are not a problem.  The log-likelihood
is calculated by summing the log probability densities, which
come from a function that integrates to 1.0.  Thus, if the
variance of this distribution is small, the value of the
function will be large (i.e., greater than 1.0).

The phenomenon of decreasing mean lambda when you increase the
scale (i.e., multiply by 10 or 100) is probably due to bounds on
beta (in the lambda model, sigma^2) in fitContinuous().  The
default upper bound is 20.  You can change this by executing:

  fitContinuous(...,bounds=list(beta=c(0,1000)) # or something

once you fix this issue, the mean lambda for any scale of your
data vector should be the same.

You can also try my function for estimating phylogenetic signal

Re: [R-sig-phylo] ancestral state reconstruction with fixed internal node(s)

2011-06-30 Thread Liam J. Revell


Hi Annemarie,

I do think this makes sense - *if* (and this may be a big if, 
depending on your data) we have a value for a hypothetical ancestor for 
which we are very confident and which we are sure belongs at a 
particularly ancestral node.


In that case, you could indeed use ace() by assigning attaching a zero 
length tip edge to the internal node of interest.


To illustrate this using an example, let's do the following:

# first generate a pure-birth tree using birthdeath.tree in geiger
tree-drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=21),21)

# now plot the tree with internal node numbers
plot(tree); nodelabels(length(tree$tip)+1:tree$Nnode)

# now let's simulate on the tree using fastBM() from my phytools package
# (not on CRAN, but http://anolis.oeb.harvard.edu/~liam/R-phylogenetics)
# in this case we know the ancestral states if we set internal=TRUE
x-fastBM(tree,internal=T)
y-x[1:length(tree$tip)] # only the tip states

# now, let's say we want to attach the tip to node 22 in this case
tip-list(edge=matrix(c(2L,1L),1,2),tip.label=22,edge.length=0,Nnode=1L)
class(tip)-phylo
atree-multi2di(bind.tree(tree,tip,where=22))
z-c(y,x[22])

# now let's estimate ancestral states - we can do this using
# method=gls
res.gls-ace(z,atree,method=GLS,corStruct=corBrownian(1,atree))

Keep in mind that the node numbers in tree and atree are different 
(because the latter has a different number of tips) *and* we have a 
different number of them (because we used multi2di() to fully resolve 
our multifurcating tree after we attached the extra tip).


Hope this helps.

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 6/30/2011 12:40 PM, ppi...@uniroma3.it wrote:

Hi AnneMarie,
I dont really know if this makes sense; in fact
ancestral state reconstruction is an ** a posteriori
estimation ** of nodal values starting from tips
observations.

A trick could be to add a false taxon lnked to that
node  and giving to it a 0 branch length (i.e.
plitomized - you can then resolve this during
computation using multi2di() ) and assigning your
desired trait value.

I'm not sure if this helps

Best
Paolo






Dear phylo-sig list people,

I want to do ancestral state reconstruction
(preferably with ace) with
one (or more) of the internal nodes 'fixed' for a
range / a distribution
of values. For instance, I want a node leading to one
particular clade
that is present a subset of my trees to have a value
from 0.2-0.5, and
then do the ancestral state reconstruction with this
restriction on that
node.

I have been searching the archives and the net for
discussion of this
(but maybe with the wrong search terms), and I cannot
find anything
about it - not how to do it in R or in any other
package.

Thanks for your input!
Annemarie

--
Annemarie Verkerk, MA
Evolutionary Processes in Language and Culture (PhD
student)
Max Planck Institute for Psycholinguistics
P.O. Box 310, 6500AH Nijmegen, The Netherlands
+31 (0)24 3521 185
http://www.mpi.nl/research/research-projects/evolutionary-processes

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] a question about pic function(ape package)

2011-07-11 Thread Liam J. Revell


Hi.  This is because your tree is not fully dichotomous.

 tree-read.nexus(file=tree.nex)
 is.binary.tree(tree)
[1] FALSE

You can resolve multifurcations with branches of zero length using 
multi2di(), e.g.:


 tree-multi2di(tree)
 is.binary.tree(tree)
[1] TRUE

Hope this helps.  - Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 7/11/2011 10:44 PM,  wrote:

dear all,
I have a question when I used the pic funtion, it appeared an
error:'phy' is not rooted and fully dichotomous, I don't understand
what's the problem, the attached file is my phylogenetic information,
please check it, I am sorry to trouble you all, but I really want to
solve this problem in my study, thanks a lot.
best wishes to you all!




___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] pic() vs gls()

2011-07-13 Thread Liam J. Revell


Hi Nick.

For your first (simple) problem, I believe you want to do:

 read.csv(file=Mason_data.csv,row.names=1)-smdata

Regarding the more complicated issue, the problem of non-independence in 
linear regression comes in the residual error of the model.  Thus, you 
should fit a correlation structure for the error in Y given X (not for X 
 Y separately as you have done with PICs here).  This indeed can be 
done using gls() - so, for instance:


pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, 
spbm),data=smdata)


fits a linear model in which the residual error of Pause_Rate given 
Comp.1 is distributed according to the correlation structure 
corMartins() with alpha estimated using REML.


The way to reproduce this result using contrasts is the following:

 alpha-attr(pglsModel1a$apVar,Pars)[corStruct] # get fitted alpha
 smdata-as.matrix(smdata) # important
 pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=alpha))
 pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=alpha))
 summary(lm(pr_contrast~pc1_contrast-1))

Which if you compare to:

 summary(pglsModel1a)

you should find yields the same results and P-value.  (Note that 
smdata-as.matrix(smdata) seems to be important here as if smdata is a 
data frame, then smdata[,1] does not inherit the rownames of smdata and 
pic() will return an incorrect set of contrasts.)


I hope this helps.  No doubt other subscribers to the list may have 
something to add.


Best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 7/13/2011 11:07 PM, Nicholas Mason wrote:

Dear R-sig-phylo users,
I have a question regarding comparative analyses of contrasts done with
the functions fitContinuous() and pic() compared to using PGLS (using
the gls() function).

 From my understanding the first method involving pic() below fits alpha
(estimated using fitContinuous()) to each character independently then
performs a regression on the resulting contrasts.

The second (PGLS) method involving gls() fits both the regression and
contrasts simultaneously and reports a single alpha value for the
relationship between two traits.

These two methods appear very similar, yet I get slightly different
results between the two functions, particularly with respect to
p-values. Using fitContinuous(), the results from the data set attached
are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p =0.0519.
Furthermore, if I switch the dependent and independent variables (V1~V2
vs V2~V1), I get the same answer with pic(), but gls() differs: r2 =
-0.069 p =0.02 (see pglsModel1a vs pglsModel1b in the attachment).

Could anyone explain why these functions vary (in my mind they were
doing essentially the same thing) and if there are situations where one
should be favored over another? Also, why does PGLS vary if you switch
the dependent/independent variables in the linear model?

Thanks in advance for any advice or comments offered!!

Cheers,
Nick Mason

Below I have the code I have been using (also attached as Mason.R):

require(ape)
require(nlme)
require(geiger)
read.csv(file=Mason_data.csv)-smdata
rownames(smdata)-smdata[,1]
smdata[,1]-NULL#ASIDE: If anyone could tell me how to get around these
two lines of code, that would also be awesome. Header=T doesn't seem to
work for me.
read.tree(file=Mason_tree.tre)-spbm
name.check(smdata,spbm)

fitContinuous(spbm,smdata,model=OU)-ou2

pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha))
pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha))

summary(lm(pr_contrast~pc1_contrast-1))

pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1,
spbm),data=smdata)
summary(pglsModel1a)

pglsModel1b-gls(Comp.1~Pause_Rate, correlation=corMartins(1,
spbm),data=smdata)
summary(pglsModel1b)








-
Nicholas Albert Mason
MS Candidate, Burns Lab
Department of Biology - EB Program
San Diego State University
5500 Campanile Drive
San Diego, CA 92182-4614
845-240-0649 (cell)





___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] follow up for lambda transform - query: how to use an existing covariance matrix directly in a gls procedure in R?

2011-07-15 Thread Liam J. Revell


Hi Luke.

For a fixed value of lambda this is very easy.  For this example, you 
would just do:


library(ape)
data(bird.orders)
dat-data.frame(y=rnorm(23),x=rnorm(23))
rownames(dat)-bird.orders$tip.label
mat-vcv(bird.orders,cor=TRUE)
# following Blomberg to here
lambda-0.75 # for instance
fit1-gls(y~x,correlation=corSymm(lambda*mat[lower.tri(mat)],fixed=TRUE),data=dat)
# compare to corPagel
fit2-gls(y~x,correlation=corPagel(0.75,bird.orders,fixed=TRUE),data=dat)
summary(fit1)
summary(fit2)

In addition, a trick to get the full correlation/covariance matrix after 
lambda transformation (given a set value of lambda) is just:


mat.lambda-lambda*(mat-diag(diag(mat)))+diag(diag(mat))

I'm not sure whether or not this is exactly what you're looking for, but 
I hope it helps.


Best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 7/15/2011 4:52 PM, Luke Matthews wrote:

This question picks up on a thread from last week, copied below.  I've also 
been able to use the approach Simon presented so concisely with gene networks 
that are not strictly bifurcating.  This only allows for the Brownian model; 
that is, no transforms of the entries supplied in the covariance matrix.
I'd like to apply a lambda-type transform to the covariance or correlation 
matrix directly.  Does anyone know how to co-opt corPagel for use directly with 
the covariance matrix, or else an alternative implementation in an nlme 
correlation structure?
Thanks
Luke Matthews

[R-sig-phylo] query: how to use an existing covariance matrix directly in a gls 
procedure in R?
Simon Blomberg s.blomberg1 at 
uq.edu.aumailto:r-sig-phylo%40r-project.org?Subject=Re%3A%20%5BR-sig-phylo%5D%20query%3A%20how%20to%20use%20an%20existing%20covariance%20matrix%0A%20directly%20in%20a%20gls%20procedure%20in%20R%3FIn-Reply-To=%3C011EDAAAEA824D44AE3EC8D2562AB98402DDFE279F%40UQEXMB06.soe.uq.edu.au%3E
Fri Jul 8 01:44:48 CEST 2011

  *   Previous message: [R-sig-phylo] query: how to use an existing covariance matrix 
directly in a gls procedure in 
R?https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/001474.html
  *   Next message: [R-sig-phylo] query: how to use an existing covariance matrix 
directly in a gls procedure in 
R?https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/001477.html
  *   Messages sorted by: [ date 
]https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/date.html#1475  [ thread 
]https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/thread.html#1475  [ subject 
]https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/subject.html#1475  [ author 
]https://stat.ethz.ch/pipermail/r-sig-phylo/2011-July/author.html#1475



I used to do this before ape existed.



Here is some example code.



library(ape)

data(bird.orders)



# some made up data

dat- data.frame(y=rnorm(23), x=rnorm(23))

rownames(dat)- bird.orders$tip.label



mat- vcv(bird.orders, cor=TRUE)

fit- gls(y~x, correlation=corSymm(mat[lower.tri(mat)], fixed=TRUE), data=dat) 
#assuming ultrametric tree

# compare with corBrownian structure:

  fit2- gls(y~x, correlation=corBrownian(phy=bird.orders), data=dat)



# are the regression coefficients the same?



  all.equal(coef(fit), coef(fit2))

[1] TRUE







I hope this helps,



Simon.



From: r-sig-phylo-bounces at 
r-project.orghttps://stat.ethz.ch/mailman/listinfo/r-sig-phylo  [r-sig-phylo-bounces at 
r-project.orghttps://stat.ethz.ch/mailman/listinfo/r-sig-phylo] On Behalf Of Tomlinson, 
Kyle [kyle.tomlinson at wur.nlhttps://stat.ethz.ch/mailman/listinfo/r-sig-phylo]

Sent: Friday, July 08, 2011 5:30 AM

To: 'r-sig-phylo at 
r-project.orghttps://stat.ethz.ch/mailman/listinfo/r-sig-phylo'

Subject: [R-sig-phylo] query: how to use an existing covariance matrix directly 
in a gls procedure in R?



Dear Dr Paradis



I trust that I am allowed to contact you directly  like this? If not, my 
sincere apologies.



I have constructed the covariance matrix for a phylogenetic tree (using the 
vcv() command), which i would like to use directly in a gls() procedure, 
instead of using the tree directly as appears to be done in ape (because I only 
use a very small part of the tree i have constructed..).

{I am doing this in order to check my own gls procedure and the gls procedure 
of PHYLOGr which I dont trust.}



Do you know if this can be done? I have tried to figure it out by considering 
the corStruct class options in nlme, but none of these options seems to allow 
one to directly input an existing covariance matrix, and I simply dont 
understand the object construction methods of R well enough to build a suitable 
corStruct object myself.



I hope you can help.





sincerely



Kyle Tomlinson



Resource Ecology Group

Centre for Ecosystem Studies

Wageningen University

Droevendaalsesteeg 3a

6708 PB Wageningen

The Netherlands



Phone: +31 317

Re: [R-sig-phylo] LTT plot non-ultrametric trees

2011-08-10 Thread Liam J. Revell

This is implemented in my phytools package.  This is not yet on CRAN, 
but can be downloaded from 
http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/.  The function is 
ltt().  It is slow, but seems to work.  Please let me know if you have 
success with this.


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 8/10/2011 8:51 PM, Rob Lanfear wrote:

Hi All,

I have a set of simulated birth-death trees, which include extinct and
extant lineages. I'd like to see if my simulations are doing what I think
they should be, by plotting out the number of lineages through time.

Does anyone know of a method of doing this that incorporates extinct
lineages, i.e. one where the number of lineages can decrease as well as
increase through time.

I can see how it could be done (I think...) by making an ordered data frame
of node times (+1 lineage) and extinction events (-1 lineage), but it seems
like somebody might already have done it.

Cheers,

Rob



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] LTT plot non-ultrametric trees

2011-08-11 Thread Liam J. Revell


Hi Rob.

Thanks for identifying the bug.  This should be fixed, I hope, in the 
newest version of phytools (v0.0-6; available: 
http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/).


The way this function works is by first computing the heights above the 
root of all the nodes (including tip nodes) in the tree, and then at 
each event (that is, a speciation or extinction), it counts the number 
of edges that include that time point.  This is slow.  It seems likely 
that Emmanuel's suggestion (in a previous email) to use dist.nodes() 
would run much faster than this.


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 8/11/2011 12:42 AM, Rob Lanfear wrote:

Hi Liam,

Thanks for the help. Extremely useful.

And thanks for clearing up my lack of understanding about the
differences in maximum ages!

Cheers,

Rob

On 11 August 2011 14:05, Liam J. Revell liam.rev...@umb.edu
mailto:liam.rev...@umb.edu wrote:

Hi Rob.

I can reproduce your error, but I haven't figured out the problem yet.

You can try an earlier version of this function, which seems to work:


source(http://anolis.oeb.__harvard.edu/~liam/R-__phylogenetics/ltt/v0.3/ltt.R
http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/ltt/v0.3/ltt.R)

p2 - ltt(t1, log.lineages=FALSE, drop.extinct=FALSE)

Sorry about this.

Also:


max(p1$times)==max(p2$times)

can be FALSE because if drop.extinct is set to TRUE, then the crown
age of the pruned tree can be smaller than in the full tree if some
lineages arising at the root of the tree do not leave any extant
descendants.


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.__revell/
http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu mailto:liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 8/10/2011 9:50 PM, Rob Lanfear wrote:

library(TreeSim)

library(phytools)

library(geiger)


#simulate tree of 100 taxa with initial diversification followed
by a
period of B=D

t1 - sim.rateshift.taxa(100, 1, c(0.2, 0.2), c(0.2, 0.05),
c(1,1), c(0,
20))[[1]]

t1


#confirm number of extant taxa is 100

n.extant- length(prune.extinct.taxa(t1)$__tip.label)

n.extant


#do ltt plot without extinct taxa (works fine)

p1- ltt(t1, log.lineages=FALSE, drop.extinct=TRUE)


#do ltt plot with extinct taxa (looks odd)

p2- ltt(t1, log.lineages=FALSE, drop.extinct=FALSE)


#max times don't seem to correspond between the two plots.

max(p1$times)==max(p2$times)





--
Rob Lanfear
Postdoc,
Centre for Macroevolution and Macroecology,
Research School of Biology,
Australian National University

Tel: +61 2 6125 7270
www.robertlanfear.com http://www.robertlanfear.com


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Interpreting rate.evol.mcmc output

2011-08-12 Thread Liam J. Revell


Hi Roland.

Thanks for your interest in this function and method.  The method is 
described in an article that is now available as an Accepted Article 
online (http://dx.doi.org/10./j.1558-5646.2011.01435.x). 
Unfortunately, this was posted online by 'Evolution' without its 
supplementary appendix (which contains details on running and 
interpreting the function evol.rate.mcmc()), but I would be happy to 
send this to you off-list.


evol.rate.mcmc() implements a method in which MCMC is used to identify 
the phylogenetic position of a shift in the evolutionary rate for a 
continuously valued character.  It can also be used to obtain estimates 
of the evolutionary rates tipward and rootward of this shift.


What you are looking at is the raw posterior sample from this 
analysis.  Depending on how long you run the MCMC you should have 100s 
or 1000s of rows in $mcmc  100s or 1000s of elements in $tips.  To make 
sense of this, you should use two different functions: minSplit() and 
posterior.evolrate().


minSplit() takes the tree and $mcmc as input.  You should first decide 
how many generations (and thus how many rows of $mcmc, which will depend 
on sampling frequency) you want to exclude as burn-in.  It returns the 
shift-point that minimizes the summed or the sum of squared distances to 
all the other sampled points in the posterior.


posterior.evolrate() pre-processes the posterior sample so that you can 
make sense of it.  It takes as input the tree, the average shift point 
(say, the output from minSplit()), $mcmc, and $tips.  It returns a 
matrix containing a pre-processed posterior sample of evolutionary rates 
tipward and rootward of the median shift point which can then be 
analyzed in a typical way.  (For instance, you can compute parameter 
estimates as the mean or median; as well as HPD intervals and ESSs 
using, for instance, the MCMC diagnostics package coda.)


I hope this is helpful.  Thanks again for giving the method a try. 
Please don't hesitate to contact me for further clarification.


Best of luck.

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 8/12/2011 11:07 AM, Roland Sookias wrote:

Hi

Can anyone help me interpret the output from phytools' rate.evol.mcmc
function? I get, as it says I should, results from the MCMC run and
a tips list of stips in state sig(1)^2 for each sampled generation of
MCMC (to polarize the rate shift)., but I'm not sure what I'm looking
for practically to see if/when there have been rate shifts.

The mcmc list is like this:


2 200 0.004979113 0.002156864 1.889112  131  3.6386714  -38.68685
3 300 0.008243820 0.006182640 1.964227   97 19.7978315  -36.52512

And the tips like this:


$tips[[5]]
[1] Effigia_okeeffeae

Cheers

Roland

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Blomberg's K and assumptions regarding independent contrasts

2011-08-22 Thread Liam J. Revell

Yes, this would be circular. For instance, consider the situation in 
which the data have no signal and thus the best branch-length 
transformation (perhaps) would shorten all internal edges to 0, and 
lengthen all terminal edges to a constant.  K is guaranteed to be 1.0 
(regardless of the data) on this tree (and any star tree).


I hope this is helpful.  Sorry for the slow reply.

All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 8/12/2011 1:35 AM, Manabu Sakamoto wrote:

Hi everyone,

When performing Blomberg et al.'s test for phylogenetic signals, do the data 
and/or branch lengths have to conform to the assumptions of Brownian motion as 
discussed in Garland et al. (1992) and elsewhere, i.e. do contrasts have to be 
adequately standardised? Or can data and branch lengths be the original 
untransformed values?

I have the vague sense that it would be circular to make sure the data/branch 
lengths conform to BM and then test for deviations from BM by computing K. Or 
am I wrong to think this way?

many thanks,
Manabu

Manabu Sakamoto, PhD
Postdoctoral Research Associate
School of Earth Sciences
University of Bristol
Bristol, UK, BS8 1RJ

Tel: +44 (0) 117 954 5421
Fax: +44 (0)117 925 3385
Email: m.sakam...@bristol.ac.uk

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Maximum Parsimony branch lengths

2011-09-02 Thread Liam J. Revell


Hi Robin.

Yes, you can do this under the ACCTRAN (accelerated transformation) 
criterion using the function acctran() in phangorn.  To do this given 
a data file sequences.dna in FASTA format:


 dna-read.phyDat(sequences.dna,format=fasta)
 tree-pratchet(dna) # or you could use optim.parsimony()
 tree-acctran(tree,dna)

I hope this helps.

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 9/2/2011 8:35 AM, Velzen, Robin van wrote:

Dear R-sig phylo list members,

I am trying to apply maximum Parsimony tree searches on a large number of 
simulated datasets within the R environment. So far I have found one function 
implementing Parsimony: the optim.parsimony function in the package phangorn. 
This function finds a shortest tree for a typical dataset (1000 taxa) 
reasonably fast which is great. However, the resulting tree does not include 
branch lengths (i.e. it is a fully bifurcating cladogram). This is problematic 
for me because there are nodes separating identical sequences.

Is there a way (i.e. function) to calculate branch lengths in terms of number 
of substitutions for a tree?
..Or does someone know an alternative function to get a maximum Parsinomy tree 
with branch lengths?

Any help or suggestion would be most welcome.

Thanks!

Robin


Robin van Velzen
Biosystematics Group
Wageningen University




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] edge representation of trees produced by birthdeath.tree() in geiger

2011-09-09 Thread Liam J. Revell


Hi Andrew.

This is because the numbers in tre$edge are node  tip numbers, not tip 
labels (which also can be numbers).  The translation between tip numbers 
and tip labels is given by tre$tip.label.  Here, tip labels are indexed 
by their numbers in tre$edge.


For instance, in your case we have:

 tre$edge
  [,1] [,2]
 [1,]   10   11
 [2,]   11   12
 [3,]   12   13
 [4,]   131
 [5,]   132
... (some rows excluded)

and:

 tre$tip.label
[1] 8 9 3 1 6 7 2 4 5

This tells us, to take your example, that node 13 should be connected to 
tips 1  2, which tre$tip.label tells us have labels 8  9, just as 
we see with:


 plot(tre); nodelabels()

I hope this clears things up somewhat.

All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 9/9/2011 7:51 AM, Andrew Barr wrote:

library(ape)
tre-read.tree(text=8:2.850732306,9:0.1789410068):1.147884602,3:0.8848821048):1.411242805,(1:3.869881809,((6:0.6904865854,7:0.02044458561):1.984259398,2:0.04497273798):0.3022512176):0.09298177296):0.3029547699,(4:0.8015431218,5:1.467916249):2.117025776);)


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] post-order tree traversal

2011-09-14 Thread Liam J. Revell


Hi Nick.

If you use the function reorder.phylo(...,order=pruningwise) the 
resultant tree edge matrix is suitable for post-order traversal (i.e., 
the daughters always precede the parents in top-to-bottom matrix 
traversal).  This might be helpful for what you would like to do - for 
instance if you just label the nodes as they are first encountered in 
the matrix this would be in the order of a post-order traversal of the tree.


All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 9/14/2011 9:23 PM, Nick Matzke wrote:

Hi all,

To display the node reconstructions from another program in APE, I need
to label my internal nodes as they would be labeled in a post-order tree
traversal.

Is there an easy way/function to do this somewhere, or do I have to
write my own tree-traversing functions?

Cheers,
Nick




___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Simulating binomial trait shifts on a phylogeny

2011-10-11 Thread Liam J. Revell


Hi Antigoni.

I have a function in the new version of phytools called sim.history() 
that might help with this.  This version is not available yet on CRAN, 
but it can be downloaded from my webpage: 
http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/.  The function 
generates stochastic histories according to some rate matrix, and stores 
the histories along the states at all terminal nodes.


This code might be helpful in your task.  It simulates random trees, for 
a given number of changes it computes the instantaneous rate matrix that 
will produce that number of changes, and then it simulates (repeatedly) 
under that rate until the desired number of changes is obtained.  The 
code fragment uses functions in ape, geiger, and phytools:


nchanges-5 # desired number of changes
ntrees-100 # desired number of trees
ntaxa-100 # desired number of taxa
require(phytools); require(geiger) # required packages
mtrees-list()
for(i in 1:ntrees){
# simulate a stochastic B-D tree using birthdeath.tree

tree-drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=(ntaxa+1)),as.character(ntaxa+1))
# compute the rate that results in (on average) nchanges changes
q-nchanges/sum(tree$edge.length)
# simulate a stochastic history
mtree-sim.history(tree,Q=matrix(c(-q,q,q,-q),2,2))
# loop until desired number of changes
while((length(unlist(mtree$maps))-nrow(mtree$edge))!=nchanges)
mtree-sim.history(tree,Q=matrix(c(-q,q,q,-q),2,2))
mtrees[[i]]-mtree
# for fun, plot the history
plotSimmap(mtrees[[i]])
}
class(mtrees)-multiPhylo
# tip states for the kth tree are in mtrees[[k]]$states

Please let us know if this is useful or if you have any questions or 
difficulties implementing.


All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 10/11/2011 6:18 PM, Antigoni Kaliontzopoulou wrote:

Hello everyone,

I am studying a binomial trait across a phylogeny and I'm trying to
simulate specific numbers of evolutionary shifts (1, 5, 10 shifts)
across random trees.

Does anyone know how this can be done?

Any help is greatly appreciated

Cheers,
Antigoni



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Simulating a phylogeny with outgroup on long branch

2011-10-13 Thread Liam J. Revell


Hi Scott.

I'm not sure why you want to do this, but based on your description it 
sounds like you could just generate two separate trees and then paste 
them together across a deep basal split.


The following is an example of doing this, although I'm sure you could 
also come up with your own using different functions than those that I 
have chosen:


require(phytools); require(geiger) # load packages
ntaxa1-50; ntaxa2-50 # number of taxa in each clade
total.height-10 # total tree height (may need to be adjusted)
# simulate subtree 1
tr1-birthdeath.tree(b=1,d=0,taxa.stop=ntaxa1)
tr1$tip.label-1:ntaxa1; tr1$root.edge-0
# simulate subtree 2
tr2-birthdeath.tree(b=1,d=0,taxa.stop=ntaxa2)
tr2$tip.label-1:ntaxa2+ntaxa1; tr2$root.edge-0
# compute tree heights, so that the whole thing can be ultrametric
height1-max(nodeHeights(tr1))
height2-max(nodeHeights(tr2))
# create the basal split
tr3-list(Nnode=1,edge=matrix(c(3,1,3,2),2,2,byrow=T),edge.length=c(total.height-height1,total.height-height2),tip.label=c(tr1,tr2))
class(tr3)-phylo
# attach the two subtrees one by one
tr3$tip.label[tr3$tip.label==tr1]-NA
tr3-paste.tree(tr3,tr1)
tr3$tip.label[tr3$tip.label==tr2]-NA
tr3-paste.tree(tr3,tr2)
# plot the full tree
plot(tr3)

I hope this is kind of what you are going for.

All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 10/13/2011 5:32 PM, Scott Chamberlain wrote:

Hello,

I want to simulate two types of phylogenies: one normally done with, for
example, rcoal in ape; and the second type with an outgroup on a long
branch.  The first is easy with rcoal.

I'm not sure how to make the second kind of tree.  An example of this second
kind is a tree with say all animals, and then an outgroup with all plant
species, so that you have many animal species that have relatively short
branches among them, and then the distance between plants and animals is
relatively long.

Is there a way to simulate this second kind of tree with any available R
functions?

Thanks!
Scott Chamberlain
Rice University

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] basal dichotomy in phylo object

2011-10-25 Thread Liam J. Revell

In my phytools package 
(http://cran.r-project.org/web/packages/phytools/index.html) there is a 
function reroot() that is really just a simple wrapper around root() 
that allows you to root along internal or terminal edges (rather than 
only at nodes).


In your example, assuming that the node 500 is tipward of 174 in the 
tree, to re-root the tree halfway along the edge leading to node number 
500 (in tree$edge), you would just do:


rooted.tree-reroot(tree,node.number=500,position=tree$edge.length[tree$edge[,2]==500]/2)

I think.

All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 10/25/2011 10:44 AM, Ondřej Mikula wrote:

Hello,
I have an unrooted tree, but I know that basal dichotomy is between nodes
labeled 174 and 500. I am wandering how to indicate it in the object of
class 'phylo'. I tried 'root' function of 'ape' but with no success.
I will be grateful for any advice. Best wishes
Ondrej Mikula



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits

2011-11-10 Thread Liam J. Revell


Hi David.

In general it is inconsequential whether X or Y are biologically 
inherited traits; but whether the residual error in Y given X is 
correlated or independent among species.  In the case of growth rate as 
a function of habitat degradation this corresponds to:


Growth Rate = beta0 + beta1*Habitat Degradation + e

It seems highly plausible that the residual error in Growth Rate (e), 
that is error not explained by Habitat Degradation, is phylogenetically 
correlated.  This would imply only that closely related species have 
growth rates that deviate from the mean relationship between growth rate 
and habitat degradation in similar ways.  In this case, PIC or some 
other phylogenetic method should be used.  To be clear also, contrasts 
need to be computed for both X  Y regardless of whether or not there is 
phylogenetic autocorrelation in both X  Y!  (See Revell 2010, MEE; or, 
better yet, Stone 2011, Syst. Biol. doi:10.1093/sysbio/syq098.)


Regarding extinction probability, perhaps someone else on the list can 
address that specific example.


All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 11/10/2011 3:36 PM, David Bapst wrote:

Hello all,
A recent discussion set my mind thinking on a particular issue and, once
again, I decided to ask for the general opinion of R-Sig-Phylo denizens. It
may be easier to start with an example.

Let's say that there exists a worker who is measuring several different
traits across a number of species and then testing for correlations among
these traits. The first test is body size versus growth rate and they use
independent contrasts or PGLS to test for a the correlation, accounting for
phylogeny. Both of these traits are inherited, evolving variables. Now
let's say they'd like to test for the relationship between growth rate and
some metric of the anthropogenic degradation of that species' habitat. Now
what? It is even valid to apply PIC to the habitat degradation metric even
though it is not an inherited, evolving trait? It's unclear to me.

Let's consider a paleontological example, one which I have found myself
both strongly agreeing and disagreeing with at times. Essentially, how
should we test for extinction selectivity on some trait at a mass
extinction event? Let's say we think body size is a predictor of the risk
of extinction during that event and so we want to test for a correlation
between them (please ignore that extinction would be a discrete variable
for the moment). Do we treat these variable with PIC or PGLS? Is it really
proper to refer to the probability of going extinct during a mass
extinction as an evolving trait? Let's say we did and we got different
results than when we used an analysis which did not account for the
phylogenetic covariance. How should we interpret these results?

One explanation I know of is that when we apply phylogenetic comparative
methods to these quasi-traits to consider their relationship to another
trait, we are assuming that these variables are actually the result of some
underlying, unobserved set of traits which are evolving along the
phylogeny. This makes sense, maybe in the extinction event case, which
would mean that any PCM analysis would be testing for an evolutionary
relationship between body size and these unobserved traits which predict
extinction. Of course, if extinction risk is largely a function of
non-inherited traits, then the initial assumption may be incorrect (that
extinction risk itself is an evolving trait). Regardless, I don't see how
to apply that explanation to the habitat degradation example.

So, what do people think? How should we test for correlation when
non-evolving quasi-traits are involved? I'm very interested to hear
people's thoughts on this matter.
-Dave Bapst, UChicago



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Comparative Methods and Pseudo-Traits

2011-11-14 Thread Liam J. Revell


Hi David.

 Poaching Intensity = beta0 + beta1*Body Size + e

I think it depends on how the residual error in the model is distributed 
(esp. correlated) among species.  It seems possible to invent 
hypothetical scenarios (as I did in my previous email) about how the 
residual error in poaching intensity given body size could be 
phylogenetically autocorrelated, but this is fundamentally an empirical 
question.  If the residual error of poaching intensity given body size 
is phylogenetically correlated and we ignore this then we risk 
overestimating the predictive value of our model.


In addition, the residual error is likely/guaranteed to be non-Brownian 
if the response variable is binary (e.g., extant v. extinct).  For these 
type of data the tree should not be ignored, but simple GLS regression 
is probably not appropriate.  One option might be the phylogenetic 
logistic regression of Ives  Garland (2009), but I'm not too familiar 
with this method.


All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 11/14/2011 3:54 PM, David Bapst wrote:

Liam, Joe, Pasquale, all-

Thank you for your kind input.It seems that I am not the only one who
considers this issue at length.

There is just one point I'd like clarification of. Liam, in my first
example which you used, the inherited trait is the response and the
not-directly-inheritable trait the predictor, such that:

Growth Rate = beta0 + beta1*Habitat Degradation + e

What if we were to switch these? It seems to me this is the only real
difference in the extinction example. To make a neontological example,
let's say we were interested in whether larger mammals were more likely
to experience higher levels of poaching. The response here would be the
poaching intensity and body size the predictor, such that:

Poaching Intensity = beta0 + beta1*Body Size + e

Would your argument still apply? (It seems to me that it should.) If so,
then it would seem your explanation should equally apply to extinction
selectivity cases.

-Dave


On 11/10/2011 3:36 PM, David Bapst wrote:

Hello all,
A recent discussion set my mind thinking on a particular issue
and, once
again, I decided to ask for the general opinion of R-Sig-Phylo
denizens. It
may be easier to start with an example.

Let's say that there exists a worker who is measuring several
different
traits across a number of species and then testing for
correlations among
these traits. The first test is body size versus growth rate and
they use
independent contrasts or PGLS to test for a the correlation,
accounting for
phylogeny. Both of these traits are inherited, evolving
variables. Now
let's say they'd like to test for the relationship between
growth rate and
some metric of the anthropogenic degradation of that species'
habitat. Now
what? It is even valid to apply PIC to the habitat degradation
metric even
though it is not an inherited, evolving trait? It's unclear to me.

Let's consider a paleontological example, one which I have found
myself
both strongly agreeing and disagreeing with at times.
Essentially, how
should we test for extinction selectivity on some trait at a mass
extinction event? Let's say we think body size is a predictor of
the risk
of extinction during that event and so we want to test for a
correlation
between them (please ignore that extinction would be a discrete
variable
for the moment). Do we treat these variable with PIC or PGLS? Is
it really
proper to refer to the probability of going extinct during a mass
extinction as an evolving trait? Let's say we did and we got
different
results than when we used an analysis which did not account for the
phylogenetic covariance. How should we interpret these results?

One explanation I know of is that when we apply phylogenetic
comparative
methods to these quasi-traits to consider their relationship to
another
trait, we are assuming that these variables are actually the
result of some
underlying, unobserved set of traits which are evolving along the
phylogeny. This makes sense, maybe in the extinction event case,
which
would mean that any PCM analysis would be testing for an
evolutionary
relationship between body size and these unobserved traits which
predict
extinction. Of course, if extinction risk is largely a function of
non-inherited traits, then the initial assumption may be
incorrect (that
extinction risk itself is an evolving trait). Regardless, I
don't

Re: [R-sig-phylo] covriance matrix internal nodes

2011-11-23 Thread Liam J. Revell

Here is an alternative version that is more than twice as fast (helpful 
for large trees), and only a little bit slower than vcv.phylo():


vcvPhylo2-function(tree,anc.nodes=T){
  n-length(tree$tip.label)
  h-nodeHeights(tree)[order(tree$edge[,2]),2]
  h-c(h[1:n],0,h[(n+1):length(h)])

M-mrca(tree,full=anc.nodes)[c(1:n,anc.nodes*(n+2:tree$Nnode)),c(1:n,anc.nodes*(n+2:tree$Nnode))]
  C-matrix(h[M],nrow(M),ncol(M))
  if(anc.nodes)
rownames(C)-colnames(C)-c(tree$tip.label,n+2:tree$Nnode)
  else rownames(C)-colnames(C)-tree$tip.label
  return(C)
}

All the best. - Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 11/21/2011 6:16 PM, ppi...@uniroma3.it wrote:

Hi Liam,
it works perfectly! Thankyou very much!
Andit is elegant...



Hi Paolo.

I think this code fragment from my phytools function anc.trend does what
you want (here, I have made it into its own function), although it might
not be the most elegant way to do it:

vcvPhylo-function(phy,anc.nodes=T){
if(anc.nodes){
  D-dist.nodes(phy)
  ntips-length(phy$tip.label)
  Cii-D[ntips+1,]
  C-D; C[,]-0
  for(i in 1:nrow(D)) for(j in 1:ncol(D))
C[i,j]-(Cii[i]+Cii[j]-D[i,j])/2

dimnames(C)[[1]][1:length(phy$tip)]-dimnames(C)[[2]][1:length(phy$tip)]-phy$tip.label
  C-C[c(1:ntips,(ntips+2):nrow(C)),c(1:ntips,(ntips+2):ncol(C))]
  return(C)
} else return(vcv.phylo(phy))
}

All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 11/21/2011 12:09 PM, ppi...@uniroma3.it wrote:
Hi all,
Having a  tree (NOT ultrametric in my case) I would
like to extract the phylogenetic covariance matrix
including both tips and nodes (like dist.nodes for
patristic distances).
vcv returns the cov matric just for tips.
Someone has a fast way?
  
Thankyou very much
  
Paolo
  
___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
  







___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] split data set and topology on PGLS

2011-12-14 Thread Liam J. Revell


Hi Renata.

I agree with Ted - you should be able to fit an ANCOVA model with 
gls(y~x+ecomorph,...,correlation=corBrownian(phy=tree)), or something 
like this, where ecomorph is a factor.


Alternatively, you could use the method we devised in Revell  Collar 
(2009; Evolution) in which we first reconstructed the discrete character 
on the tree (in your case, this would be ecomorph), and then we fit a 
model in which different covariance structure between our variables was 
permitted on different branches of the tree.  This method is also 
implemented in my phytools package (function: evol.vcv).


All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 12/14/2011 3:26 PM, Theodore Garland Jr wrote:

Why not just include an interaction term between your main effect coding 
variable (presumably coded as 0, 1 to indicate your two ecomorphs) and one or 
more of your continuous independent variables?

Cheers,
Ted

Theodore Garland, Jr.
Professor
Department of Biology
University of California, Riverside
Riverside, CA 92521
Office Phone:  (951) 827-3524
Home Phone:  (951) 328-0820
Facsimile:  (951) 827-4286 = Dept. office (not confidential)
Email:  tgarl...@ucr.edu
http://www.biology.ucr.edu/people/faculty/Garland.html

Experimental Evolution: Concepts, Methods, and Applications of Selection 
Experiments
Edited by Theodore Garland, Jr. and Michael R. Rose
http://www.ucpress.edu/book.php?isbn=9780520261808
(PDFs of chapters are available from me or from the individual authors)


From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on 
behalf of Matt Pennell [mwpenn...@gmail.com]
Sent: Wednesday, December 14, 2011 12:17 PM
To: Renata Brandt
Cc: r-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] split data set and topology on PGLS

Hi Renata,

This seems to be an appropriate approach. (I am sure someone will correct
me if i am wrong in this). Phylogenetic regression in the way that you are
using it is basically to remove the statistical non-independence between
data points. Effectively what you are doing by splitting your data into two
morphologies is pruning your tree. Under Brownian motion, evolution along
any branch is independent of evolution along any other branch, so pruning
your tree will not change the correlation struction of the data.
(Similarily you can still do PGLS with incomplete sampling of species). So
as long as you can justify splitting the data up as you are, it seems fine.

cheers,
matt

On Wed, Dec 14, 2011 at 12:09 PM, Renata Brandtrenata.bra...@gmail.comwrote:


Hello everybody.

I have some questions for you.
When performing a phylogenetic regression between two continuous traits,
the dependent trait clearly divides my data set in two distinct
morphologies. This division overlaps with what I previously thought were
two different ecomorphs. The relationship between the same traits seems to
be different for each morphology.

Now my question is:
- Is it ok to split the data set (one data table and topology for each
ecomorph) and reanalyze data to get the relationship between traits for
each group? (That is what I would do if not using a phylogenetic approach).
- If this is wrong, any hints on how should I proceed?

Many thanks. Cheers.

--
Renata Brandt
Departamento de Biologia - FFCLRP
Universidade de São Paulo
Brasil
(16) 82039533

  /..)
(  (
  )  )
  (  (
  )  )
_(  (_
  )  )
  \ )
\

[[alternative HTML version deleted]]


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo




 [[alternative HTML version deleted]]


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Measures of dissimilarity between phylogenetic trees

2011-12-26 Thread Liam J. Revell


Hi Bruno.

RF.dist in the phangorn package computes Robinson-Foulds distance 
(http://en.wikipedia.org/wiki/Robinson-Foulds_metric). This is not 
obvious, but (unless you want the position of the root to affect the 
score) you should probably unroot your trees before using RF.dist, e.g.:

 tr1-read.tree(text=(A,(B,C));)
 plot(tr1)
 tr2-read.tree(text=((A,B),C);)
 # tr1  tr2 have the same unrooted topology, yet:
 RF.dist(tr1,tr2)
[1] 2
 RF.dist(unroot(tr1),unroot(tr2))
[1] 0

In addition, phangorn:treedist and ape:dist.topo provide other metrics 
of tree distance.


All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 12/26/2011 6:27 PM, Bruno de Medeiros wrote:

Hi everyone,

I am simulating characters evolving in a phylogetic tree under a number of
models and using those characters to build another tree using parsimony.
I would like, then, to measure how similar are the recovered trees when
compared to the original ones, taking only topology into account. Is there
any function in the R packages to calculate tree distance metrics? I
considered using TNT (the program that I am using to build phylogenies) to
calculate SPR distances, but it would be much more convenient if there were
a way of doing it in R.

Thanks in advance,

Bruno
--
Bruno A. S. de Medeiros
PhD Student - Farrell Lab
Harvard University

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] phyDat format problem

2012-01-09 Thread Liam J. Revell


Hi Jen.

Try changing to a matrix first (from a data frame). [I.e., 
p-as.matrix(p); p2-phyDat(p,type=USER,levels=c(0,1)) ]


I'm not sure why this works, but it seems to. - Liam

More details below.

When I do:
p-matrix(c(1,0,0,0,1,0,1,0,1,0,
1,1,0,1,1,0,0,1,0,0,
1,0,0,1,0,1,0,0,1,0,
0,1,0,1,1,1,0,0,0,0,
0,0,0,1,1,1,1,0,1,0),5,10,byrow=T)
dimnames(p)-list(c(A,B,C,D,E),paste(X,1:10,sep=))
p2-phyDat(p,type=USER,levels=c(0,1))

I get:
 p2
5 sequences with 10 character and 9 different site patterns.
The states are 0 1

However:
p-as.data.frame(p)
p2-phyDat(p,type=USER,levels=c(0,1))

Gives me:
 p2
10 sequences with 5 character and 5 different site patterns.
The states are 0 1

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com


On 1/9/2012 9:14 AM, J Greenwood wrote:

Hi all,

I am having a problem with getting parsimony scores in Phangorn and have
found that the program is not reading in my datafile the way I expect it.
I have 10 characters and 5 taxa, but phangorn seems to read this the
opposite way round.

Can anybody help? Details follow,

thanks,

Jen


I made a test dataset to try and work out the problem which is the
following:


p-read.table(file=tab_phangorn.txt, header=T)
p

X1 X2 X3 X4 X5 X6 X7 X8 X9 X10
A 1 0 0 0 1 0 1 0 1 0
B 1 1 0 1 1 0 0 1 0 0
C 1 0 0 1 0 1 0 0 1 0
D 0 1 0 1 1 1 0 0 0 0
E 0 0 0 1 1 1 1 0 1 0

Where A-E are taxa and X1-X10 are characters.
I then tried converting it to phyDat format which gives the following:


p2-phyDat(p, type=USER, levels=c(0,1))
p2


10 sequences with 5 character and 5 different site patterns.
The states are 0 1


str(p2)

List of 10
$ X1 : int [1:5] 2 2 2 1 1
$ X2 : int [1:5] 1 2 1 2 1
$ X3 : int [1:5] 1 1 1 1 1
$ X4 : int [1:5] 1 2 2 2 2
$ X5 : int [1:5] 2 2 1 2 2
$ X6 : int [1:5] 1 1 2 2 2
$ X7 : int [1:5] 2 1 1 1 2
$ X8 : int [1:5] 1 2 1 1 1
$ X9 : int [1:5] 2 1 2 1 2
$ X10: int [1:5] 1 1 1 1 1
- attr(*, class)= chr phyDat
- attr(*, weight)= int [1:5] 1 1 1 1 1
- attr(*, nr)= int 5
- attr(*, nc)= int 2
- attr(*, index)= int [1:5] 1 2 3 4 5
- attr(*, levels)= num [1:2] 0 1
- attr(*, allLevels)= chr [1:3] 0 1 ?
- attr(*, type)= chr USER
- attr(*, contrast)= num [1:3, 1:2] 1 0 1 0 1 1

It has read the data in the opposite way round to the way i was
expecting, which was odd, but I transposed the matrix anyway, but got
the same output:


ptrans-t(p)
ptrans

A B C D E
X1 1 1 1 0 0
X2 0 1 0 1 0
X3 0 0 0 0 0
X4 0 1 1 1 1
X5 1 1 0 1 1
X6 0 0 1 1 1
X7 1 0 0 0 1
X8 0 1 0 0 0
X9 1 0 1 0 1
X10 0 0 0 0 0


ptrans2

10 sequences with 5 character and 5 different site patterns.
The states are 0 1

str(ptrans2)

List of 10
$ X1 : int [1:5] 2 2 2 1 1
$ X2 : int [1:5] 1 2 1 2 1
$ X3 : int [1:5] 1 1 1 1 1
$ X4 : int [1:5] 1 2 2 2 2
$ X5 : int [1:5] 2 2 1 2 2
$ X6 : int [1:5] 1 1 2 2 2
$ X7 : int [1:5] 2 1 1 1 2
$ X8 : int [1:5] 1 2 1 1 1
$ X9 : int [1:5] 2 1 2 1 2
$ X10: int [1:5] 1 1 1 1 1
- attr(*, class)= chr phyDat
- attr(*, weight)= int [1:5] 1 1 1 1 1
- attr(*, nr)= int 5
- attr(*, nc)= int 2
- attr(*, index)= int [1:5] 1 2 3 4 5
- attr(*, levels)= num [1:2] 0 1
- attr(*, allLevels)= chr [1:3] 0 1 ?
- attr(*, type)= chr USER
- attr(*, contrast)= num [1:3, 1:2] 1 0 1 0 1 1

Can anyone explain to me what I am doing wrong? How can I get the phyDat
format to hold my information correctly?

--
J Greenwood
jenny.greenw...@bristol.ac.uk

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Lambda; PIC vs. PGLS

2012-01-13 Thread Liam J. Revell


Hi Karin.

gls(...,correlation=corPagel(...,fixed=F),method=ML) and 
pgls(...,lambda=ML) should produce the same result unless one or the 
other is not optimizing properly or unless different bounds on the 
fitted model parameters (in particular, lambda) are used.


To show this to yourself, you can try the following example which 
simulates and then fits the same model using three different functions.


I hope this is helpful. - Liam

set.seed(1)
require(phytools); require(caper); require(nlme); require(geiger)
# simulate tree
tree-pbtree(n=100,scale=1)
# simulate x under BM
x-fastBM(tree)
# simulate y  residual error under the lambda model
y-0.4*x+fastBM(lambdaTree(tree,0.7),sig2=0.5)
data-as.data.frame(cbind(x,y))
data-cbind(names(x),data)
colnames(data)-c(Species,x,y)
# fit using nlme:gls
r1-gls(y~x,data=data,correlation=corPagel(0.5,tree,fixed=FALSE),method=ML)
# fit using caper:pgls
r2-pgls(y~x,data=comparative.data(tree,data,Species),lambda=ML)
# fit using phytools:phyl.resid
r3-phyl.resid(tree,x,y,method=lambda)

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 1/13/2012 6:04 AM, Karin Schneeberger wrote:

Dear all

I am currently analysing a set of traits from different species - let's say 
brain size and body mass - using a tree where branch lengths were set 
artificially (after Graphen) as the length was not available.

I calculate phylogenetic independent contrasts (PIC) using the package caper 
and the crunch-algorithm. Beforehand, I used the function caic.diagnostic to test whether 
the model is appropriate (which was the case). In the end, this provided me with a 
significant correlation between brain size and body mass.


I then calculate lambda as a measurement for phylogenetic signal, using  the package 
ape:


data- comparative.data(tree, data, Species)
result- gls(brainsize~bodymass, data = data, correlation = corPagel(value = 1, phy = 
tree, fixed = FALSE), method = ML)
summary(result)

Here, lambda was 1, which - as far as I understood - means evolution under Brownian 
motion. I perform a phylogenetic generalised least square (PGLS) regression using 
caper, which includes lambda, as an addition to the PIC analysis:

data- comparative.data(tree, data, Species, vcv = TRUE)
Mod1- pgls(brainsize~bodymass, data = data, lambda = ML)
summary(Mod1)


Surprisingly, lambda here was zero. Furthermore, the correlation between brain 
size and body mass became non-significant.


I now have two contradiction values for lambda and results from both PIC and 
PGLS that differ dramatically in the way they can be interpreted. How does it 
come that with one method, lambda is zero and with the other lambda is one? And 
in the end, which method should be used, PIC or PGLS, or should both be 
discussed?


I would be grateful for advice.Thank you very much in advance for your help.

Best regards

Karin Schneeberger
PhD candidate in Ecology and Evolution
[[alternative HTML version deleted]]




___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Blomberg K Statistic

2012-01-19 Thread Liam J. Revell


Hi Daniel.

I can think of two possibilities:

1) This is an error.

2) Your tree has an extremely unusual shape (for instance, it has many 
bifurcations very close to the present day).


On constant rate Yule trees, substituting 1s for the true branch lengths 
does not (on average) bias K in any particular direction.


If you share your tree and data I would be happy to look at the issue 
more closely.


All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 1/19/2012 9:53 PM, dwe...@life.illinois.edu wrote:

Hi,
   I'm using the Picante package in R to calculate the Blomberg K.  I'm
trying to test for a phylogenetic signal in my data.  I'm using a
dataset with the PC score for 21 different species on a tree with
varying branch lengths.  I did this using the code here:
http://bodegaphylo.wikispot.org/IV._Testing_Phylogenetic_Signal_in_R

   I got a K statistic of 0.0006457603 and yet the resultant PIC.variance.P
is 0.001.  If I'm correct, the PIC.variance.P is the p-value and
significant at 0.05.  Is that correct?
   I then decided to play with my data and set all branch lengths equal to
1 to see how that would affect things.  I got a K statistic of 2.921763
and a PIC.variance.P value of 0.001.  My understanding is that K
statistics close to 0 are not significant and become significant close
to and above 1.
   I'm confused why I got a significant p-value for both K statistics, even
though one is very close to 0 (0.00065) and the other is above 2.  I
would appreciate any help in explaining Picante and the K statistic.
Thank you.




-Daniel

º  -º  º  º  º  

Daniel P Welsh
PhD Candidate
Teaching Assistant
Department of Animal Biology
University of Illinois at Urbana-Champaign
202 Shelford Vivarium
606 E. Healey Street
Champaign, IL 61821
lab phone: (217) 333-5323

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Bug in geiger::fitContinuous with meserr

2012-02-02 Thread Liam J. Revell


Hi all.

If you don't want to apply the fix, you can circumvent the problem by 
first assuring that meserr is in the same order as the tree tip labels, 
and then removing the names from the input vector of standard errors.


So, for instance:

se-se[tree$tip.label]
names(se)-NULL
fitContinuous(tree,x,model=lambda,meserr=se)

works. Note that the order of meserr should be, so far as I can tell, 
the order of tree$tip.label and not the order of the data in x. This is 
because the internally called function treedata(...,sort=T) sorts the 
data into the order of the tip labels.


This also agrees with Matt's diagnosis of the problem because if 
is.null(names(meserr)) then the code with the bug is circumvented.


In theory, one should be able to supply meserr as a matrix; 
unfortunately there is a second bug whereby:


o-match(rownames(td$data),rownames(meserr))
meserr-meserr[o,]

should be changed to

o-match(rownames(td$data),rownames(meserr))
meserr-as.matrix(meserr[o,])

because the former inadvertently converts meserr to a vector, when a 
matrix is subsequently needed.


- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 2/2/2012 1:41 PM, Matt Johnson wrote:

I'm not sure if this is the best place to make a bug report, but I thought 
others might be interested.

I was trying to replicate, with geiger, a demonstration of estimating lambda 
with measurement error from Liam's blog:

http://phytools.blogspot.com/2011/11/including-measurement-error-in.html

Replicating this with fitContinuous, I received the following error:


fitContinuous(tree,xe,model=lambda,meserr=se)

Error in meserr[o, ] : incorrect number of dimensions

The problem is related to this bit of code in fitContinuous, used to determine 
whether meserr is a single value, a vector, or a matrix:

 else if (is.vector(meserr)) {
 if (!is.null(names(meserr))) {
 o- match(rownames(td$data), names(meserr))
 if (length(o) != ntax)
 stop(meserr is missing some taxa from the tree)
 meserr- as.matrix(meserr[o,])

Removing the comma from the final line fixed the problem. The values I receive 
from geiger after removing the comma are identical to the values I get from 
phytools::phylosig for the same data, so I am assuming I haven't broken 
anything important.

The full code I used is at the end of this message, for those interested.

Best,

Matt Johnson




library(geiger)

Loading required package: ape
Loading required package: MASS
Loading required package: mvtnorm
Loading required package: msm
Loading required package: ouch
Loading required package: subplex

library(phytools)

Loading required package: mnormt
Loading required package: phangorn
Loading required package: quadprog
Loading required package: igraph
Loading required package: rgl

source(mjContinuous.R) #fitContinuous with offending comma removed
set.seed(2)
gen.lambda=0.7
tree = drop.tip(birthdeath.tree(b=1,d=0,taxa.stop=501),501)
se = sqrt(rchisq(n=500,df=2))
names(se) = tree$tip.label

x= fastBM(lambdaTree(tree,gen.lambda),sig2=1)
e = rnorm(n=500,sd=se)
xe=x+e

fitContinuous(tree,x,model=lambda)

Fitting  lambda model:
$Trait1
$Trait1$lnl
[1] -1031.783

$Trait1$beta
[1] 1.221674

$Trait1$lambda
[1] 0.7613218

$Trait1$aic
[1] 2069.567

$Trait1$aicc
[1] 2069.615

$Trait1$k
[1] 3



fitContinuous(tree,xe,model=lambda)

Fitting  lambda model:
$Trait1
$Trait1$lnl
[1] -1154.987

$Trait1$beta
[1] 1.412728

$Trait1$lambda
[1] 0.5495113

$Trait1$aic
[1] 2315.975

$Trait1$aicc
[1] 2316.023

$Trait1$k
[1] 3



fitContinuous(tree,xe,model=lambda,meserr=se)

Error in meserr[o, ] : incorrect number of dimensions


mjContinuous(tree,xe,model=lambda,meserr=se) #with problematic comma removed

Fitting  lambda model:
$Trait1
$Trait1$lnl
[1] -1141.14

$Trait1$beta
[1] 1.170196

$Trait1$lambda
[1] 0.7484273

$Trait1$aic
[1] 2288.281

$Trait1$aicc
[1] 2288.329

$Trait1$k
[1] 3

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Reconstructing Ancestral Characters and Exporting Data

2012-02-08 Thread Liam J. Revell


Hi Amy.

Yes, if you wanted to export ancestral character values as node labels, 
you can.


For instance:

temp-ace(x,tree)
tree$node.label-round(temp$ace,5) # for instance
write.tree(tree,file=)

You could also loop this over the set of trees in a multiPhylo object.

I hope this helps. - Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 2/8/2012 1:59 PM, Amy Boddy wrote:

Hello,
I was wondering if somebody could help me with a problem I am having in
APE. I want to reconstruct the ancestral states of some continuous
characters for over 600 mammalian species. I was able to reconstruct the
ancestral states with the ace function:ace_body = ace(data$LogBody, tree,
type=continuous  and then plot them on my tree:

nodelabels(round(ace_body$ace, digits=1))


However, my problem is that my data tree is very large, so plotting is a
bit messy. My tree file does not have the internal nodes labeled, and I
know that they are assigned a number to them (ex. if i have 600 species,
the nodes are numbered starting with 601, 602 etc). So my question is, what
is the best way to export this information? Is there a function to write
this tree with the node labels (i.e. the ancestral reconstructions?)?

Any input would be helpful and very much appreciated.
Thank you!
-Amy

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] differing sigma2 results

2012-02-11 Thread Liam J. Revell


Hi Pascal.

The first clue is the the length of your tree is exactly the same as the 
ratio of your two sets of rates. This suggests that OUCH is first 
rescaling the tree to unit length.  Let's see if this explains your 
results, just working with your first character:


 tree-read.nexus(testtree.nex)
 X-read.csv(spdata.csv,header=T,row.names=1)
 X-as.matrix(X[,1:4])
 fitContinuous(tree,X[,1])
Fitting  BM model:
$Trait1
$Trait1$lnl
[1] -164.5281
$Trait1$beta
[1] 0.1788076
$Trait1$aic
[1] 333.0561
$Trait1$aicc
[1] 333.208
$Trait1$k
[1] 2
 fitContinuous(rescaleTree(tree,1),X[,1])
Fitting  BM model:
$Trait1
$Trait1$lnl
[1] -164.5281
$Trait1$beta
[1] 5.885607
$Trait1$aic
[1] 333.0561
$Trait1$aicc
[1] 333.208
$Trait1$k
[1] 2

Looks like that's your explanation. Note that the first rate from 
fitContinuous on the tree in its original scale is slightly different 
from what you report: off by a factor of n/(n-1), in fact. This is 
because ic.sigma returns the unbiased REML estimates of the rate (from 
Felsenstein's contrasts); whereas the ML rates from fitContinuous and 
OUCH are biased by a factor of (n-1)/n.


Note also that the likelihoods (and thus the AIC scores) are not 
affected by rescaling the tree. This is because scaling the branches by 
k  the rate sigma^2 by 1/k yields exactly the same log-likelihood. This 
is different from the effect of rescaling the data which affects the 
likelihood, but should not alter the relative likelihoods of different 
models fit to the same rescaled data!


All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 2/11/2012 10:20 AM, Pascal Title wrote:

Hello all

I am trying to calculate sigma2 values for a clade, using 4 PCA axes as my
characters.
I first wanted to figure out if my data better fit a BM model or an OU1
model of evolution. So I compared AICc values using the fitContinuous
function in Geiger.
I found the following:

BM OU
Score1  2.6  0
Score2  9.1  0
Score3 17.2  0
Score4  9.8  0

However, using the pmc package, I ran
pmc(tree,data,modelA='BM',modelB='OU',nboot=100)
for the first PC axis and found that the BM and OU1 distributions are
mostly overlapping, which I believe tells me that you can't really
differentiate BM and OU with the data at hand (see plot of
pmchttp://dl.dropbox.com/u/34644229/Tangarinae.pdf
).
Therefore, I was planning on calculating sigma2 based on a BM model of
evolution.
But I noticed that I get hugely different results using different methods:

(here is my treehttp://dl.dropbox.com/u/34644229/testtree.nex, here is
my datahttp://dl.dropbox.com/u/34644229/spdata.csv)

With OUCH method:

require(geiger)
require(ouch)

tree-read.nexus('testtree.nex')
spdata-read.csv('spdata.csv',stringsAsFactors=FALSE)
rownames(spdata)-spdata$X
spdata$X-NULL
spdata$labels-NULL
spdata-spdata1

ot-ape2ouch(tree)
otd-as(ot,data.frame)
spdata$labels-rownames(spdata)
otd-merge(otd,spdata,by=labels,all=TRUE)
rownames(otd)-otd$nodes
ot-with(otd,ouchtree(nodes=nodes,ancestors=ancestors,times=times,labels=labels))
b1-brown(tree=ot,data=otd[colnames(spdata1)])
sigma2-coef(b1)$sigma.sq.matrix


sigma2

   [,1] [,2][,3]  [,4]
[1,]  5.885605498  1.17240962  0.7760799 -0.004528573
[2,]  1.172409617  1.52861569  0.1039507 -0.073455526
[3,]  0.776079917  0.10395074  1.0567364 -0.158017246
[4,] -0.004528573 -0.07345553 -0.1580172  1.236342488



With ic.sigma method from GEIGER:

require(geiger)

tree-read.nexus('testtree.nex')
spdata-read.csv('spdata.csv',stringsAsFactors=FALSE)
rownames(spdata)-spdata$X
spdata$X-NULL
spdata$labels-NULL
sig-ic.sigma(tree,spdata)

  Score1 Score2  Score3   Score4
Score1  0.1809881293  0.036052743  0.023865217 -0.0001392581
Score2  0.0360527432  0.047006429  0.003196587 -0.0022588293
Score3  0.0238652170  0.003196587  0.032495680 -0.0048591850
Score4 -0.0001392581 -0.002258829 -0.004859185  0.0380187415

Interestingly, the OUCH result is greater than the GEIGER result by a
factor of 32.

I also simulated a tree and data under BM and tried both methods and they
agree, so somehow this has something to do with my data.
Both methods are estimating these parameters under brownian motion, so
where is there a breakdown?

Any advice would be greatly appreciated!



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] problem calculating independent contrasts

2012-02-20 Thread Liam J. Revell

Graham is absolutely right. If you did this you would find that 
Chlorocebus pygerythrus has an underscore separating genus  specific 
epithet in your tree, but not in your data vector:


 require(geiger)
 name.check(tree,x)
$Tree.not.data
[1] Chlorocebus_pygerythrus

$Data.not.tree
[1] Chlorocebus pygerythrus

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 2/20/2012 7:30 PM, Graham Slater wrote:

Hi Tom,

have you tried running

name.check(t.100species, log_repertoire_data)

to confirm that the names perfectly match in both? If there is even a slight 
mismatch, then pic will ignore all the names in the data and assume that they 
are in the same order as the tip labels in the tree. Thus the pics returned 
would be random.

Graham

Graham Slater
Department of Ecology and Evolutionary Biology
University of California, Los Angeles
621 Charles E Young Drive South
Los Angeles
CA 90095-1606

(310) 825-4669
gsla...@ucla.edu
www.eeb.ucla.edu/gslater






On Feb 20, 2012, at 3:54 PM, Tom Schoenemann wrote:


Hello,

I keep getting the following error when trying to calculate independent 
contrasts:


pic.log_repertoire_data- pic(log_repertoire_data, t.100species)

Warning message:
In pic(log_repertoire_data, t.100species) :
  the names of argument 'x' and the tip labels of the tree did not match: the 
former were ignored in the analysis.

However, unless I'm misunderstanding what this means, then it is not correct.

My data is in:

log_repertoire_data

   Alouatta_palliata   Alouatta_seniculus  
Aotus_nigricepsAotus_trivirgatus
0.778151 0.778151 
0.778151 0.778151
Ateles_belzebuth Ateles_fusciceps 
Ateles_geoffroyi  Brachyteles_arachnoides
0.778151 0.954243 
1.322219 0.903090
  Cacajao_calvusCallicebus_moloch 
Callicebus_torquatusCallimico_goeldii
1.079181 1.041393 
0.845098 0.845098
  Callithrix_jacchus   Callithrix_penicillata   
Callithrix_pygmaea  Cebus_capucinus
0.954243 0.602060 
1.176091 1.414973
 Cebus_olivaceus Lagothrix_lagotricha   
Leontopithecus_rosaliaPithecia_pithecia
1.079181 1.146128 
1.00 1.00
Saguinus_fuscicollis   Saguinus_geoffroyi   
Saguinus_midas Saguinus_oedipus
1.113943 1.00 
0.903090 0.954243
   Saimiri_oerstedii Saimiri_sciureus
Cercocebus_torquatus_atys   Cercopithecus_ascanius
0.602060 1.301030 
1.079181 0.845098
 Cercopithecus_campbelli Cercopithecus_cephus  
Cercopithecus_mitis  Cercopithecus_neglectus
1.176091 1.204120 
0.845098 0.778151
  Cercopithecus_pogonias Chlorocebus_aethiops 
Colobus_angolensis_palliatus  Colobus_guereza
1.230449 1.322219 
0.903090 0.845098
   Colobus_polykomos   Erythrocebus_patas  
Lophocebus_albigena Macaca_arctoides
0.903090 1.079181 
1.079181 1.230449
 Macaca_fascicularis   Macaca_mulatta
Macaca_nemestrina   Macaca_radiata
1.176091 1.204120 
1.371068 1.397940
  Macaca_silenus  Macaca_sylvanus   
Mandrillus_leucophaeusMandrillus_sphinx
1.322219 1.041393 
1.041393 1.00
Miopithecus_talapoin Nasalis_larvatus 
Papio_anubis   Papio_cynocephalus
1.230449 0.698970 
1.204120 1.00
 Papio_hamadryas  Papio_papio  
Piliocolobus_badius Presbytis_comata
0.477121 1.176091 
1.079181 1.041393
Procolobus_verus   Semnopithecus_entellus

Re: [R-sig-phylo] convergence issues with hansen model

2012-03-13 Thread Liam J. Revell


Hi Rafael.

I don't want to speak on behalf of the authors who are also on this 
list, but OUwie can read in modified phylo objects created by the 
phytools functions read.simmap  make.simmap.


All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 3/13/2012 4:35 PM, Rafael Maia wrote:

Speaking of OUwie, the paper just came out today:
Jeremy M. Beaulieu, Dwueng-Chwuan Jhwueng, Carl Boettiger  Brian C.
O’Meara. MODELING STABILIZING SELECTION: EXPANDING THE
ORNSTEIN-UHLENBECK MODEL OF ADAPTIVE EVOLUTION. Evolution, Accepted
Article, DOI: 10./j.1558-5646.2012.01619.x (link:
http://onlinelibrary.wiley.com/doi/10./j.1558-5646.2012.01619.x/abstract
)

Has anyone tried it out yet? Is there a way of converting SIMMAP trees
to the format to be used with the OUwie function?

Thanks!

Abraços,
Rafael Maia
---
webpage: http://gozips.uakron.edu/~rm72
A little learning is a dangerous thing; drink deep, or taste not the
Pierian spring. (A. Pope)
Graduate Student - Integrated Bioscience
University of Akron
http://gozips.uakron.edu/~shawkey/

On Mar 12, 2012, at 2:39 PM, Liam J. Revell wrote:


You might want to try the OUWie package by Beaulieu  O'Meara
(http://cran.r-project.org/web/packages/OUwie/index.html). I have not
tried it yet, but it promises to do multi-optimum OU model fitting as
well.

All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu mailto:liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 3/12/2012 2:31 PM, Pascal Title wrote:

Hi all

I have a dataset of 5 PC axes for a phylogeny with 52 tips and I
would like
to get the variance-covariance matrix under a global OU model of
evolution. I find that OU is strongly favored over BM, based on
fitContinuous in GEIGER.
However, I am having issues with convergence when trying to fit a hansen
model to my data. What can I do to get around this problem?
Alternatively,
is there another way to get at the multivariate VCV matrix other than
with
the OUCH package?

The error is:
*unsuccessful convergence, code 1, see documentation for `optim'*
*Warning message:*
*In hansen(tree = ot, data = otd[varnames], regimes = otd[regimes], :*
* unsuccessful convergence*


I have posted a small file that contains the following R script along
with
the data herehttp://dl.dropbox.com/u/34644229/OUhansen.zip.

Any advice would be greatly appreciated! Thanks!

require(ouch)
require(ape)

tree-read.nexus('tree.nex')
data-read.csv('data.csv')
rownames(data)-data$X
data$X-NULL

tree
head(data)

colnames(data)-varnames #for hansen()

ot-ape2ouch(tree)
otd-as(ot,data.frame)
data$labels-rownames(data)
otd-merge(otd,data,by=labels,all=TRUE)
rownames(otd)-otd$nodes
ot-with(otd,ouchtree(nodes=nodes,ancestors=ancestors,times=times,labels=labels))
otd$regimes-as.factor(global)
h1-hansen(tree=ot,data=otd[varnames],regimes=otd[regimes],sqrt.alpha=c(1,0,0,0,0,1,0,0,0,1,0,0,1,0,1),sigma=c(1,0,0,0,0,1,0,0,0,1,0,0,1,0,1))





___
R-sig-phylo mailing list
R-sig-phylo@r-project.org mailto:R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo




___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] estimate ancestral states of specific internal nodes

2012-05-08 Thread Liam J. Revell


Hi Annemarie.

If you know the tip names for all the species descended from the target 
node in each tree in your sample, you could use findMRCA in the phytools 
package. findMRCA uses ape's mrca function to get the MRCA of a set of tips.


So, for instance, if your target node is the MRCA of taxa labeled t1, 
t3, and t8, with data in the vector x, you could do the following:


sp-c(t1,t3,t8)
y-ace(x,tree)
target.state-y$ace[as.character(findMRCA(tree,sp))]

Of course, looping this over a set of trees is simple, for instance, 
using sapply:


sp-c(t1,t3,t8)
target.state-sapply(trees,function(a,x,sp)
  ace(x,a)$ace[as.character(findMRCA(a,sp))],x=x,sp=sp)

Or, of course, you could just as easily use a simple for loop.

Perhaps this helps?

All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 5/8/2012 7:44 PM, Annemarie Verkerk wrote:

Dear r-sig-phylo users,

I have a question regarding the retrieval of ancestral states of
internal nodes in a sample of trees. I am doing ancestral state
estimations on a large sample of trees (using ace() in ape), but a few
groups of languages are present in all my trees. I would like to know
how to easily retrieve the ancestral state of the internal node leading
to such a group of languages, given that I know the numbers of the edges
leading to those language tips, but the number of the internal edge
leading to those languages (and thus defining the grouping / clade) will
not be the same in all of the trees.

This would be very easy to do in BayesTraits using 'addnode', but I
really want to use the 'ML' version of ace(), which is not included in
BayesTraits.

Does anyone have an idea? (Sorry if this has been asked before, I could
not find anything in the archives.)

Many thanks,
Annemarie



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] more problems with branch names

2012-05-09 Thread Liam J. Revell

Looks like this may be because your row names in your data frame have 
spaces, while your tree tip labels do not.


Try:

rownames(example)-sub( ,,rownames(example))
example-example[tree$tip.label,]

Help?

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 5/9/2012 5:13 PM, Agus Camacho wrote:

Dear all,

I had the same problem as T. Gamble and tried the advice from L. Revell
and Rich over my own tree  (attached), all of the hints produced a df
with Nas. Also, I cant see any difference between rownames on my matrix
and tree tip labels...

Could anyone give me a hint about what i am doing wrong here?

Thanks in advance,

Agus
PS: There it goes what I did:

First from Liam's hint:


example- read.csv(example.csv,header=TRUE,row.names=1)

 example

   vara varb
C. leiolepis 1a
C. nicterus  2b
C. sinebrachiatus3c
S. catimbau  4a
N. ablephara 5b
P. erythrocercus 6c
P. tetradactylus 7a
V. rubricauda8b
V. rubricaudavac 9c
M. maximiliani  10a
P. paeminosus   11b

 tree$tip.label

  [1]C.leiolepis   C.nicterusC.sinebrachiatus  S.catimbau
  [5]N.ablephara   P.erythrocercus   P.tetradactylus   V.rubricauda
  [9]V.rubricaudavac   M.maximiliani P.paeminosus

 example-example[tree$tip.label,]
 example

   vara varb
NA  NANA
NA.1NANA
NA.2NANA
NA.3NANA
NA.4NANA
NA.5NANA
NA.6NANA
NA.7NANA
NA.8NANA
NA.9NANA
NA.10   NANA


Now, Rich's hint:


match(tree$tip.label, rownames(example)) -   match

 match

  [1] NA NA NA NA NA NA NA NA NA NA NA



--

Agustín Camacho Guerrero.
Doutorando em Zoologia.
Laboratório de Herpetologia, Departamento de Zoologia, Instituto de
Biociências, USP.
Rua do Matão, trav. 14, nº 321, Cidade Universitária,
São Paulo - SP, CEP: 05508-090, Brasil.



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Non-parametric alternative to phylogenetic ANOVA?

2012-05-30 Thread Liam J. Revell


Hi Karin.

GLS with x as a factor is a generalized ANOVA which assumes [in the case 
of gls(...,correlation=corBrownian)] that the residual error in the 
ANOVA model has evolved by Brownian evolution. If you read your data 
into data frame Z with row names as species names, for instance:


Z-read.table(filename,header=T,row.names=1)
tree-read.tree(treefile)

and your column name for the factor is x  the column name for the 
continuous response variable is y, then you should just be able to do:


fit-gls(y~x,data=Z,correlation=corBrownian(1,tree))

You can then perform various posthoc analyses from the gls object that 
is produced. For instance


summary(fit)
anova(fit)
residuals(fit)

As pointed out by Alejandro, you should check for normality of the 
residuals in residuals(fit) - not the normality of y before analysis. 
summary(fit) will also give you parameter estimated (fitted means for 
each factor) and standard errors. These can be used to conduct posthoc 
comparison of means using t-tests in the standard way.


I hope this helps.

All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 5/30/2012 10:46 AM, Karin Schneeberger wrote:

Hi Alejandro

Thank you for the very quick answer. I tried PGLS before, but then was told 
that GLS is not suitable for multistate categorical variables that can not be 
ranked (otherwise I would treat them as continuous). Also, with GLS it's as far 
as I understood not possible to state statistically whether certain groups are 
greater than others. But I am new into this kind of analysis and am very happy 
for any help and explanation, as I might be totally wrong.

Cheers,
Karin




  Von: Alejandro Gonzalezalejandro.gonza...@ebd.csic.es

CC: r-sig-phylo@r-project.orgr-sig-phylo@r-project.org
Gesendet: 16:26 Mittwoch, 30.Mai 2012
Betreff: Re: [R-sig-phylo] Non-parametric alternative to phylogenetic ANOVA?


Hi Karin,

You could use a gls method and look at the distribution of your residuals. It 
is the residuals which must be normally distributed, which can be checked using 
diagnostic plots such as a histogram or qq-plot of the residuals of your model.

Cheers

Alejandro


On 30, May 2012, at 4:12 PM, Karin Schneeberger wrote:

Dear all


I'm trying to compare one trait across three (unordered categorical) groups 
including 25 species (let's say for example basal metabolic rate of aquatic, 
terrestrial and aerial mammals).

If the data would be normally distributed, I would simply use a phylogenetic ANOVA 
including a post-hoc test on means accounting for the phylogeny, as provided by the 
package phytools. However, my tip-data are far away from being normally 
distributed, and transformations only lead to unsatisfying improvements (e.g Shapiro-Wilk 
test returns a p-Value of 0.08 instead of 0.03 as before transformation). So I am not 
convinced that a phylogenetic ANOVA is the right approach for dealing with these sort of 
data.
Is there any non-parametric approach to compare groups across phylogeny that 
also returns which groups differ from each other?

Your sincerely,
Karin

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



__

Alejandro Gonzalez Voyer

Post-doc

Estación Biológica de Doñana
Consejo Superior de Investigaciones Científicas (CSIC)
Av Américo Vespucio s/n
41092 Sevilla
Spain

Tel: + 34 - 954 466700, ext 1749

E-mail: alejandro.gonza...@ebd.csic.es

Web site (Under construction):

Personal page: http://consevol.org/members/alejandro.html

Group page: http://consevol.org/index.html

For PDF copies of papers see:

http://csic.academia.edu/AlejandroGonzalezVoyer
[[alternative HTML version deleted]]




___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Question about Kcalc with 1 data points per species

2012-06-06 Thread Liam J. Revell


Hi Jonathan.

The thing to do in this case is to estimate K with within-species 
variability. This is described in Ives et al. (2007; Syst. Biol.) and 
implemented in the phytools function phylosig. This will give an 
unbiased estimate of K. (K estimated when within-species variability is 
ignored is downwardly biased.)


To do this, you need the species means for each species and a vector of 
the standard error of the means. If we cannot estimate the standard 
error of the mean for some species (because n=1), then we can just use 
the mean variance. To get the means and standard errors (assuming your 
raw data is in a vector, x, with species names), we can just do the 
following:


# get the mean by species
temp-aggregate(x,by=list(names(x)),mean)
xbar-temp[,2]; names(xbar)-temp[,1]

# get the variance by species
temp-aggregate(x,by=list(names(x)),var)
xvar-temp[,2]; names(xvar)-temp[,1]
# replace NA with mean variance
xvar[is.na(xvar)]-mean(xvar,na.rm=TRUE)

# get the N per species
n-as.vector(table(names(y)))

# compute the standard errors
se-sqrt(xvar/n)

# compute K
K-phylosig(tree,xbar,se=se)

Hopefully that works for you.

All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 6/6/2012 6:12 PM, Jonathan Benstead wrote:

Dear r-sig-phylo members - Colleagues and I came across an old post about 
calculating the Blomberg K statistic in studies where there is more than one 
data point for some species, by using a tree that includes zero-branch lengths 
for those multiple studies. This seems ideal for our data, which has several 
species with up to 14 data points for trait values. We have prepared a tree 
with the necessary zero-branch lengths and our tip labels match our row labels. 
However, when I run the code I get the following error message:

Error in Initialize.corSymm(X[[1L]], ...) :
   Initial values for corSymm must be between -1 and 1

I found two references to this problem among old posts, but nothing that helped 
solve the problem, except that it might be something to do with the tree. As 
someone else has pointed out, the problem goes away if we change the zero 
branch lengths to a positive number, but we don't want to give these branches 
an arbitrary length. Is there another way to do this?

Here's our code:

library(ape)
library(picante)
Fish = read.table(Fish.csv, header = TRUE, sep = ,)
AuthP2-as.matrix(as.matrix(Fish)[,1])
tree1 = read.nexus(consensus_constraint_polytomies.nex)
tree2 = drop.tip(tree1, Erpetoichthys_calabaricus_AY442348_)
names(AuthP2)-tree2$tip.label
Kcalc(AuthP2[tree2$tip.label], tree2)

Any help getting this analysis to work would be greatly appreciated.

Jon


Jon Benstead
Associate Professor
Aquatic Biology Program
University of Alabama, Box 870206
1124 Bevill Building, 201 7th Ave.
Tuscaloosa, Alabama 35487-0206

Lab homepage: http://bama.ua.edu/~jbenstead
Iceland blog: http://icelandstreams.blogspot.com/

Tel: 205-348-9034
Fax: 205-348-1403

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Error estimating RMA in phytools

2012-06-22 Thread Liam J. Revell


Hi Oscar.

My educated guess after some investigating is this is because the 
objects Chem2mean and Morpho2mean are data frames not matrices; and thus 
Chem2mean[,1] does not contain names. Try first doing: 
Chem2mean-as.matrix(Chem2mean), and the same with Morpho2mean, and then 
repeat the analysis.


You might also get this error (or one quite similar) if you tree has 
zero-length terminal edges or y=k*x for non-zero scalar constant k 
(i.e., r(x,y)==1 or -1).


All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 6/22/2012 12:17 PM, Oscar Valverde wrote:

Hi

I am working on the correlation among traits of 36 plants. I am using
the function phyl.RMA to estimate relationship between pairs of traits
but I keep having a error message

CNvrSRLphylo-phyl.RMA(Chem2mean[,1],Morpho2mean[,1],treeARP,method=lambda)
Error in solve.default(kronecker(R, C)) :
   system is computationally singular: reciprocal condition number = 0

I tried running the code provided by phytools for this function and
works fine. So it must be something about my tree that is not working.
Thanks in advance for the help



On Fri, Apr 20, 2012 at 12:19 PM, Jarrod Hadfieldj.hadfi...@ed.ac.uk  wrote:

Hi,

The warning:

1: glm.fit: algorithm did not converge

is from a standard glm that MCMCglmm uses to obtain semi-reasonable starting
values. For a three (I think this is correct for your data?) category
response the starting values are obtained from 2 binomial glms (the presence
of category 2 and the presence of category 3).  The fact that a simple glm
did not converge indicates a problem. You seem to suggest the warning only
appears with phylogenetic models, which seems odd. Is this correct? If so,
try the argument start=list(QUASI=FALSE) in MCMCglmm and report back.

If not, I would try and diagnose the problem using a simple binomial glm for
each of the two categories in order to pinpoint the problem. The most likely
problem is complete separation whereby some categories are not represented
in some environments. Is this possible?

With completely separated data MCMCglmm (and other software) is likely to
give spurious results unless something is done. One possibility is to fit a
more informative prior on the fixed effects which is approximately flat on
the probability scale (See the last half of section 2.6 in the CourseNotes).

Cheers,

Jarrod






Quoting Rafael Rubio de Casasr...@nescent.org  on Tue, 17 Apr 2012
17:15:17 -0400:


Dear R-Sig-Phylo users,
I am trying to use MCMCglmm to analyze the evolution of a discrete
character and the influence of the environment on it, but I keep on
having problems. All my mixed models give crazy results, with chains
that do not converge. I always get teh following error message, which  I
believe is probably at the root of the problem
Warning messages:
1: glm.fit: algorithm did not converge

However, when I fit a fixed effects model (no phylogenetic correction),
everything seems to go fine, and it is when I incorporate the phylogeny
that things fall apart. I keep on getting the warning message,
regardless of the priors (although there might be something to improve
there). I was hoping somebody could give some advice.
So here is a detailed description of what I am doing:
My data consists of a trait with three levels, a set of environmental
parameters and a phylogeny that describes the evolutionary relationships
among the different taxa.
Let the data be stored in a dataframe df, including taxa names as
taxa, the trait of interest as a factor with three states char, and
the environmental parameter a continuous variable env. The phylogeny
is an ultrametric tree phy. Below I include a simulation that creates
a similar dataset but that for some reason is not problematic. My actual
dataset is very big and somewhat messy. If someone is interested I can
send it.
The code I'm using is as follows
#Fixed effects model:
#Prior definition
k- length(levels(data$char))
I- diag( k-1 )
J- matrix( rep(1, (k-1)^2), c(k-1, k-1))
IJ- (1/3) * (diag(k-1) + J)
prior = list(R = list(V = IJ, fix = 1))
myMCMC- MCMCglmm( char ~ trait + env:trait -1,
rcov = ~us(trait):units,
data = df, family=categorical,
prior=prior )

#Model using the phylogeny as a random factor
#Prior definition . (There might be some problems here...)
Prior.phyl = list(R = list(V = IJ, fix = 1),G = list( G1 = list(V = IJ,
n = k-1 ) ) )
Ainv-inverseA(phy)$Ainv
df$species=df$taxa
myMCMC.phyl- MCMCglmm( ch ~ trait + env:trait -1,
 random=~us(trait):species,
rcov = ~us(trait):units,
ginverse = list(species=Ainv),
data = df, family=categorical,
prior=prior.phyl)

Thanks in advance,
Rafa


### Simulated data

Re: [R-sig-phylo] Phylogenetic Canonical Correlations

2012-07-13 Thread Liam J. Revell


Hi Jon.

phytools (http://cran.r-project.org/package=phytools) has a function, 
phyl.cca, for phylogenetic canonical correlations. It returns the 
coefficients  scores (in the species space - so, with phylogenetic 
correlation), correlations, and p-values.


You can also just use pic and cancor to get the same coefficients, as 
follows:


# simulate data
tree-rtree(50)
X-replicate(5,fastBM(tree))
Y-replicate(6,fastBM(tree))
# use phyl.cca
result1-phyl.cca(tree,X,Y)
# use pic  cancor
picX-apply(X,2,pic,phy=tree)
picY-apply(Y,2,pic,phy=tree)
result2-cancor(picX,picY,xcenter=F,ycenter=F)

Note that the coefficients from result1 and result2 may not be equal, 
but scale by a constant (this will not change the correlation between 
corresponding canonical variables).


# e.g.
result1$xcoef/result2$xcoef
result1$ycoef/result2$ycoef[,1:5]
# should produce matrices in which each column of each matrix
# is a scalar constant

- Liam

--
Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com


On 7/13/2012 1:32 PM, Jonathan Mitchell wrote:

Hello all,

I'm trying to perform a canonical correlations analysis that takes
phylogenetic non-independence into account properly (a la what Revall 2009
did for PCA). To this effect, I've written a script to do canonical
correlates in R based entirely on SVD instead of QR decomp. (because I
understand SVD better) that recapitulates the cancor() function (
http://nr.com/whp/notes/CanonCorrBySVD.pdf).

The way I'm currently approaching this is, I'm centering each column of X
and Y on their phylogenetic mean instead of their numeric mean, and
multiplying the centered matrices by the inverse of the vcv. Is this
appropriate/correct? Is there a better way? Simplified example code posted
below:

tree - rtree(10)
X - replicate(5, rnorm(10))
Y - replicate(6, rnorm(10))
rownames(X) - rownames(Y) - tree$tip.label

C - vcv(tree)
inC - solve(C)
Ones - rep(1, nrow(X))

PMx - t(solve(t(Ones) %*% inC %*% Ones) %*% t(Ones) %*%inC %*% X)
PMy - t(solve(t(Ones) %*% inC %*% Ones) %*% t(Ones) %*%inC %*% Y)

pcX - invC %*% (X - Ones %*% t(PMx))
pcY - invC %*% (Y - Ones %*% t(PMy))

#SVD of X, Y and Ux'Uy
Xs - svd(pcX)
Ys - svd(pcY)

uTu - t(Xs$u) %*% Ys$u
uTus - svd(uTu)

#Canonical coefficients for X
A - Xs$v %*% solve(diag(Xs$d)) %*% uTus$u

#Canonical coefficients for Y
B - Ys$v %*% solve(diag(Ys$d)) %*% uTus$v

-- Jon

_

Jonathan S. Mitchell
http://home.uchicago.edu/~mitchelljs/

PhD Student
Committee on Evolutionary Biology
The University of Chicago
1025 57th Str, Culver Hall 402
Chicago, IL 60637

Geology Department
The Field Museum of Natural History
1400 S. Lake Shore Dr.
Chicago, IL 60605

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Within-species error in PGLS (or similar)

2012-07-17 Thread Liam J. Revell


Hi Matt, David, and Jay.

Yes, the general approach described in Ives et al. (2007; Syst. Biol.) 
is implemented for different univariate tests and methods, including 
model fitting, in phytools (as well as in geiger); however the 
regression model of Ives et al. has not yet been implemented in R, so 
far as I know.


I have made an attempt to write a function for this method, pgls.Ives, 
and it is now posted to the phytools page (link to code: 
http://faculty.umb.edu/liam.revell/phytools/pgls.Ives/v0.1/pgls.Ives.R). 
This version only allows bivariate (y=b0+b1*x+e) regression and a simple 
Brownian error structure, but does incorporate sampling error in x  y 
following Ives et al. If the reported sampling error is zero, then the 
function performs standard PGLS  yields the same estimators as 
gls(...,correlation=corBrownian(...)).


Note that I just did this today and it has been subjected to very 
limited testing and it may contain bugs or errors as the math of Ives et 
al. was not completely straightforward to me in a couple of places (here 
I tried to use common sense).


 source(pgls.Ives.R)
 args(pgls.Ives)
function (tree, X, y, Vx, Vy, Cxy)

tree is a phylo object; X is a *vector* containing x, with names(X) 
containing the tree tip labels; y is a vector of y with names(y) as 
before; Vx, Vy, Cxy are vectors with the sampling variances (squared 
standard errors) or covariances for X, y, and between X and y with 
names(...) as before.


If anyone would like to compare to Ives  Garland's MATLAB code I would 
love to hear the result as I only hope I got this right.


All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com


On 7/16/2012 7:24 PM, Matt Pennell wrote:

David and Jay,

I am not sure if Liam and Graham's was designed for phylogenetic regression
per se. Tony Ives has done exactly this but his code is only in Matlab as
far as I understand.

An alternative would be to use RMA approach for this which Liam has created
(but note Simon Blomberg's warning about this in the comments section):

http://phytools.blogspot.com/2011/04/phylogenetic-rma-regression.html

Another (possibly less problematic) option is to use Joe Felsenstein's
approach which is in the current implementation of the progarm contrast in
the phylip package. This allows one to do independent contrasts on traits
with multiple measurements at the species level.

Comparative Methods with Sampling Error and Withinâ€�Species Variation:
Contrasts Revisited and Revised.
Joseph Felsenstein
The American Naturalist , Vol. 171, No. 6 (June 2008), pp. 713-725

Personally I would do this with the R package MCMCglmm as it allows for
greater flexibility in the distribution of the tip values but it is perhaps
more burdensome to run.

HADFIELD, J. D. and NAKAGAWA, S. (2010), General quantitative genetic
methods for comparative biology: phylogenies, taxonomies and multi-trait
models for continuous and categorical characters. Journal of Evolutionary
Biology, 23: 494â€“508. doi: 10./j.1420-9101.2009.01915.x

hope this helps.

cheers,
matt

On Mon, Jul 16, 2012 at 12:42 PM, Jay Fitzsimmons
jayfitzsimm...@gmail.comwrote:


Hello David,

There is at least one package in R that incorporates intraspecific
variation within a phylogenetic context.

phytools.  The method is described in:
Revell, L. J. and R. G. Reynolds. In Press. A new Bayesian method for
fitting evolutionary models to comparative data with intraspecific
variation. Evolution.

http://onlinelibrary.wiley.com/doi/10./j.1558-5646.2012.01645.x/abstract

If others know better than me they should say so because while I plan
to use phytools in the next month for the exact purpose David Winter
describes, I haven't yet done so.

Jay

PS: thank you to everyone on this list - I am teaching myself R and
have found the posts very helpful.

On Wed, Jul 11, 2012 at 11:23 PM, David Winter david.win...@gmail.com
wrote:

Hi all,

Sorry if this has been dealt with here before, I searched the list but
couldn't find a thread quite the same as this.

Are there methods in R for performing phylogenetic regression on
datasets with many measurements per tip. We have data from many
individuals that includes a measurement and several ecological
variables that might be expected to predict that measurement. We also
have a species-level phylogeny for these individuals, and we can
happily assign each data point to a tip in this phylogeny.

My first inclination was to crudely incorporate the evolutionary
history of these individuals into our models by making a big polytomy
for each species, but reading Ives et al 2007
(dx.doi.org/10.1080/10635150701313830) it seems there is a model of
phylogenetic regression that include within-tip variation? Does
anyone know if this is implemented in R?

It's possible I'm entirely on the wrong track here, in which case I'm
happy

Re: [R-sig-phylo] simulate traits evolution in correlated with body mass

2012-07-24 Thread Liam J. Revell


Hi Andrew.

For desired correlation r, size data x, and tree you could just do:

library(phytools)
y-r*x+sqrt(1-r^2)*fastBM(tree)

This should give you the correlation r on average and the y|x should be 
Brownian. (If x is Brownian, then both y  x  y|x will be too.)


- Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 7/24/2012 2:25 PM, Andrew Barr wrote:

Hello everyone,

I have a tree, with body mass data for the tip taxa.  I am interested
in simulating the evolution of continuous traits by BM that are
correlated with body mass. I want to use the body mass tip data that I
have to inform the character simulation, so that my resulting
simulated tip values are highly correlated with my actual body mass
data.

Can anyone suggest an appropriate way to simulate this?  I know I can
simulate correlated characters with geiger::sim.char(), but I don't
know how I could include my actually body mass data using this method.

Thanks very much,

W. Andrew Barr
PhD Candidate
Dept of Anthropology
University of Texas at Austin
www.ancienteco.com

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] simulate traits evolution in correlated with body mass

2012-07-24 Thread Liam J. Revell

Actually, modify my previous email. That only works for sig^2(x)=1.0. It 
should be:


library(phytools)
y-r*x+sqrt(1-r^2)*fastBM(tree,sig2=mean(pic(x,tree)^2))

- Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 7/24/2012 4:47 PM, Liam J. Revell wrote:

Hi Andrew.

For desired correlation r, size data x, and tree you could just do:

library(phytools)
y-r*x+sqrt(1-r^2)*fastBM(tree)

This should give you the correlation r on average and the y|x should be
Brownian. (If x is Brownian, then both y  x  y|x will be too.)

- Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 7/24/2012 2:25 PM, Andrew Barr wrote:

Hello everyone,

I have a tree, with body mass data for the tip taxa.  I am interested
in simulating the evolution of continuous traits by BM that are
correlated with body mass. I want to use the body mass tip data that I
have to inform the character simulation, so that my resulting
simulated tip values are highly correlated with my actual body mass
data.

Can anyone suggest an appropriate way to simulate this?  I know I can
simulate correlated characters with geiger::sim.char(), but I don't
know how I could include my actually body mass data using this method.

Thanks very much,

W. Andrew Barr
PhD Candidate
Dept of Anthropology
University of Texas at Austin
www.ancienteco.com

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] find mrca of multiple taxa.

2012-08-15 Thread Liam J. Revell

Hi Ben. 

Take a look at the function findMRCA in my package, phytools. I think this does 
what you want.

Good luck. Liam

Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

-Original Message-
From: Ben Weinstein
Sent: Wednesday, August 15, 2012 9:38 AM
To: r-sig-phylo@r-project.org
Cc: Shea Lambert; Antonin Machac
Subject: [R-sig-phylo] find mrca of multiple taxa.

Hello all,

I'm trying to replace a clade of tips with a single tip, placed at the mrca
of all taxa clade (similar Q to here
https://stat.ethz.ch/pipermail/r-sig-phylo/2012-March/001895.html). Before
i bind tree, I just need to a nice way to identify which node encompasses
all descendants. I see many packages and functions, but i can't quite get
the result i want. Forgive me if i've just missed this function somewhere,
it seems hard to believe it hasn't been written explicitly.

Clearly the package phylobase knows which node i want, when i subset by the
tips.include

library(phylobase)

data(geospiza)
nodeLabels(geospiza) - paste(N, nodeId(geospiza, internal), sep=)
geotree - extractTree(geospiza)

## subset examples
tips - c(difficilis, fortis, fuliginosa, fusca, olivacea,
pallida, parvulus, scandens)
subset(geotree, tips.include=tips)

 How can i get it to return which node it used, numbered from the
original tree?

My other attempt was to use expand.grid and mrca through an sapply
command to brute force walk through every pair of taxa, and then take
the maximum node number (since its farthest up the tree). This seems
inelegant, and i'm not convinced it works 100% of the time.

Thanks for your input.

Ben Weinstein






-- 
Ben Weinstein
Graduate Student
Ecology and Evolution
Stony Brook University

http://life.bio.sunysb.edu/~bweinste/index.html

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Plot size

2012-08-28 Thread Liam J. Revell


Hi Gwennael.

I'm not totally sure I understand the problem, but this might help.

You can use windows (in Windows) or dev.new to creating a plotting 
window with you desired dimensions. plot.phylo will then automatically 
rescale the branch lengths to fit. You can also control the font size 
from within plot.phylo.


For example:

library(ape)
dev.new(width=2,height=10)
plot(tree,cex=0.2,no.margin=T)
savePlot(test1.pdf,type=pdf)

# or, using phytools plotTree, which gets rid of some of the excess
# whitespace
library(phytools)
dev.new(width=2,height=10)
tree-compute.brlen(tree) # since your tree has no edge lengths
plotTree(tree,fsize=0.2,ftype=i,lwd=0.4,pts=F)
savePlot(test2.pdf,type=pdf)

In addition, in an interactive session the tree (but not the labels) 
will automatically resize when you manually resize the plotting window.


Is this at all helpful?

All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 8/28/2012 1:34 PM, Gwennaël Bataille wrote:

Dear R-sig-phylo users,

I would like to rescale a large tree, but I am not an expert of graphs
in R...

My labels overlap with each other. And I would like to find an option to
put more space between each branch, but could not find one.

So I tried to resize the labels (playing with cex parameter), but now
the branches are far too long in comparison with the labels... So I
would like to reduce also branch lengths :

- I can define new values for edge.length but I want my taxa to be
aligned. So, I prefer use.edge.length=F

- I also thought about playing with x.lim parameter, but my graph is
then truncated rather than resized...

Any help would be appreciated.

Thanks a lot in advance,

Gwennaël ( gwennael.batai...@uclouvain.be
mailto:gwennael.batai...@uclouvain.be )

PS: Here are the script and the data :

#Question r-sig-phylo

library(ape)

t -
(1,2,3),(18,19,20,21,22,23,24,25,26,27,28,29,30,31)),(4,5),(6,7,8),(9,10,11,12,13,14),(15,16,17),((32,33),34,((35,36,37),38,39))),40,42),41),(43,(44,45,46),47,((48,49),50))),51,52,53,54),(((55,((56,(57,58)),(59,60),61)),((62,63),64,65)),((66,67,68,69,70,71,72),73,(74,75,76,77,((78,(80,82)),(84,87)),((79,(81,83)),89)),(85,86)),88),(90,91,92))),((93,((94,95),((96,97,98,99,100,101,102,103,104,105,106),(118,(119,120),121,122,(123,124,125,126,127,128),129,(((130,131,132,133,134,135),136,138,139,((141,142),((143,144),(145,(146,147),137,140,(148,149,150,151),(152,153,154,155,156,157,158,159,160),((161,(164,165,166),((172,((173,177),(178,183)),179,180,181,182,184),174,175,176,185,186,187,188)),(167,168,169,170,171)),162,163))),((107,108),109,((110,111,112,113,114,115),(116,117),(((189,(((256,257,258,259,260,261,262),263,279),((268,269,271),270,272,(273,276),274,275,277,278)),(264,265,266,267)),(280,281,282,283,284,285),((286,287,288,289),(291,292!

),(293
,294)),290)),342,343),(344,345,346)),349),(347,348))),295,((296,301),297,298,299,300,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,(((324,325,327,328),326),329),330,331,332,333,334,335,336,337,338,339,340,341)),(350,351,352,353,354,355,356,357,358,359,360,361,362,363)),364),(365,366),(367,368,369),370)),(((191,192,193),(((199,201),200),(202,(207,208))),(203,204)),(205,206)),209),((236,237),(242,(((243,247),244),(245,(246,248)),249))),240),238,239),241),250,253),255),254),(251,252),(((194,195,196),(197,198)),(210,211),212),213,(214,(215,216))),(228,(229,230))),((217,((218,221),(219,220),222,223,224,225,226,227)),231,234),233),232),235)),190));


tree - read.tree(text=t)

plot(tree)

#Plot with reduced size of the labels

windows(width=15, height=10, pointsize=10, rescale=fixed)

par(xpd=NA, cex = 0.3) #Reduce size of the labels

plot(tree, y.lim=c(0,400))

#Play with x.lim

plot(tree, y.lim=c(0,400), x.lim=200)

#Define new edge length

tree2 - tree$edge.length - rep.int(5,544)

plot(tree2, y.lim=c(0,400), x.lim=200, use.edge.length=T)


[[alternative HTML version deleted]]



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Alter the branch lengths of a Tree

2012-08-28 Thread Liam J. Revell


Hi Fernando.

I'm not entirely sure what you're going for, but let me try to be of 
some help.


The tree is stored in memory as what is called a list - basically, a 
set of objects that can be of different types. The branch lengths of the 
tree are stored in the list element edge.length. So, for a phylo 
object with the variable name tree, you can access the branch lengths of 
the tree using tree$edge.length. (If your tree does not have branch 
lengths, this will be NULL.)


Consequently, you can assign any arbitrary values to the branch lengths 
of the tree by assigning values to tree$edge.length. For instance, to 
set all branch lengths to 1.0, you would just do:


tree$edge.length-rep(1,nrow(tree$edge))

The order of tree$edge.length corresponds to the row order of tree$edge, 
so if you want to see which branches in tree$edge.length correspond to 
which branches on the tree, you can do something like the following 
(here, I use a simple random tree for illustration):


set.seed(1)
tree-rtree(10); tree$edge.length-rep(1,nrow(tree$edge))
X-tree$edge
X[X[,2]%in%1:length(tree$tip),2]-
   tree$tip[X[X[,2]%in%1:length(tree$tip),2]]
names(tree$edge.length)-paste(X[,1],X[,2],sep=,)
plot(tree); nodelabels()
tree$edge.length

Now if, for instance, I want to make the branch length leading from node 
13 to tip taxon t6 to (for instance) 3.0, I could just do:


tree$edge.length[13,t6]-3
plot(tree); nodelabels()

Of course, if we have some simple criterion on which we want to change 
our branch lengths, we can do that to. For instance, if we wanted to set 
all branch lengths with length 1 to 1.0 and all branch lengths with 
length 1 to 2.0, we could do it as follows:


tree$edge.length-runif(n=nrow(tree$edge),min=0,max=2)
tree$edge.length
tree$edge.length[tree$edge.length1]-1
tree$edge.length[tree$edge.length1]-2
tree$edge.length
plot(tree)

I hope this is of some help.

All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 8/28/2012 4:15 PM, Fernando Sobral wrote:

Hi,

I am trying to alter the branch lengths of a phylogeny so that the smaller
branch has size igual 1 and the larger branch is the proportional sum of
the smaller branchs. For example, if the larger branch is equivalent to ten
smaller branchs, it should have size igual to 10. I already tryed use some
functions like compute.brlen(), compute.brtime() and rescaleTree(), but did
not work.

Any help would be greatly appreciated.

Thanks,

Fernando L. Sobral

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] tips to root brach length

2012-09-03 Thread Liam J. Revell

Hi. Try:

diag(vcv.phylo(tree))

nodeHeights in phytools will also give you a matrix with all the heights of 
internal and terminal nodes in the same order as tree$edge.

- Liam 

Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

-Original Message-
From: boyang zhe
Sent: Monday, September 03, 2012 5:17 AM
To: r-sig-phylo@r-project.org
Subject: [R-sig-phylo] tips to root brach length

Hi, everyone

I am new to ape and R programming. I have an unrooted tree. I use
root() function to reroot the tree by one outgroup. and then I try to
extract root to tip distance. the tree$edge.length only store the
brach length between each tip and corresponding internal node. So how
I extract the distance between root and the tips?

Thanks first!

Boyang

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] variation in rates over time

2012-09-17 Thread Liam J. Revell

Hi Jason. Matt is absolutely correct. You can do this with phytools. 
Say, for instance, you have an ultrametric phylogeny with branches in 
millions of years (tree) and data vector containing the trait values for 
species (x) and you want to test the hypothesis that the last 3.4 my has 
a different rate of evolution than the rest of the tree, you could do 
this as follows:


library(phytools) # load phytools
tree-make.era.map(tree,c(0,max(nodeHeights(tree))-3.4))
plotSimmap(tree,pts=F,lwd=3) # visualize
fit-brownie.lite(tree,x) # fit model

That's it. Good luck. Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 9/17/2012 7:46 PM, Matt Pennell wrote:

Jason,

I think the best way to do this is with the approach of O'Meara et al. 2006
Evolution Brownie.

Liam Revell has implemented this in R in his package phytools. You can
modify the steps taken in this tutorial here
http://phytools.blogspot.com/2011/07/running-brownielite-for-arbitrarily.html
perhaps
in conjunction with the function make.era.map()
http://phytools.blogspot.com/2011/10/new-function-to-plot-eras-on-tree.html
though
I admittedly have not tried this myself.

Perhaps Liam or someone else has a better explanation but I hope this is at
least somewhat helpful.

cheers,
matt

On Mon, Sep 17, 2012 at 7:33 PM, Jason S jas2...@yahoo.com wrote:




Hello,

I see that there are several interesting alternatives to test for
different rates among clades. However, I was wondering if there is a method
to test for varying rates over time. I'm aware of Pagel's delta and the EB
model, but I was thinking more in terms of testing if there is a different
rate for the entire tree after a specified point in time. For instance, if
a snail predator colonizes an island 3.4 Mya, is there evidence for an
increased rate of evolution in the prey after that point in time? Something
like two lambdas, one for before and one for after that point in time.

Thanks!

Jason
 [[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] variation in rates over time, unexpected message when using Brownie.lite

2012-09-18 Thread Liam J. Revell


Hi Agus.

It looks like in your example the Hessian matrix of partial second 
derivatives (the curvature) at the optimum is numerically singular. I'm 
not sure what this means - it could be that the surface is very flat, or 
alternatively that it is highly correlated.


Is the error fatal, or do you get a result? The Hessian matrix is only 
used to estimate standard errors of the fitted parameter values, so if 
the option to compute the variances from the Hessian could be turned off 
(not possible in the present version), presumably the analysis would run 
through.


Another option is to use brownieREML in phytools or, as Brian pointed 
out yesterday, the OUwie package. brownieREML is a lighter REML 
version of brownie.lite, which does not compute the Hessian or the 
standard errors on parameter estimates. OUwie is a separate package that 
uses phytools mapped trees to do the analyses of brownie.lite in one of 
its functions.


All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 9/18/2012 4:08 PM, Agus Camacho wrote:

Dear all, I tried to test something similar to Jason, but relating the
change in rates to the acquisition of a trait, instead to a specific date.

For that, I used a phylogeny with ten taxa, without outgroup.
I used exactly the same script gently proposed by Liam but got the
following error message:
Error in solve.default(model1$hessian) :Lapack routine dgesv: system is
exactly singular

I've been looking through the internet but was not able to identify what
causes this error. Any hint about that?

BTW, anybody knows how to directly answer a message found in the digest of
the mail list, without leaving the thread?

Below, I pasted the complete thread dealing with this topic.

Cheers,
Agus

Message: 9
Date: Tue, 18 Sep 2012 00:13:00 -0400
From: Brian O'Meara bome...@utk.edu
To: Jason S jas2...@yahoo.com
Cc: R-sig-phylo@r-project.org R-sig-phylo@r-project.org
Subject: Re: [R-sig-phylo] variation in rates over time
Message-ID:
 cakywhkq-jjroqzdvfutphi4xopxam+aaacu64nldb3inxoz...@mail.gmail.com
Content-Type: text/plain

I agree with the suggestions so far. I just wanted to point out a few more
alternatives:

You could use the geiger package to estimate the best scaling for the
tworatetree transformation to do this (should be equivalent to the earlier
solutions, though it would require running optimization).

You could also use the OUwie package with a tree that has been given simmap
mappings using phytools. The advantage of this is that you could evaluate
Brownian models but you could also look at various Ornstein-Uhlenbeck
models (though note that while there is information about different
Brownian rates before and after a time slice, info about different alphas
(strength of pull parameter) and thetas (the attractor, aka optimal
value) is rapidly (but not immediately) lost under an OU process).

For completeness, especially for citations, note that O'Meara et al. (2006)
and Thomas et al. (2006) independently arrived at essentially the same
method, so it is worth reading both papers.

Best,
Brian

___
Brian O'Meara
Assistant Professor
Dept. of Ecology  Evolutionary Biology
U. of Tennessee, Knoxville
http://www.brianomeara.info

Students wanted: Applications due Dec. 15, annually
Postdoc collaborators wanted: Check NIMBioS' website
Calendar: http://www.brianomeara.info/calendars/omeara


On Mon, Sep 17, 2012 at 9:11 PM, Jason S jas2...@yahoo.com wrote:




Thanks, guys. That's exactly what I needed.



  From: Liam J. Revell liam.rev...@umb.edu
To: Matt Pennell mwpenn...@gmail.com

-project.org
Sent: Monday, September 17, 2012 9:22 PM
Subject: Re: [R-sig-phylo] variation in rates over time

Hi Jason. Matt is absolutely correct. You can do this with phytools.
Say, for instance, you have an ultrametric phylogeny with branches in
millions of years (tree) and data vector containing the trait values for
species (x) and you want to test the hypothesis that the last 3.4 my has
a different rate of evolution than the rest of the tree, you could do
this as follows:

library(phytools) # load phytools
tree-make.era.map(tree,c(0,max(nodeHeights(tree))-3.4))
plotSimmap(tree,pts=F,lwd=3) # visualize
fit-brownie.lite(tree,x) # fit model

That's it. Good luck. Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 9/17/2012 7:46 PM, Matt Pennell wrote:



  Jason,


I think the best way to do this is with the approach of O'Meara et al.

2006

Evolution Brownie.

Liam Revell has implemented this in R in his package phytools. You can
modify the steps taken in this tutorial here




http://phytools.blogspot.com/2011/07/running-brownielite

Re: [R-sig-phylo] phy.manova from Geiger package gives an error of different variable length with variables of equal length

2012-09-25 Thread Liam J. Revell


Hi Agus.

I think the problem is that c() concatenates your data for the 
continuous dependent variables in your model into one long vector. Try 
instead:


table-as.matrix(table)
Y-cbind(table[,2],table[,3])
x-as.factor(table[,1])
phy.manova(tree,Y,x)

Good luck. - Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 9/25/2012 11:54 AM, Agus Camacho wrote:

Hope not being abusing of this list.

Possibly, I missed something. I tried both with matrix and dataframe and
reached to the same problem:


phy.manova(tree, c(table$option,table$difoption), table$morph, data.names=NULL, 
nsim=1000, test=Wilks)Warning: no tip labels, order assumed to be the same as 
in the treeError in model.frame.default(formula = as.matrix(td$data) ~ group, 
drop.unused.levels = TRUE)




phy.manova(tree, c(table[,option],table[,difoption]), table[,morph], 
data.names=NULL, nsim=1000, test=Wilks)Error in model.frame.default(formula = as.matrix(td$data) ~ group, 
drop.unused.levels = TRUE) :

   variable lengths differ (found for 'group') length(table[,morph])
  [1] 10 length(table[,option])[1] 10 length(table[,difoption])
[1] 10


theregoes the data:


   morph   option  difoption
C._leiolepis  2 2.916667 -0.167
C._nicterus   2 2.354167 -0.208
C._sinebrachiatus 2 2.464286  0.2857143
S._catimbau   2 2.411765  0.2352941
N._ablephara  2 2.795455  0.1363636
P._erythrocercus  1 1.545455  1.0909091
P._tetradactylus  1 1.74  1.080
V._rubricauda 1 1.462500  0.725
M._maximiliani1 1.269231  0.3846154
P._paeminosus 1 1.307692  0.6153846


Morph is of class factor.

For the developers, thanks for making all these analyses possible.
Cheers,
Agus



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] breaking up phylogeny figure

2012-11-14 Thread Liam J. Revell


Hi Ben.

You've probably found a way to do this already (since I see that your 
original message was sent in early October), most likely outside of R, 
but I just developed and posted an R function for this which I will add 
to the next update of phytools. I have described this function, along 
with links to the code, on my blog: 
http://blog.phytools.org/2012/11/function-to-break-up-plotted-tree-into.html


All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 10/3/2012 12:45 PM, Ben Weinstein wrote:

Hi all,

I'm wondering if there is a way to plot a phylogeny in R that is broken up
and displayed side by side to condense space. I've made a backbone in R
that i'd like to make into a figure, but it is too tall. If this isn't the
write setup, could someone suggest what program (and format for me to
write.tree) from R. I am using windows, in terms of potential third part
programs.

Thanks,

ben



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] Phylogenetic signal and PGLS

2012-11-28 Thread Liam J. Revell


Hi Antigoni.

The other responders are correct, but just to be a little more concrete:

# using ape  nlme
library(ape); library(nlme)
# assuming your data are contained in named vectors x  y
y-y[names(x)]
fit-gls(y~x,data.frame(x,y),correlation=corPagel(1,tree,fixed=FALSE),method=ML)
# assuming your data are in data frame X, with column names var1,
# var2, etc.; for example:
fit-gls(var1~var2+var3,X,correlation=corPagel(1,tree,fixed=FALSE),method=ML)

# using caper
library(caper)
# assuming your data are in named vectors x  y
y-y[names(x)]
X-data.frame(x,y,Species=names(x))
fit-pgls(y~x,comparative.data(tree,X,Species),lambda=ML)

# using phytools
library(phytools)
# assuming your data are in named vectors x  y
fit-phyl.resid(tree,x,y,method=lambda)

Although I am the author of phytools, I would say that the first two 
options should be preferred in general because they can be used with 
generic functions such as summary  residuals. (With the *possible* 
exception that phyl.resid seems to be a little bit faster on large 
trees. pgls in caper, for instance, took more than 4 min. to run a tree 
of 1000 tips on my computer.)


I hope this helps.

All the best, Liam


Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 11/28/2012 12:02 PM, Antigoni Kaliontzopoulou wrote:

Hello everyone,

I am trying to implement Liam Revell's suggestion on the evaluation of
Pagel's lambda simultaneous to fitting PGLS to minimize the effects of
wrong model selection (OLS vs. PGLS) on species data (i.e. Revell 2010,
Methods in Ecology and Evolution 1: 319-329).

Does anyone know if this kind of simultaneous optimization of lambda and
PGLS parameters is presently implemented in R? The obvious place would
be phytools, but I could not find anything similar in that package.

I would be grateful if you can provide any suggestions.

Antigoni



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

Re: [R-sig-phylo] truncated downwards plot.phylo and plot margin

2013-03-05 Thread Liam J. Revell


Hi Zech.

Regarding point (1), this is indeed a bug in plot.phylo, however there 
is a work-around. Try the following:


library(ape)
data(bird.orders)
plot(bird.orders,direction=downwards)
axis(2)
plot(bird.orders,direction=downwards,y.lim=c(-15,30))

Regarding point (2), plot.phylo(...,no.margin=TRUE) *does* plot without 
margins; however there is some space within a plotted graph between the 
(invisible) plot axes and plotted points and lines. If we want our edges 
to butt right up against the plot window, we can also accomplish this 
with the arguments y.lim and x.lim. For instance, I found via trial  
error that:


plot(bird.orders,x.lim=c(1.3,35.5),y.lim=c(1.5,22.4),no.margin=T)

got me pretty darn close.

All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 3/5/2013 2:32 AM, Not To Miss wrote:

Hi,

When plotting the phylo tree downwards, the tip labels are truncated:

library(ape)
data(bird.orders)
plot(bird.orders, direction='downwards')


I found the problem on the website:
http://www.mail-archive.com/r-sig-phylo@r-project.org/msg00298.html. It
seems still not solved? It's kind of annoying.

Another annoying thing is that, when setting the margins to zero, the
plot.phylo still plot some margins around the tree, which is totally
inconsistent to the general plotting convention. Is it possible to be
changed?

Thanks,
Zech

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] testing for correlates of rates of evolution

2013-03-11 Thread Liam J. Revell


Hi John  Matt.

What about the admittedly ad hoc approach of computing the correlation 
between the states at ancestral nodes for x  the squared contrasts for 
corresponding nodes for y? Then you can generate a null distribution for 
the test statistic (say, a Pearson or Spearman rank correlation) by 
simulation. This seems to give reasonable type I error when the null is 
correct, and when I simulate under the alternative (i.e., the rate of 
Brownian evolution along a branch depends on the state at the 
originating node) it sometimes is significant.


Here's a function that does what I've described (I think - please check 
it carefully!). It needs phytools and all dependencies.


ratebystate-function(tree,x,y,nsim=100,method=c(pearson,spearman)){
   method-method[1]
   if(!is.binary.tree(tree)) tree-multi2di(tree)
   V-phyl.vcv(cbind(x,y),vcv(tree),lambda=1)$R
   a-fastAnc(tree,x)
   b-pic(y,tree)[names(a)]^2
   r-cor(a,b,method=method)
   beta-setNames(lm(b~a)$coefficients[2],NULL)
   foo-function(tree,V){
  XY-sim.corrs(tree,V)
  a-fastAnc(tree,XY[,1])
  b-pic(XY[,2],tree)[names(a)]^2
  r-cor(a,b,method=method)
  return(r)
   }
   r.null-c(r,replicate(nsim-1,foo(tree,V)))
   P-mean(abs(r.null)=abs(r))
   return(list(beta=beta,r=r,P=P,method=method))
}

Perhaps this is a good idea. I don't know. All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 3/11/2013 4:03 PM, Matt Pennell wrote:

John,

This is a tricky question. If your independent variables were discrete, you
could use a stochastic character mapping approach to map state regimes
onto your tree and ask whether the regimes had different rates using a
model selection approach. (This could be done with the R packages phytools
or ouwie, depending on what models of trait evolution you are interested in
investigating).

However, since your independent variables are continuous, there is no
equivalent of the stochastic mapping approach to answer this question. As
far as I am aware, no model-based framework exists to address your question
(sorry that to be a downer). One could conceivably derive such a model
following Rich Fitzjohn's approach in QuaSSE (Sys Bio 2010) but instead of
the rate of speciation/extinction depending on the state of the continuous
variable, let the rate of a second variable be a function of the state of
the first. But this would certainly be a lot of effort to accomplish.

I agree with you as I do not think getting rates from standardized
independent contrasts (sensu Garland 1992) will really allow you to get at
your question.

the TL;DR version is that no such method exists (at least to my knowledge)
but this would definitely be a useful innovation.

hope this was at least somewhat helpful.

cheers,
matt




On Mon, Mar 11, 2013 at 12:50 PM, john d dobzhan...@gmail.com wrote:


Dear colleagues,

I got a philosophical/methodological/practical question.

I have a continuous dependent variable (e.g. range size) and a few
independent variables (e.g. body mass, encephalization ratio), and I
want to test how the rate of evolution of the dependent variable is
affected by the independent variables. The PCMs that I'm familiar with
cannot be used to answer  this question, because they usually try to
predict the dependent variable based on the independent variables
(e.g. PGLM) instead of looking at the rates of evolution. The whole
thing gets tricky if one decides to deal with the rates of evolution
of the indepentent variables as well (or not).

I guess one possibility would be to use standardized independent
contrasts (as in Garland 1992) for the estimation of rates. But I'm
not sure how to try to predict the *rate* of evolution of range size
from the values of the independent variables (and not their own
rates, which is what I guess I'd get if I transformed all variables
into standardized contrasts).

Any thoughts?

John

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] testing for correlates of rates of evolution

2013-03-12 Thread Liam J. Revell

I did a little further exploration of this proposed method - the 
results  discussion are here: 
http://blog.phytools.org/2013/03/investigating-whether-rate-of-one.html


Maybe this will be of some help in deciding the best approach to go 
forward with.


All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 3/11/2013 6:03 PM, Liam J. Revell wrote:

Hi John  Matt.

What about the admittedly ad hoc approach of computing the correlation
between the states at ancestral nodes for x  the squared contrasts for
corresponding nodes for y? Then you can generate a null distribution for
the test statistic (say, a Pearson or Spearman rank correlation) by
simulation. This seems to give reasonable type I error when the null is
correct, and when I simulate under the alternative (i.e., the rate of
Brownian evolution along a branch depends on the state at the
originating node) it sometimes is significant.

Here's a function that does what I've described (I think - please check
it carefully!). It needs phytools and all dependencies.

ratebystate-function(tree,x,y,nsim=100,method=c(pearson,spearman)){
method-method[1]
if(!is.binary.tree(tree)) tree-multi2di(tree)
V-phyl.vcv(cbind(x,y),vcv(tree),lambda=1)$R
a-fastAnc(tree,x)
b-pic(y,tree)[names(a)]^2
r-cor(a,b,method=method)
beta-setNames(lm(b~a)$coefficients[2],NULL)
foo-function(tree,V){
   XY-sim.corrs(tree,V)
   a-fastAnc(tree,XY[,1])
   b-pic(XY[,2],tree)[names(a)]^2
   r-cor(a,b,method=method)
   return(r)
}
r.null-c(r,replicate(nsim-1,foo(tree,V)))
P-mean(abs(r.null)=abs(r))
return(list(beta=beta,r=r,P=P,method=method))
}

Perhaps this is a good idea. I don't know. All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 3/11/2013 4:03 PM, Matt Pennell wrote:

John,

This is a tricky question. If your independent variables were
discrete, you
could use a stochastic character mapping approach to map state regimes
onto your tree and ask whether the regimes had different rates using a
model selection approach. (This could be done with the R packages
phytools
or ouwie, depending on what models of trait evolution you are
interested in
investigating).

However, since your independent variables are continuous, there is no
equivalent of the stochastic mapping approach to answer this question. As
far as I am aware, no model-based framework exists to address your
question
(sorry that to be a downer). One could conceivably derive such a model
following Rich Fitzjohn's approach in QuaSSE (Sys Bio 2010) but
instead of
the rate of speciation/extinction depending on the state of the
continuous
variable, let the rate of a second variable be a function of the state of
the first. But this would certainly be a lot of effort to accomplish.

I agree with you as I do not think getting rates from standardized
independent contrasts (sensu Garland 1992) will really allow you to
get at
your question.

the TL;DR version is that no such method exists (at least to my
knowledge)
but this would definitely be a useful innovation.

hope this was at least somewhat helpful.

cheers,
matt




On Mon, Mar 11, 2013 at 12:50 PM, john d dobzhan...@gmail.com wrote:


Dear colleagues,

I got a philosophical/methodological/practical question.

I have a continuous dependent variable (e.g. range size) and a few
independent variables (e.g. body mass, encephalization ratio), and I
want to test how the rate of evolution of the dependent variable is
affected by the independent variables. The PCMs that I'm familiar with
cannot be used to answer  this question, because they usually try to
predict the dependent variable based on the independent variables
(e.g. PGLM) instead of looking at the rates of evolution. The whole
thing gets tricky if one decides to deal with the rates of evolution
of the indepentent variables as well (or not).

I guess one possibility would be to use standardized independent
contrasts (as in Garland 1992) for the estimation of rates. But I'm
not sure how to try to predict the *rate* of evolution of range size
from the values of the independent variables (and not their own
rates, which is what I guess I'd get if I transformed all variables
into standardized contrasts).

Any thoughts?

John

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http

Re: [R-sig-phylo] testing for correlates of rates of evolution

2013-03-15 Thread Liam J. Revell


Hi Rob -

Regarding your comment: However, I think that this test is not quite 
the same as asking whether the rate of evolution of y depends on x. For 
example, it's possible you could (correctly) see a relationship with 
Liam's test even if there was no relationship between the rate of 
evolution of y and x - as long as larger ancestral x's produce higher 
variance in y, then Liam's test will reveal a relationship.


Perhaps I'm missing something, but this is exactly what we are looking 
for - that is, an effect of x on the instantaneous variance of the 
Brownian process of evolution in y (i.e., the rate of evolution in y, 
sensu O'Meara et al. 2006 and other refs).


All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 3/15/2013 7:15 PM, Rob Lanfear wrote:

Hi All,

Just a follow up on this. I was thinking about what Liam suggested, and
I think it's a different test to what I suggested, but maybe I'm wrong.

In particular, Liam's using squared contrasts in y, so that's asking
whether the absolute size of changes in y depends on  x at the ancestral
node. I might have missed something here, but that sounds very similar
in principle to Freckleton's test of whether the variance of trait
differences is unrelated to their absolute values [1], except that the
latter looks for correlations between the absolute value of differences
in x versus x at the ancestral node. It might be useful to consult that
paper [1] to get some more ideas for how to interpret those kinds of
results.

However, I think that this test is not quite the same as asking whether
the rate of evolution of y depends on x. For example, it's possible you
could (correctly) see a relationship with Liam's test even if there was
no relationship between the rate of evolution of y and x - as long as
larger ancestral x's produce higher variance in y, then Liam's test will
reveal a relationship. But the direction of the effect of x on the rate
of y could still be completely randomly assigned among datapoints (that
information disappears when squaring the y's).

Not sure if I'm just confused here. Any thoughts?

Rob






On 16 March 2013 09:30, john d dobzhan...@gmail.com
mailto:dobzhan...@gmail.com wrote:

Thank you all for your ideas. I'll probably explore further Liam's
method.

sincerely,

john

On Tue, Mar 12, 2013 at 5:33 PM, Liam J. Revell liam.rev...@umb.edu
mailto:liam.rev...@umb.edu wrote:
  I did a little further exploration of this proposed method -
the results 
  discussion are here:
 
http://blog.phytools.org/2013/03/investigating-whether-rate-of-one.html
 
  Maybe this will be of some help in deciding the best approach to
go forward
  with.
 
 
  All the best, Liam
 
  Liam J. Revell, Assistant Professor of Biology
  University of Massachusetts Boston
  web: http://faculty.umb.edu/liam.revell/
  email: liam.rev...@umb.edu mailto:liam.rev...@umb.edu
  blog: http://blog.phytools.org
 
  On 3/11/2013 6:03 PM, Liam J. Revell wrote:
 
  Hi John  Matt.
 
  What about the admittedly ad hoc approach of computing the
correlation
  between the states at ancestral nodes for x  the squared
contrasts for
  corresponding nodes for y? Then you can generate a null
distribution for
  the test statistic (say, a Pearson or Spearman rank correlation) by
  simulation. This seems to give reasonable type I error when the
null is
  correct, and when I simulate under the alternative (i.e., the
rate of
  Brownian evolution along a branch depends on the state at the
  originating node) it sometimes is significant.
 
  Here's a function that does what I've described (I think -
please check
  it carefully!). It needs phytools and all dependencies.
 
 
ratebystate-function(tree,x,y,nsim=100,method=c(pearson,spearman)){
  method-method[1]
  if(!is.binary.tree(tree)) tree-multi2di(tree)
  V-phyl.vcv(cbind(x,y),vcv(tree),lambda=1)$R
  a-fastAnc(tree,x)
  b-pic(y,tree)[names(a)]^2
  r-cor(a,b,method=method)
  beta-setNames(lm(b~a)$coefficients[2],NULL)
  foo-function(tree,V){
 XY-sim.corrs(tree,V)
 a-fastAnc(tree,XY[,1])
 b-pic(XY[,2],tree)[names(a)]^2
 r-cor(a,b,method=method)
 return(r)
  }
  r.null-c(r,replicate(nsim-1,foo(tree,V)))
  P-mean(abs(r.null)=abs(r))
  return(list(beta=beta,r=r,P=P,method=method))
  }
 
  Perhaps this is a good idea. I don't know. All the best, Liam
 
  Liam J. Revell, Assistant Professor of Biology
  University of Massachusetts Boston
  web: http://faculty.umb.edu/liam.revell/
  email: liam.rev...@umb.edu

Re: [R-sig-phylo] Concatenation of character matrices

2013-03-20 Thread Liam J. Revell


Hi Thomas.

For just two matrices (say, A  B), this is easy:

aa-sort(unique(c(rownames(A),rownames(B
bb-sort(unique(c(colnames(A),colnames(B
XX-matrix(NA,length(aa),length(bb),dimnames=list(aa,bb))
XX[rownames(A),colnames(B)]-A
XX[rownames(B),colnames(B)]-B

There may be some easier way to do this; and if your matrices were put 
in a list (say, when they were read into memory) we could easily apply 
even the method above automatically across the list automatically with 
no difficulty.


Say, for list of matrices YY:

aa-sort(unique(sapply(YY,rownames)))
bb-sort(unique(sapply(YY,colnames)))
XX-matrix(NA,length(aa),length(bb),dimnames=list(aa,bb))
for(i in 1:length(YY)) XX[rownames(YY[[i]]),colnames(YY[[i]])]-YY[[i]]

at least.

Hope this is helpful. Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 3/20/2013 3:07 PM, thom.g wrote:

Hi,

I have a certain number of character matrices that I want to merge together but 
unfortunately all the matrices don't have the same number of taxa and 
characters. However there is always more than one character/taxa in common 
between each matrix.

I have something like that:
TaxaCharacter1  Character2  Character4
Taxon1  0   0   0
Taxon2  0   0   1
Taxon5  0   1   1

Taxa Character2 Character3  Character5
Taxon3  0   1   1
Taxon4  1   0   1
Taxon5  1   1   1

And I'd like something like that:
Taxa Character1 Character2  Character3  Character4  Character5
Taxon1  0   0   ?   0   ?
Taxon2  0   0   ?   1   ?
Taxon3  ?   0   1   ?   1
Taxon4  ?   1   0   ?   1
Taxon5  0   1   1   1   1

Does someone already worked on this problem on R? The difference R base 
functions like merge() or aggregate() don't do the job unfortunately.

Thanks,

Thomas.



Thomas Guillerme
Trinity Postgraduate Research Studentship

Macroecology and Macroevolution Research Group
Zoology Department
School of Natural Sciences
Trinity College Dublin
2 College Green, Dublin 2
guill...@tcd.ie
@TGuillerme
+353877021326

http://www.tcd.ie/Zoology/research/ncooper/people.php
[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] ace in ape how to derive ancestral states (which.max)

2013-03-20 Thread Liam J. Revell


Hi Andres.

To answer your question at its face value, the line of code you want is:

anc-setNames(apply(ANC$lik.anc,1,function(x)
  names(which.max(x))),length(tree$tip.label)+1:tree$Nnode)

(The setNames ensures that your resultant vector gets the node numbers 
for names.)


This message really deserves a more lengthy response for two reasons:

1) The scaled likelihoods returned by ace(...,type=discrete) are 
actually conditional likelihoods of the subtree obtained via the pruning 
algorithm of Felsenstein. These should generally not be used for 
ancestral state estimation (we should use the marginal or joint 
reconstructions instead).


2) Extracting only the most likely state as *the* ancestral state at the 
node ignores the uncertainty of our estimate. Really we have a 
probability distribution for the states at each node.


- Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 3/20/2013 2:23 PM, Andrés Parada wrote:

Sorry if this was addressed before.

I reconstructed a binary discrete character with* ace* (coded as 0-1).
As suggested in APER 2nd edition with the* Sylvia *case I can obtain the
lik.anc values but I wanted to derive the ancestral states from this
matrix.

*First*

*ANC - ace (data$character, tree, type=d, method=ML, model=ARD)*

ANC$lik.anc look like this:

  01
   [1,] 9.999185e-01 8.147752e-05

*So I applied the same strategy as in the example mentioned above:*

*anc - apply(ANC$lik.anc, 1, which.max)*


This returned a matrix of with values of 1 or 2.

Is there a way to modify the formula and sort this so it returns only 0 or
1 depending on which state has higher probability?

Thank you very much for your help.

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] ace in ape how to derive ancestral states (which.max)

2013-03-21 Thread Liam J. Revell

One final comment on this. Since the conditional scaled likelihoods of 
the subtree and the marginal ancestral state reconstruction are 
equivalent at the root node of the tree, a computationally simple (but 
slow) method for getting the marginal ancestral state reconstructions is 
to move the root to each node of the tree and recompute the conditional 
likelihoods using ace. (Described in Felsenstein 2004, p. 259.) This 
works only for a reversible model of evolution. I posted a function for 
this on my blog (although it is straightforward), here: 
http://blog.phytools.org/2013/03/a-little-more-on-ancestral-state.html.


There are much more efficient algorithms for this, and I believe that 
this is what is implemented in diversitree by Rich Fitzjohn: 
http://www.zoology.ubc.ca/prog/diversitree/.


And, to reiterate, you should not be using the conditional scaled 
likelihoods from ace as ancestral state estimates (except at the root).


All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 3/20/2013 6:02 PM, Liam J. Revell wrote:

Just to follow up on ace(...,type=discrete), which I believe is not
doing what many people think it is doing:
http://blog.phytools.org/2013/03/conditional-scaled-likelihoods-in-ace.html.


Please don't take this as a criticism of ace - but I believe that the
difference between conditional, marginal, and joint reconstructions is
not appreciated by many phylogeny method users.

One option to get what I believe will asymptotically converge to the
joint reconstructions would be to use the posterior frequencies from
stochastic mapping under the model. This can be done using make.simmap
as described in my blog post - but please make sure that you have the
latest phytools release (phytools=0.2-27) if you want to do this, as
there was a bug in earlier versions of make.simmap.

Finally, something that I neglected to note in my post is that I believe
that Rich Fitzjohn's package diversitree can be used to obtain the joint
reconstructions under likelihood.

All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 3/20/2013 4:50 PM, Liam J. Revell wrote:

Hi Andres.

To answer your question at its face value, the line of code you want is:

anc-setNames(apply(ANC$lik.anc,1,function(x)
   names(which.max(x))),length(tree$tip.label)+1:tree$Nnode)

(The setNames ensures that your resultant vector gets the node numbers
for names.)

This message really deserves a more lengthy response for two reasons:

1) The scaled likelihoods returned by ace(...,type=discrete) are
actually conditional likelihoods of the subtree obtained via the pruning
algorithm of Felsenstein. These should generally not be used for
ancestral state estimation (we should use the marginal or joint
reconstructions instead).

2) Extracting only the most likely state as *the* ancestral state at the
node ignores the uncertainty of our estimate. Really we have a
probability distribution for the states at each node.

- Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 3/20/2013 2:23 PM, Andrés Parada wrote:

Sorry if this was addressed before.

I reconstructed a binary discrete character with* ace* (coded as 0-1).
As suggested in APER 2nd edition with the* Sylvia *case I can obtain the
lik.anc values but I wanted to derive the ancestral states from this
matrix.

*First*

*ANC - ace (data$character, tree, type=d, method=ML, model=ARD)*

ANC$lik.anc look like this:

  01
   [1,] 9.999185e-01 8.147752e-05

*So I applied the same strategy as in the example mentioned above:*

*anc - apply(ANC$lik.anc, 1, which.max)*


This returned a matrix of with values of 1 or 2.

Is there a way to modify the formula and sort this so it returns only
0 or
1 depending on which state has higher probability?

Thank you very much for your help.

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org

Re: [R-sig-phylo] Information on {ape] function nodelabels

2013-03-25 Thread Liam J. Revell


Hi Nicholas.

The nodes in a phylo object are numbered 1:tree$Nnode+N, where N is 
the number of tips in the tree  tree$Nnode is the number of nodes. That 
means if your tree has 10 tips and 9 internal nodes, the node numbers 
will be 11, 12, 13, ..., 19.


To use nodelabels(...,pie=XX) you can supply a matrix in which the rows 
sum to one (as in the transpose of your example matrix). If the node 
argument is left empty, then it is assumed that the rows are in node 
order (i.e., 11, 12, ..., 19). This is the order returned by (for 
example) ace. If you have a matrix in which the nodes are in a different 
order, you can use nodelabels(...,node=YY,pie=XX) in which YY supplies 
the node numbers (using the scheme above) for the rows of XX.


Note that as per recent discussion on this email list, 
ace(...,type=discrete) should not be used for ancestral state 
reconstruction except for the root node of the tree. I describe this in 
some posts to my blog 
(http://blog.phytools.org/2013/03/conditional-scaled-likelihoods-in-ace.html 
and 
http://blog.phytools.org/2013/03/a-little-more-on-ancestral-state.html). 
The scaled likelihoods at internal nodes are conditional likelihoods for 
the subtree, not marginal ancestral state reconstructions. I believe 
that the help page for ace will be clarified in future versions of ape.


Good luck. Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 3/25/2013 12:09 PM, Nicholas Crouch wrote:

Hi,

Hoe does the function nodelabels match data to the phylogeny? Ancestral
reconstruction produces a vector with a sample output:


st



   [,1] [,2][,3]   [,4][,5]
[1,] 0.0001381085 0.007642601 0.01638118 0.001591162 0.001268793
[2,] 0.9998618915 0.992357399 0.98361882 0.998408838 0.998731207


The numbers across the top do no correlate with the node numbers on
the phylogeny: compare



nodelabels(pie=t(st))



with



nodelabels()



Would it be possible to create a list vector of the nodes in a
phylogeny with the corresponding values from an ancestral
reconstruction?


An example could be something like:



vector


 Node 1

[1] 0.05

[2] 0.95

 Node 2

[1] 0.3

[2] 0.7



Thanks

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] combining p-values in PCMs

2013-04-02 Thread Liam J. Revell


Hi Vladimir.

I'm not clear on the source of your misunderstanding, so let me do my best.

1) I think that's right is casual shorthand in English. In this case I 
just mean that I think that this method will give us the correct 
variance on beta (but I'm not sure - hence 'I think'). It certainly 
seems better than computing the variance on the estimator from the 
single best tree or consensus tree, which ignores phylogenetic 
uncertainty; or from the variance in beta across trees, which ignores 
sampling error in the estimation of beta on any tree. It seems unlikely 
that this would dramatically underestimate the variance in beta.


2) The null hypothesis here is the slope of our fitted model (beta) is 
zero, but I don't think we need to focus specifically on null hypothesis 
testing. What we're really interested in is getting a true (or at 
least better) estimate of the uncertainty in the model coefficient that 
takes into account the different sources of error in our data  tree. I 
just suggested a t-test because people are often interested in getting a 
p-value from their fitted model and the p-value from a single consensus 
or best tree will surely be too small.


I hope that seems reasonable to you.

All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 4/2/2013 7:27 PM, Vladimir Minin wrote:

Hi Liam,

When you say I think that's right, I am not sure I understand what you
mean. In particular, what is your null hypothesis stated mathematically
and what frequentist guarantees do you have for the p-value you are
computing? Typically, we desire the uniform distribution of p-values
under the null. This would be easy to check by simulation, but this
simulation would require to generate both sequence and non-sequence
trait data, in other words we would need to decide what exactly the null
hypothesis is here.

I am suspicious of your approach, because, in general, Neyman-Pearson
hypothesis testing framework has hard time accommodating uncertainty in
nuisance parameters (phylogenies in this case).

Unfortunately, I don't know if a rigorous answer to the original
question posed by John exists, so I cannot suggest a practical
alternative other than building a large Bayesian model that encompasses
uncertainty about all its parameters. In that case, by the way, testing
hypotheses still remains infeasible, because Bayesian framework lacks
tools for testing model fit in the absence of an alternative (posterior
predictive model checks are exceptions, but their
interpretation/calibration is questionable anyway), meaning that
Bayesians can only compare models, but cannot examine the fit of one
(null) model in isolation.

-Vladimir

Vladimir Minin

Assistant Professor
Department of Statistics
University of Washington, Seattle
Padelford Hall C-315, Box 354322
Seattle, WA 98195-4322
telephone: 206-543-4302
web: www.stat.washington.edu/vminin http://www.stat.washington.edu/vminin


On Apr 1, 2013, at 9:04 AM, Liam J. Revell wrote:


Hi John  list.

Can I add to this by suggesting (in case it is not clear from Joe's
comment) that we should probably combine the variance among estimates
obtained from different trees in the posterior sample with the
sampling variance of any single estimate?

One fairly sensible way to do this is to compute the variance due to
phylogenetic uncertainty as the variance among estimates obtained from
the trees of the posterior sample; and then to compute the sampling
variance as the mean variance of the estimator from each tree; and
then add the two variances them. The standard error of our estimate
(computed as the mean across trees) is the square-root of this
variance. To conduct a hypothesis test on the regression coefficient,
then, you would compute the mean across trees and then the ratio of
the parameter and its standard error should have a t-distribution with
n-2 degrees of freedom for n taxa (not contrasts).

To do this from a practical perspective from trees in 'multiPhylo'
object (here trees) and data in x  y for a simple bivariate
regression, we do the following:

# first define the following custom function
ff-function(tree,x,y){
pic.x-pic(x,tree)
pic.y-pic(y,tree)
fit-lm(pic.y~pic.x-1)
setNames(c(coef(fit),vcov(fit)),c(beta,var(beta)))
}
# now apply to all trees in your sample
BB-t(sapply(trees,ff,x,y))
# total variance in beta estimated by
varBeta-var(BB[,beta])+mean(BB[,var(beta)])
t.beta-mean(BB[,beta])/sqrt(varBeta)
P.beta-2*pt(abs(t.beta),df=length(trees[[1]]$tip)-2,lower.tail=FALSE)

I think that's right.

All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu mailto:liam.rev...@umb.edu
blog: http://blog.phytools.org

On 4/1/2013 11:10 AM, Joe Felsenstein wrote:


John D asked:


Given that things have been quiet

Re: [R-sig-phylo] pruning a tree

2013-04-05 Thread Liam J. Revell


Hi Elaine.

A simpler way to do method 1 that doesn't require na.omit or match is:

# if there are species in the tree that are missing from the data:
ss-species.to.keep[,1] # species in the data
tree-birdtree
tree.pruned-drop.tip(tree,setdiff(tree$tip.label,ss))

I would guess that method 2 isn't working because your species names are 
not row names in your data matrix. To have that, you could do:


birddata-read.csv(H:/birddata_family_20130405.csv,header=T,row.names=1)

All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 4/5/2013 8:54 AM, Elaine Kuo wrote:

Hello



This Elaine.



I tried to prune a phylogeny tree using two methods based on the code
attached.



Method 1 returned the tip.label sharing between birddata and birdtree.

Method 2 returned nothing.



Please kindly indicate why Method 2 failed to prune the tree.

Also, please kindly explain the command “birdtree$tip.label[-na.omit”



Thank you again.



Code

Method 1

Library(ape)

birddata - read.csv(H:/ birddata_family_20130405.csv, header = TRUE)

birdtree - read.nexus(H:/ birddata_family_20130405.nexus)

setwd(H:)



## prune the phylogeny to include only species which are in the data set

species.to.keep-read.csv(H:/ birddata_family_20130405.csv, header = TRUE)

birdtree.pruned-drop.tip(birdtree,birdtree$tip.label[-na.omit(match(species.to.keep[,1],birdtree$tip.label))])





Method 2

Library(ape)

birddata - read.csv(H:/ birddata_family_20130405.csv, header = TRUE)

birdtree - read.nexus(H:/ birddata_family_20130405.nexus)

setwd(H:)



## prune the phylogeny to include only species which are in the data set

temp - name.check(birdtree, birddata)

birdtree.pruned - drop.tip(birdtree, tip=temp$Tree.not.data)

[[alternative HTML version deleted]]



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Exporting Ancestor Reconstruction from Mesquite to R for ouch

2013-04-05 Thread Liam J. Revell


Hi Martin.

Without more information or your input file, it's hard for me to 
evaluate why read.simmap is hanging. Did you run:


tree-read.simmap(filename,format=nexus,version=1.5)

If you have your discrete character data in R, you can also use the 
phytools function make.simmap to generate stochastic maps. Make sure 
that you are using a recent version of phytools (=0.2-26; =0.2-33 for 
uncertain/unknown tip states; =0.2-37 for non-symmetric transition 
matrices) as earlier versions are buggy.


My advice would be to use make.simmap to generate stochastic maps (or 
read them from file), and then use OUwie (in the package OUwie) to fit a 
multiple optima OU model to each mapped tree.


For discrete character x and continuous trait y (each vectors in the 
same order with names(x)=names(y)=species names); and a phylogeny, tree, do:


library(phytools)
library(OUwie)

mtrees-make.simmap(tree,x,nsim=100)
XX-data.frame(names(x),x,y)
result-lapply(mtrees,OUwie,data=XX,model=OUM,simmap.tree=TRUE)

This just gives you a giant list of the results returned by OUwie for 
each tree. It would probably be a good idea to organize your results of 
interest more sensibly than this so that they can be easily aggregated 
across mappings.


Good luck. All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 4/5/2013 6:13 PM, Martin Turcotte wrote:

Dear forum,
I am trying to conduct an ouch analysis and for it I must determine the 
ancestral selective regime at each node in my tree. To do so I need to conduct 
ancestral reconstruction on discrete unordered data. To my knowledge no package 
can do this in R (I hope I am wrong) and thus I used Mesquite's function to 
'Trace Character History'. The problem is exporting this information to R.
In Mesquite the only export option I see is  'Export Ancestral State 
Trace' but the only option is SIMMAP 1.5 format. I tried opening this file 
using  phytools'read.simmap  but it just hangs. I can open the file with 
FigTree and I can see the Nodel labels but when I export the tree as a nexus 
file it does not save the model labels.
Any advice would be appreciated.

thanks,
Mart

Martin Turcotte, Ph. D.
Dept. of Biology
University of Toronto at Mississauga
3359 Mississauga Road North, Mississauga,
Ontario, Canada, L5L 1C6
http://individual.utoronto.ca/martinturcotte



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] statistically test whether two characters evolve dependently

2013-04-08 Thread Liam J. Revell


Hi Peter.

I'm not sure if this is what you mean, but Felsenstein (2005, 2012) has 
a method for estimating the correlation between discrete characters or a 
discrete and a continuous character in which the discrete trait is 
modeled using the threshold model from quantitative genetics. There is a 
stand-alone program for this called Threshml 
(http://evolution.gs.washington.edu/phylip/download/threshml/). I have a 
Bayesian MCMC version in the phytools package called threshBayes 
(http://blog.phytools.org/search?q=threshBayes). So far, threshBayes 
only allows binary discrete characters  two traits; Threshml can 
analyze more than two traits - but I believe is still limited to binary 
discrete characters. Please let me know if it would be helpful to your 
research to extend threshBayes to allow for multistate (ordered, 
naturally) discrete characters.


All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 4/7/2013 6:35 PM, New Bio wrote:

Dear all,

Just started studying phylogeny. I know BayesTraits program can calculate
the likelihood of the observed data given the model of two charaters evolve
dependently or independently. I am wondering how to do it with R packages
or functions for discrete and continuous characters?

Best,
Peter

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] statistically test whether two characters evolve dependently

2013-04-08 Thread Liam J. Revell

Peter - I realized that this is not very clear. Threshml can analyze 
multiple continuous  discrete characters, but the discrete characters 
must be binary. threshBayes has the same limitation, but is also limited 
to only two traits (binary-binary, binary-continuous, or 
continuous-continuous). - Liam


Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 4/8/2013 8:27 PM, Liam J. Revell wrote:

Hi Peter.

I'm not sure if this is what you mean, but Felsenstein (2005, 2012) has
a method for estimating the correlation between discrete characters or a
discrete and a continuous character in which the discrete trait is
modeled using the threshold model from quantitative genetics. There is a
stand-alone program for this called Threshml
(http://evolution.gs.washington.edu/phylip/download/threshml/). I have a
Bayesian MCMC version in the phytools package called threshBayes
(http://blog.phytools.org/search?q=threshBayes). So far, threshBayes
only allows binary discrete characters  two traits; Threshml can
analyze more than two traits - but I believe is still limited to binary
discrete characters. Please let me know if it would be helpful to your
research to extend threshBayes to allow for multistate (ordered,
naturally) discrete characters.

All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 4/7/2013 6:35 PM, New Bio wrote:

Dear all,

Just started studying phylogeny. I know BayesTraits program can calculate
the likelihood of the observed data given the model of two charaters
evolve
dependently or independently. I am wondering how to do it with R packages
or functions for discrete and continuous characters?

Best,
Peter

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] statistically test whether two characters evolve dependently

2013-04-09 Thread Liam J. Revell


Thanks for clarifying this, Joe.

My function for ancestral character estimation under the threshold model 
(ancThresh) allows multistate data - but the order of the states along 
the liability axis needs to be specified a priori by the user.** The 
relative positions of the thresholds are sampled (along with the 
liabilities of tips  nodes) from their joint posterior probability 
distribution using MCMC. This method is still undergoing peer review, 
but it seems to work fairly well in simulations - so this also might be 
a viable approach for estimating the correlation between multistate 
threshold characters.


(**In my manuscript about ancThresh I suggest that an information 
criterion like DIC could be used to identify the correct order - but a 
single true order must exist.)


As for unordered multistate characters, I'm not sure how the threshold 
model could be used. Several people have made suggestions about this to 
me but I'm not sure I understand them.


All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 4/9/2013 7:22 PM, Joe Felsenstein wrote:


New Bio wrote:


Thanks, Liam. good to know. I think extending the tools to analyze traits
of ordered/unordered multistates can be very useful. There are many
interesting traits such as oxygent requirement (anaerobic, facultative,
aerobic) of microbes, which is ordered multistates, and habitats (water,
air, soil), which is unordered.


For approaches like the threshold model, it is not simple to see how to
handle multistate characters. Should we assume one scale? If so, how
far from the 0/1 threshold should we place the 1/2 threshold? That
becomes an additional parameter to be estimated.

Or should we have an additional axis, so that 0/1 is on the  x  axis,
while  [01]/2 is on another axis?  And then what do we do about
state 3 if it also exists? That way lies madness ... or perhaps a
good Ph.D. thesis topic.

(This is in some way related to Transformation Series Analysis,
which was an issue with parsimony methods. One imagined one's
states arranged in a character transformation series which could
even be a tree, the Character State Tree. Then one wanted to
infer the phylogeny and at the same time also the CST, using
parsimony as the criterion.  In a sense what I am raising is the
likelihood and modeling equivalent problem.)

Joe

Joe Felsenstein j...@gs.washington.edu mailto:j...@gs.washington.edu
  Department of Genome Sciences and Department of Biology,
  University of Washington, Box 355065, Seattle, WA 98195-5065 USA





___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] pruning tree of 10000 species

2013-04-10 Thread Liam J. Revell


Hi Elaine.

A tree of 10,000 tips written to file should not take up 3.9 GB. Even a 
1000 Newick trees of this size should not take up this much space. Are 
you sure this is the size of your file?


All the best, Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 4/10/2013 7:04 PM, Elaine Kuo wrote:

Hello

This is Elaine.

I downloaded a bid phylogeny tree of 1 species.
The file of the whole tree (1 species) is about 3.9 GB.
I want like to extract to 3000 species of them for my research.

Here comes two questions:
1. Please kindly advise if R is able to input a phylogeny tree of 1
species.
If possible, please kindly recommend the package.

2. Is there anything should be noted to calculate a data of 3.9 GB, such as
 the requirement of RAM memory.


Thank you.
Elaine

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] pruning tree of 10000 species

2013-04-12 Thread Liam J. Revell


Hi Gavin et al.

read.tree can accept a character string as input, so in theory you 
should be able to skip writing to file  then reading in - which could 
be quite slow for very large trees.


temptree - scan(MY_FILEPATH, what = , sep = \n, quiet = TRUE, 
skip = 0, comment.char = #, nlines=1)


tree-read.tree(text=temptree)

- Liam

Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 4/12/2013 11:38 AM, Gavin Thomas wrote:

Hi Elaine

The online tool is limited to 2500 species because we have found the server 
essentially grinds to a halt if we go beyond that.

Jan is correct about the original data set. Each file contains 1000 trees. I 
would not advise trying to read these into R in one go. It is possible to read 
one or trees from the file into R and this will be much more manageable. It's a 
little convoluted:

# Scan the first line from the tree file. This will be a single newick string. 
If you want to use the second tree then change to skip=1 (this tells R how many 
lines in the inout file to skip. If you want to randomly select a tree then use 
skip=sample(1:1000, 1)
temptree - scan(MY_FILEPATH, what = , sep = \n, quiet = TRUE, skip = 0, 
comment.char = #, nlines=1)

# Now write out the text string to file. You now have a text file with a single 
tree
cat(temptree, file=tree1.tre)

# Read that tree back in as a phylo object
tree - read.tree(tree1.tre)

# So now you have your complete tree in R and you want to subset it to your 
3000 tips. This is straightforward using droptip. You just need to read your 
list of 3000 species into R and use that in place of species_subset

tree_small - drop.tip(tree, tip=setdiff(tree$tip.label, species_subset)

You could loop across a whole set of trees to get a set of 1000 pruned trees 
(the file will still be big but would contain trees of just the taxa of 
interest):


for (i in 1:1000) {
temptree - scan(MY_FILEPATH, what = , sep = \n, quiet = TRUE, skip = i-1, 
comment.char = #, nlines=1)
cat(temptree, file=tree1.tre)
tree - read.tree(tree1.tre)
tree_small - drop.tip(tree, tip=setdiff(tree$tip.label, species_subset)
write.tree(tree_small, file=MyNewTrees.tre, append=TRUE)
}


None of this is elegant - any better suggestions would be very welcome!

I hope that helps.

Best
Gavin

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

1 2 3 4 5 >

1 - 100 of 408 matches

Mail list logo