Re: [R-sig-phylo] pic() vs gls()

2011-07-14 Thread ppiras
Hi all,

Citing
http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction:
Using Felsenstein's (1985) phylogenetic independent
contrasts (pic); This is also a Brownian-motion based
estimator, but it only takes descendants of each node
into account in reconstructing the state at that node.
More basal nodes are ignored.

I  THINK that, on the opposite, more basal nodes are
NOT ignored in gls and for this reason results can
differ slightly
I'm wrong?



Hi Nick.

For your first (simple) problem, I believe you want to
do:

  read.csv(file=Mason_data.csv,row.names=1)-smdata

Regarding the more complicated issue, the problem of
non-independence in
linear regression comes in the residual error of the
model.  Thus, you
should fit a correlation structure for the error in Y
given X (not for X
 Y separately as you have done with PICs here).  This
indeed can be
done using gls() - so, for instance:

pglsModel1a-gls(Pause_Rate~Comp.1,
correlation=corMartins(1,
spbm),data=smdata)

fits a linear model in which the residual error of
Pause_Rate given
Comp.1 is distributed according to the correlation
structure
corMartins() with alpha estimated using REML.

The way to reproduce this result using contrasts is
the following:

  alpha-attr(pglsModel1a$apVar,Pars)[corStruct]
# get fitted alpha
  smdata-as.matrix(smdata) # important
  pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=alpha))
  pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=alpha))
  summary(lm(pr_contrast~pc1_contrast-1))

Which if you compare to:

  summary(pglsModel1a)

you should find yields the same results and P-value.
(Note that
smdata-as.matrix(smdata) seems to be important here
as if smdata is a
data frame, then smdata[,1] does not inherit the
rownames of smdata and
pic() will return an incorrect set of contrasts.)

I hope this helps.  No doubt other subscribers to the
list may have
something to add.

Best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 7/13/2011 11:07 PM, Nicholas Mason wrote:
 Dear R-sig-phylo users,
 I have a question regarding comparative analyses of
 contrasts done with
 the functions fitContinuous() and pic() compared to
 using PGLS (using
 the gls() function).

  From my understanding the first method involving
 pic() below fits alpha
 (estimated using fitContinuous()) to each character
 independently then
 performs a regression on the resulting contrasts.

 The second (PGLS) method involving gls() fits both
 the regression and
 contrasts simultaneously and reports a single alpha
 value for the
 relationship between two traits.

 These two methods appear very similar, yet I get
 slightly different
 results between the two functions, particularly with
 respect to
 p-values. Using fitContinuous(), the results from
 the data set attached
 are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p
 =0.0519.
 Furthermore, if I switch the dependent and
 independent variables (V1~V2
 vs V2~V1), I get the same answer with pic(), but
 gls() differs: r2 =
 -0.069 p =0.02 (see pglsModel1a vs pglsModel1b in
 the attachment).

 Could anyone explain why these functions vary (in my
 mind they were
 doing essentially the same thing) and if there are
 situations where one
 should be favored over another? Also, why does PGLS
 vary if you switch
 the dependent/independent variables in the linear
 model?

 Thanks in advance for any advice or comments
 offered!!

 Cheers,
 Nick Mason

 Below I have the code I have been using (also
 attached as Mason.R):

 require(ape)
 require(nlme)
 require(geiger)
 read.csv(file=Mason_data.csv)-smdata
 rownames(smdata)-smdata[,1]
 smdata[,1]-NULL#ASIDE: If anyone could tell me how
 to get around these
 two lines of code, that would also be awesome.
 Header=T doesn't seem to
 work for me.
 read.tree(file=Mason_tree.tre)-spbm
 name.check(smdata,spbm)

 fitContinuous(spbm,smdata,model=OU)-ou2

 pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha))
 pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha))

 summary(lm(pr_contrast~pc1_contrast-1))

 pglsModel1a-gls(Pause_Rate~Comp.1,
 correlation=corMartins(1,
 spbm),data=smdata)
 summary(pglsModel1a)

 pglsModel1b-gls(Comp.1~Pause_Rate,
 correlation=corMartins(1,
 spbm),data=smdata)
 summary(pglsModel1b)








 -
 Nicholas Albert Mason
 MS Candidate, Burns Lab
 Department of Biology - EB Program
 San Diego State University
 5500 Campanile Drive
 San Diego, CA 92182-4614
 845-240-0649 (cell)





 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org

Re: [R-sig-phylo] pic() vs gls()

2011-07-14 Thread Joe Felsenstein


Paolo Piras wrote --


Citing
http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction:
Using Felsenstein's (1985) phylogenetic independent
contrasts (pic); This is also a Brownian-motion based
estimator, but it only takes descendants of each node
into account in reconstructing the state at that node.
More basal nodes are ignored.

I  THINK that, on the opposite, more basal nodes are
NOT ignored in gls and for this reason results can
differ slightly
I'm wrong?


The contrast algorithm if continued to the root,
makes the correct ancestral reconstruction for the root.
You are correct that values for higher nodes in the
tree are not the correct ML reconstruction for those
nodes.  If the tree is rerooted at any interior node
and the algorithm used for that, then that node's
reconstruction will be correct.  There are ways of
re-using information so that the total effort of doing this
for all interior nodes will be no worse than about twice
that of a single pass through the tree.

However people may prefer to use PGLS, which if
properly done should give the proper estimates for
all nodes.  There is some discussion of this in
Rohlf's 2001 paper in Evolution.

Joe

Joe Felsenstein, j...@gs.washington.edu
 Dept. of Genome Sciences, Univ. of Washington
 Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] pic() vs gls()

2011-07-14 Thread Theodore Garland Jr
The independent contrasts algebra for what Joe is talking about can be found in 
these papers:

Garland, T., Jr., P. E. Midford, and A. R. Ives. 1999. An introduction to 
phylogenetically based statistical methods, with a new method for confidence 
intervals on ancestral values. American Zoologist 39:374-388.

Garland, T., Jr., and A. R. Ives. 2000. Using the past to predict the present: 
Confidence intervals for regression equations in phylogenetic comparative 
methods. American Naturalist 155:346-364.

Cheers,
Ted

Theodore Garland, Jr.
Professor
Department of Biology
University of California, Riverside
Riverside, CA 92521
Office Phone:  (951) 827-3524
Facsimile:  (951) 827-4286 = Dept. office (not confidential)
Email:  tgarl...@ucr.edu
http://www.biology.ucr.edu/people/faculty/Garland.html

Experimental Evolution: Concepts, Methods, and Applications of Selection 
Experiments
Edited by Theodore Garland, Jr. and Michael R. Rose
http://www.ucpress.edu/book.php?isbn=9780520261808
(PDFs of chapters are available from me or from the individual authors)


From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on 
behalf of Joe Felsenstein [j...@gs.washington.edu]
Sent: Thursday, July 14, 2011 6:35 AM
To: ppi...@uniroma3.it
Cc: r-sig-phylo
Subject: Re: [R-sig-phylo] pic() vs gls()

Paolo Piras wrote --

 Citing
 http://www.r-phylo.org/wiki/HowTo/Ancestral_State_Reconstruction:
 Using Felsenstein's (1985) phylogenetic independent
 contrasts (pic); This is also a Brownian-motion based
 estimator, but it only takes descendants of each node
 into account in reconstructing the state at that node.
 More basal nodes are ignored.

 I  THINK that, on the opposite, more basal nodes are
 NOT ignored in gls and for this reason results can
 differ slightly
 I'm wrong?

The contrast algorithm if continued to the root,
makes the correct ancestral reconstruction for the root.
You are correct that values for higher nodes in the
tree are not the correct ML reconstruction for those
nodes.  If the tree is rerooted at any interior node
and the algorithm used for that, then that node's
reconstruction will be correct.  There are ways of
re-using information so that the total effort of doing this
for all interior nodes will be no worse than about twice
that of a single pass through the tree.

However people may prefer to use PGLS, which if
properly done should give the proper estimates for
all nodes.  There is some discussion of this in
Rohlf's 2001 paper in Evolution.

Joe

Joe Felsenstein, j...@gs.washington.edu
  Dept. of Genome Sciences, Univ. of Washington
  Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] pic() vs gls()

2011-07-13 Thread Liam J. Revell

Hi Nick.

For your first (simple) problem, I believe you want to do:

 read.csv(file=Mason_data.csv,row.names=1)-smdata

Regarding the more complicated issue, the problem of non-independence in 
linear regression comes in the residual error of the model.  Thus, you 
should fit a correlation structure for the error in Y given X (not for X 
 Y separately as you have done with PICs here).  This indeed can be 
done using gls() - so, for instance:


pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, 
spbm),data=smdata)


fits a linear model in which the residual error of Pause_Rate given 
Comp.1 is distributed according to the correlation structure 
corMartins() with alpha estimated using REML.


The way to reproduce this result using contrasts is the following:

 alpha-attr(pglsModel1a$apVar,Pars)[corStruct] # get fitted alpha
 smdata-as.matrix(smdata) # important
 pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=alpha))
 pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=alpha))
 summary(lm(pr_contrast~pc1_contrast-1))

Which if you compare to:

 summary(pglsModel1a)

you should find yields the same results and P-value.  (Note that 
smdata-as.matrix(smdata) seems to be important here as if smdata is a 
data frame, then smdata[,1] does not inherit the rownames of smdata and 
pic() will return an incorrect set of contrasts.)


I hope this helps.  No doubt other subscribers to the list may have 
something to add.


Best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 7/13/2011 11:07 PM, Nicholas Mason wrote:

Dear R-sig-phylo users,
I have a question regarding comparative analyses of contrasts done with
the functions fitContinuous() and pic() compared to using PGLS (using
the gls() function).

 From my understanding the first method involving pic() below fits alpha
(estimated using fitContinuous()) to each character independently then
performs a regression on the resulting contrasts.

The second (PGLS) method involving gls() fits both the regression and
contrasts simultaneously and reports a single alpha value for the
relationship between two traits.

These two methods appear very similar, yet I get slightly different
results between the two functions, particularly with respect to
p-values. Using fitContinuous(), the results from the data set attached
are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p =0.0519.
Furthermore, if I switch the dependent and independent variables (V1~V2
vs V2~V1), I get the same answer with pic(), but gls() differs: r2 =
-0.069 p =0.02 (see pglsModel1a vs pglsModel1b in the attachment).

Could anyone explain why these functions vary (in my mind they were
doing essentially the same thing) and if there are situations where one
should be favored over another? Also, why does PGLS vary if you switch
the dependent/independent variables in the linear model?

Thanks in advance for any advice or comments offered!!

Cheers,
Nick Mason

Below I have the code I have been using (also attached as Mason.R):

require(ape)
require(nlme)
require(geiger)
read.csv(file=Mason_data.csv)-smdata
rownames(smdata)-smdata[,1]
smdata[,1]-NULL#ASIDE: If anyone could tell me how to get around these
two lines of code, that would also be awesome. Header=T doesn't seem to
work for me.
read.tree(file=Mason_tree.tre)-spbm
name.check(smdata,spbm)

fitContinuous(spbm,smdata,model=OU)-ou2

pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha))
pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha))

summary(lm(pr_contrast~pc1_contrast-1))

pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1,
spbm),data=smdata)
summary(pglsModel1a)

pglsModel1b-gls(Comp.1~Pause_Rate, correlation=corMartins(1,
spbm),data=smdata)
summary(pglsModel1b)








-
Nicholas Albert Mason
MS Candidate, Burns Lab
Department of Biology - EB Program
San Diego State University
5500 Campanile Drive
San Diego, CA 92182-4614
845-240-0649 (cell)





___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo