[R-sig-phylo] PGLS, categorical data and regression through origin

2012-06-06 Thread Julien Lorion
Dear colleagues,

I am testing the impact of binary categorical characters (habitat and
presence/absence of symbionts) on a continuous variable (log of body size)
using PGLS.

I am not sure whether I should remove the intercept from the formula, nor what
the biological interpretation of a missing intercept is for categorical
variables. All the papers I found on regression through the origin were about
PIC and continuous characters.

It does not change my conclusions when I test each variable individually (BOTH
have a HIGHLY significant impact on body size), but it does when I test them
simultaneously:

Coefficients:
                    Estimate Std. Error t value  Pr(>|t|)
(Intercept)         3.252335   0.731056  4.4488 5.115e-05 ***
Habitat1            0.706823   0.434013  1.6286    0.1099
Location1           0.598868   0.810679  0.7387    0.4637
Habitat1:Location1 -0.078772   0.905514 -0.0870    0.9310
F-statistic: 3.744 on 4 and 48 DF,  p-value: 0.009906

Coefficients:
                    Estimate Std. Error t value  Pr(>|t|)
Habitat0            3.252335   0.731056  4.4488 5.115e-05 ***
Habitat1            3.959158   0.782851  5.0574 6.629e-06 ***
Location1           0.598868   0.810679  0.7387    0.4637
Habitat1:Location1 -0.078772   0.905514 -0.0870    0.9310
F-statistic: 3.744 on 4 and 48 DF,  p-value: 0.009906
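
The two coefficient tables describe the same fit under two codings: the
no-intercept coefficient for Habitat1 is the sum of (Intercept) and Habitat1
from the first table (3.252335 + 0.706823 = 3.959158). A minimal sketch of this
on made-up data follows. It uses ordinary least squares via lm() rather than
PGLS, and the data frame d and its columns are assumptions for illustration
only.

```r
# Toy data (hypothetical): two binary factors and a continuous response.
set.seed(1)
d <- data.frame(
  Habitat  = factor(rep(c(0, 1), each = 20)),
  Location = factor(rep(c(0, 1), times = 20))
)
d$y <- 3 + 0.7 * (d$Habitat == 1) + 0.6 * (d$Location == 1) + rnorm(40)

fit_int   <- lm(y ~ Habitat * Location, data = d)      # with intercept
fit_noint <- lm(y ~ Habitat * Location - 1, data = d)  # intercept removed

b_int   <- coef(fit_int)
b_noint <- coef(fit_noint)

# Same fitted values: one model written two ways.
same_fit <- all.equal(unname(fitted(fit_int)), unname(fitted(fit_noint)))

# "Habitat1" without intercept equals "(Intercept)" + "Habitat1" with it.
sum_check <- all.equal(unname(b_noint["Habitat1"]),
                       unname(b_int["(Intercept)"] + b_int["Habitat1"]))
```

Without the intercept, the t-test on each Habitat coefficient asks whether that
group's mean (at the reference level of Location) differs from zero, not
whether the two habitats differ from each other.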

Also, the output of the above analysis depends on the order of variables
(symbiont location ~ habitat VS habitat ~ symbiont location). Once the effect
of one variable is removed, the effect of the other one is no longer
significant, likely because the two are correlated. Is there an objective way
to decide which one best explains body size? Perhaps by considering the amount
of variance explained by each variable individually?
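
Order dependence of this kind is what sequential (Type I) tests produce when
predictors are correlated: each term is tested after the terms entered before
it. The sketch below uses lm() on simulated data (all names hypothetical, and
OLS rather than PGLS) to contrast sequential tests from anova() with marginal
tests from drop1(), which do not depend on order. Whether a given PGLS
function reports sequential or marginal tests is something to check in its
documentation.

```r
set.seed(2)
n <- 60
habitat  <- rbinom(n, 1, 0.5)
# Location agrees with habitat about 85% of the time: correlated predictors.
location <- ifelse(runif(n) < 0.85, habitat, 1 - habitat)
y <- 3 + 0.8 * habitat + rnorm(n)
d <- data.frame(y, habitat = factor(habitat), location = factor(location))

fit_hl <- lm(y ~ habitat + location, data = d)  # habitat entered first
fit_lh <- lm(y ~ location + habitat, data = d)  # location entered first

# Sequential sums of squares: the value for habitat depends on its position.
a_hl <- anova(fit_hl)
a_lh <- anova(fit_lh)
ss_habitat_first  <- a_hl["habitat", "Sum Sq"]
ss_habitat_second <- a_lh["habitat", "Sum Sq"]

# Marginal tests: each term dropped from the full model, order-independent.
m_hl <- drop1(fit_hl, test = "F")
m_lh <- drop1(fit_lh, test = "F")
```

Marginal tests remove the dependence on order, but they do not by themselves
say which of two correlated predictors is the better explanation.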

Apologies for these very naive questions. I guess the answers are obvious to
most of you, but...

Thanks in advance for your comments. 

Regards

Julien Lorion
PhD, Post-doctoral fellow of the Japan Society for the Promotion of Science
Japan Agency for Marine-Earth Science and Technology (JAMSTEC)
Marine Ecosystems Research Department
2-15 Natsushima, Yokosuka 237-0061 Japan
Phone: +81-46-867-9570, Fax: +81-46-867-9525







___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] PGLS, categorical data and regression through origin

2012-06-06 Thread ppiras
Dear Julien,
maybe I don't understand your model... but IF your
model has one continuous dependent and one categorical
(binary) independent variable, it looks like an ANOVA
model: in this case phy.anova() [or phy.manova() if you
have >1 dependents] in the R package geiger does it. IF
the model is different, please explain it in more detail.
Inverting the dependence-independence relationship
depends on the hypothesis you are testing.
When the categorical variable becomes the dependent, you
need to apply a phylogenetic logistic regression (in the
binary case) or a multinomial logistic one (I think
MCMCglmm does it).

IF you have several factor variables as predictors,
applying comparative methods may be more
complicated, but a trick could be useful.
Say you have a two-level and a four-level factor.
You can test *pairwise* (!!) your
(**CONTINUOUS**!!) dependent against a new factor
whose levels are all the combinations of levels of the
two factors that actually occur. The number of levels
will not necessarily equal the number of levels of the
first factor times the number of levels of the second:
it depends on the real data.
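
The recoding described above can be sketched in base R with interaction().
The factors f1 and f2 below are hypothetical; the point is only that
drop = TRUE keeps the combinations that actually occur.

```r
# Hypothetical factors: one with two levels, one with four.
f1 <- factor(c("a", "a", "b", "b", "a", "b"))
f2 <- factor(c("w", "x", "y", "z", "w", "y"))

# One level per combination of f1 and f2 that is present in the data.
combo <- interaction(f1, f2, drop = TRUE)

n_possible <- nlevels(f1) * nlevels(f2)  # all conceivable combinations
n_observed <- nlevels(combo)             # combinations actually observed
```

Here only four of the eight possible combinations are observed, so the
combined factor has four levels.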

But you wrote: symbiont location ~ habitat VS habitat
~ symbiont location.
Here all the variables are categorical, aren't they?
Where is the body size?


Best
Paolo









-- 
Paolo Piras
Center for Evolutionary Ecology
 and
Dipartimento di Scienze Geologiche, Università Roma Tre
Largo San Leonardo Murialdo, 1, 00146 Roma
Tel: +390657338000
email: ppi...@uniroma3.it



Re: [R-sig-phylo] Question about Kcalc with >1 data points per species

2012-06-06 Thread Liam J. Revell

Hi Jonathan.

The thing to do in this case is to estimate K with within-species 
variability. This is described in Ives et al. (2007; Syst. Biol.) and 
implemented in the phytools function phylosig. This will give an 
unbiased estimate of K. (K estimated when within-species variability is 
ignored is downwardly biased.)


To do this, you need the species means for each species and a vector of 
the standard error of the means. If we cannot estimate the standard 
error of the mean for some species (because n=1), then we can just use 
the mean variance. To get the means and standard errors (assuming your 
raw data is in a vector, x, with species names), we can just do the 
following:


library(phytools)  # provides phylosig()

# get the mean by species
temp <- aggregate(x, by = list(names(x)), mean)
xbar <- temp[, 2]; names(xbar) <- temp[, 1]

# get the variance by species
temp <- aggregate(x, by = list(names(x)), var)
xvar <- temp[, 2]; names(xvar) <- temp[, 1]
# replace NA (species with n = 1) with the mean variance
xvar[is.na(xvar)] <- mean(xvar, na.rm = TRUE)

# get the N per species (table() sorts by name, as aggregate() does)
n <- as.vector(table(names(x)))

# compute the standard errors
se <- sqrt(xvar / n)

# compute K
K <- phylosig(tree, xbar, se = se)

Hopefully that works for you.

All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 6/6/2012 6:12 PM, Jonathan Benstead wrote:

Dear r-sig-phylo members - Colleagues and I came across an old post about 
calculating the Blomberg K statistic in studies where there is more than one 
data point for some species, by using a tree that includes zero-branch lengths 
for those multiple studies. This seems ideal for our data, which has several 
species with up to 14 data points for trait values. We have prepared a tree 
with the necessary zero-branch lengths and our tip labels match our row labels. 
However, when I run the code I get the following error message:

Error in Initialize.corSymm(X[[1L]], ...) :
   Initial values for corSymm must be between -1 and 1

I found two references to this problem among old posts, but nothing that helped 
solve the problem, except that it might be something to do with the tree. As 
someone else has pointed out, the problem goes away if we change the zero 
branch lengths to a positive number, but we don't want to give these branches 
an arbitrary length. Is there another way to do this?

Here's our code:

library(ape)
library(picante)
Fish = read.table("Fish.csv", header = TRUE, sep = ",")
AuthP2 <- as.matrix(as.matrix(Fish)[,1])
tree1 = read.nexus("consensus_constraint_polytomies.nex")
tree2 = drop.tip(tree1, "Erpetoichthys_calabaricus_AY442348_")
names(AuthP2) <- tree2$tip.label
Kcalc(AuthP2[tree2$tip.label], tree2)

Any help getting this analysis to work would be greatly appreciated.

Jon


Jon Benstead
Associate Professor
Aquatic Biology Program
University of Alabama, Box 870206
1124 Bevill Building, 201 7th Ave.
Tuscaloosa, Alabama 35487-0206

Lab homepage: http://bama.ua.edu/~jbenstead
Iceland blog: http://icelandstreams.blogspot.com/

Tel: 205-348-9034
Fax: 205-348-1403
