Re: [R-sig-phylo] PGLS, categorical data and regression through origin

2012-06-07 Thread Julien Lorion
Yanthe,
Paolo,

Thanks for your replies... 

You understood correctly... log(BodySize) is the dependent variable, and habitat
and symbiont location are categorical independent variables...

At first there was also a continuous independent variable... and I was
interested in assessing the effects of continuous and categorical variables
simultaneously... That's why I used PGLS.

I agree that a regression model is quite counterintuitive for categorical
independent variables... Like most people, I was taught (I miss that time :-)
that ANOVA is for categorical variables and regression for continuous
variables... Later I was told that regression and ANOVA actually have a lot in
common...
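The point that regression and ANOVA have a lot in common can be made concrete with a minimal sketch in Python/NumPy. It assumes made-up data and ordinary least squares (phylogeny is ignored): the one-way ANOVA F statistic computed from group means equals the overall F statistic of a regression on a single 0/1 dummy variable.

```python
import numpy as np

# Hypothetical data: log body size for 8 species in two habitats coded 0/1.
y = np.array([3.1, 3.4, 2.9, 3.3, 4.0, 3.8, 4.2, 3.9])
habitat = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def anova_f(y, groups):
    """One-way ANOVA F from between- and within-group sums of squares."""
    levels = np.unique(groups)
    grand = y.mean()
    ss_between = sum(len(y[groups == g]) * (y[groups == g].mean() - grand) ** 2
                     for g in levels)
    ss_within = sum(((y[groups == g] - y[groups == g].mean()) ** 2).sum()
                    for g in levels)
    df_b, df_w = len(levels) - 1, len(y) - len(levels)
    return (ss_between / df_b) / (ss_within / df_w)

def regression_f(y, x):
    """Overall F of y ~ 1 + x (x is a 0/1 dummy), fitted by least squares."""
    X = np.column_stack([np.ones_like(y), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = ((y - X @ beta) ** 2).sum()
    tss = ((y - y.mean()) ** 2).sum()
    df_model, df_resid = X.shape[1] - 1, len(y) - X.shape[1]
    return ((tss - rss) / df_model) / (rss / df_resid)

print(anova_f(y, habitat), regression_f(y, habitat))  # the two values agree
```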

As far as I understand the suggestions of others here and the literature,
PGLS can be applied to my data... It may even be the only way to mix
continuous and categorical variables in the same model... I treated my
categorical variables as such (factors), and my understanding is that PGLS
uses dummy coding to perform the regression...
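What PGLS does with a factor can be sketched as generalized least squares (GLS): each factor level other than the baseline becomes a 0/1 column of the design matrix, and the fit is weighted by a phylogenetic covariance matrix V. The sketch below assumes Brownian motion, under which V[i, j] is the branch length shared by species i and j from the root. The 4-species tree, V, the data and the helper names `dummy_code` and `gls` are all hypothetical, and this is not the code of any particular R package.

```python
import numpy as np

# Hypothetical ultrametric tree ((A,B),(C,D)) with root-to-tip depth 1.0:
# A and B share 0.6 of their history, C and D share 0.4 (Brownian motion).
V = np.array([[1.0, 0.6, 0.0, 0.0],
              [0.6, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.4],
              [0.0, 0.0, 0.4, 1.0]])

log_size = np.array([3.2, 3.4, 4.0, 3.7])          # hypothetical response
habitat = ["shallow", "shallow", "deep", "deep"]   # hypothetical factor

def dummy_code(factor):
    """Treatment coding: intercept plus one 0/1 column per non-baseline level."""
    levels = sorted(set(factor))   # alphabetically first level is the baseline
    cols = [np.ones(len(factor))]
    for lev in levels[1:]:
        cols.append(np.array([1.0 if f == lev else 0.0 for f in factor]))
    return np.column_stack(cols), levels

def gls(X, y, V):
    """GLS estimate beta = (X' V^-1 X)^-1 X' V^-1 y."""
    Vinv = np.linalg.inv(V)
    return np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)

X, levels = dummy_code(habitat)
beta = gls(X, log_size, V)
print(levels, beta)  # beta[0]: baseline ("deep") estimate; beta[1]: "shallow" minus "deep"
```

The baseline choice (first level in sorted order) loosely mirrors R's default treatment contrasts; any other baseline gives the same fitted values with re-expressed coefficients.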

My results from PGLS are the same as from phy.anova... Still, I agree that the
latter has a more intuitive interpretation... and as my continuous independent
variable has no effect, I don't think I really need PGLS... phy.anova is
enough..

Sorry Paolo, I actually meant there was a difference between:
log(BodySize) ~ symbiont location * habitat   vs.   log(BodySize) ~ habitat * symbiont location
Once the effect of one variable is removed, the other one is no longer
significant... so the result depends on the order in which the variables
enter the model...
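This order dependence is what sequential ("Type I") sums of squares do when predictors are correlated: each term is credited only with the variation it explains after the terms listed before it. A minimal sketch, assuming ordinary least squares (phylogeny ignored), main effects only (no interaction), and hypothetical data in which habitat and symbiont location mostly coincide:

```python
import numpy as np

# Hypothetical, deliberately correlated binary predictors.
habitat  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
location = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0], dtype=float)
log_size = np.array([3.0, 3.2, 2.9, 3.1, 3.6, 3.9, 4.1, 3.8, 4.0, 3.5])

def rss(y, *predictors):
    """Residual sum of squares of y ~ 1 + predictors (ordinary least squares)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(((y - X @ beta) ** 2).sum())

def sequential_ss(y, first, second):
    """Type I SS: first term vs. intercept only, then second term given the first."""
    ss_first = rss(y) - rss(y, first)
    ss_second = rss(y, first) - rss(y, first, second)
    return ss_first, ss_second

# Same total in both orders, but it is split differently between the two terms.
print("habitat then location:", sequential_ss(log_size, habitat, location))
print("location then habitat:", sequential_ss(log_size, location, habitat))
```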
Nice idea to test combinations of factors... After checking my data more
carefully, it unfortunately seems that I don't have enough data to keep
statistical power while doing so (using either ANOVA or PGLS)... So I will
stick to simple phy.anovas.

Cheers

Julien

On Jun 7, 2012, at 12:47 AM, Yanthe Pearson wrote:

 Based on what you say, it sounds like Y=log(body size) is the dependent
 variable and X1=habitat and X2=location are independent variables?
 
 
 Yanthe E. Pearson
 Postdoctoral Researcher
 Dept. of Biology, Fagan Lab
 University of Maryland College Park
 
 Email: ypear...@umd.edu
 
 From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] 
 On Behalf Of ppi...@uniroma3.it [ppi...@uniroma3.it]
 Sent: Wednesday, June 06, 2012 9:55 AM
 To: Julien Lorion
 Cc: r-sig-phylo@r-project.org
 Subject: Re: [R-sig-phylo] PGLS, categorical data and regression through 
 origin
 

Re: [R-sig-phylo] PGLS, categorical data and regression through origin

2012-06-06 Thread ppiras
Dear Julien,
maybe I don't understand your model... but IF your
model has one continuous dependent and one categorical
(binary) independent variable, it looks like an ANOVA model: in
this case phy.anova() [or phy.manova() if you have more than
one dependent] in the R package geiger does it. IF the
model is different... please explain better.
Whether to invert the dependence-independence
relationship depends on the hypothesis you are testing.
When the categorical variable becomes the dependent, you need
to apply a phylogenetic logistic regression (in the binary
case) or a multinomial logistic regression (I think MCMCglmm
does it).
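As far as I know, geiger's phy.anova() computes the ordinary ANOVA F statistic on the real data and obtains its null distribution by simulating the dependent variable under Brownian motion on the tree. Below is a minimal Python/NumPy sketch of that idea, not geiger's implementation. The tree covariance matrix, data, number of simulations and the helper `phylo_anova` are hypothetical. Because F is a ratio of mean squares, neither the Brownian rate nor the root value changes it, so the simulation uses unit rate and zero mean.

```python
import numpy as np

def anova_f(y, groups):
    """Ordinary one-way ANOVA F statistic."""
    levels = np.unique(groups)
    grand = y.mean()
    ss_b = sum((groups == g).sum() * (y[groups == g].mean() - grand) ** 2 for g in levels)
    ss_w = sum(((y[groups == g] - y[groups == g].mean()) ** 2).sum() for g in levels)
    return (ss_b / (len(levels) - 1)) / (ss_w / (len(y) - len(levels)))

def phylo_anova(y, groups, V, nsim=1000, seed=1):
    """Observed F and its p-value against F values from data simulated as N(0, V)."""
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(V)                         # V = L L'
    f_obs = anova_f(y, groups)
    sims = L @ rng.standard_normal((len(y), nsim))    # each column ~ N(0, V)
    f_null = np.array([anova_f(sims[:, k], groups) for k in range(nsim)])
    # Monte Carlo p-value counting the observed F as one draw (geiger may differ).
    return f_obs, (np.sum(f_null >= f_obs) + 1) / (nsim + 1)

# Hypothetical 6-species tree: two clades of three, each sharing 0.5 of the depth.
block = np.full((3, 3), 0.5) + 0.5 * np.eye(3)
V = np.block([[block, np.zeros((3, 3))], [np.zeros((3, 3)), block]])
y = np.array([3.1, 3.3, 3.0, 4.0, 3.9, 4.2])   # hypothetical log body size
groups = np.array([0, 0, 0, 1, 1, 1])          # hypothetical habitat, one per clade
f_obs, p = phylo_anova(y, groups, V)
print(f_obs, p)
```

When the groups coincide with clades, as here, large F values are common under the Brownian null, which is exactly why the simulated reference distribution is used instead of the standard F table.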

IF you have more factor variables as
independent variables... applying comparative methods may be
more complicated, but a trick could be useful.
Say you have a two-level and a four-level factor.
You can test *pairwise* (!!) your
(**CONTINUOUS**!!) dependent against a new factor
whose levels are all the occurring combinations of the
levels of the two factors. The number of levels will not
necessarily equal the number of levels of the first factor
times the number of levels of the second: it depends on the
real data.
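The recoding trick can be sketched in a few lines of Python with hypothetical data: each species gets the level formed by its pair of values, and only combinations that actually occur become levels. The level names and the helper `combine_factors` are made up for illustration.

```python
from collections import Counter

# Hypothetical data: a two-level and a four-level factor for 8 species.
habitat  = ["shallow", "shallow", "shallow", "deep", "deep", "deep", "deep", "shallow"]
location = ["gill", "gut", "gill", "gill", "skin", "mantle", "skin", "gut"]

def combine_factors(f1, f2, sep=":"):
    """One new factor whose levels are the occurring combinations of f1 and f2."""
    return [f"{a}{sep}{b}" for a, b in zip(f1, f2)]

combined = combine_factors(habitat, location)
levels = sorted(set(combined))
print(levels)             # 5 occurring combinations, not 2 * 4 = 8
print(Counter(combined))  # species per level: levels with 1-2 species carry little power
```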

But you wrote: symbiont location ~ habitat VS habitat
~ symbiont location
Here all things are categorical, aren't they?... Where
is the body size?


Best
Paolo

Dear colleagues,

I am testing the impact of binary categorical
characters (habitat and presence/absence of symbionts)
on a continuous variable (log of body size) using
PGLS...

I am not sure whether I should remove the intercept from
the formula, nor what the biological interpretation of
omitting the intercept is for categorical variables.. All
the papers I found on regression through the origin
were about PIC (phylogenetically independent contrasts)
and continuous characters.

It does not change my conclusions when I test each
variable individually (BOTH have a HIGHLY
significant impact on body size), but it does when I
test them simultaneously:

Coefficients:
                    Estimate Std. Error t value  Pr(>|t|)
(Intercept)         3.252335   0.731056  4.4488 5.115e-05 ***
Habitat1            0.706823   0.434013  1.6286    0.1099
Location1           0.598868   0.810679  0.7387    0.4637
Habitat1:Location1 -0.078772   0.905514 -0.0870    0.9310
F-statistic: 3.744 on 4 and 48 DF,  p-value: 0.009906

Coefficients:
                    Estimate Std. Error t value  Pr(>|t|)
Habitat0            3.252335   0.731056  4.4488 5.115e-05 ***
Habitat1            3.959158   0.782851  5.0574 6.629e-06 ***
Location1           0.598868   0.810679  0.7387    0.4637
Habitat1:Location1 -0.078772   0.905514 -0.0870    0.9310
F-statistic: 3.744 on 4 and 48 DF,  p-value: 0.009906
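The two tables are the same model written two ways. With the intercept, "(Intercept)" is the estimate for the baseline cell (Habitat0, Location0) and "Habitat1" is the difference between habitats at Location0. Without it, each habitat gets its own coefficient, so Habitat1 in the second table is 3.252335 + 0.706823 = 3.959158, while the Location1 and interaction rows are unchanged. Removing the intercept therefore changes what the Habitat coefficients and their t-tests mean (each habitat's value vs. zero, rather than the difference between habitats), not the fit. A small NumPy sketch with hypothetical data and ordinary least squares (the same algebra holds for GLS) checks this identity:

```python
import numpy as np

# Hypothetical data: two binary factors coded 0/1, two species per cell.
habitat  = np.array([0, 0, 0, 1, 1, 1, 0, 1], dtype=float)
location = np.array([0, 1, 0, 0, 1, 1, 1, 0], dtype=float)
y = np.array([3.2, 3.9, 3.1, 4.0, 4.6, 4.4, 3.7, 3.8])

def fit(X, y):
    """Least-squares coefficients and fitted values."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, X @ beta

ones = np.ones_like(y)
# y ~ Habitat * Location: (Intercept), Habitat1, Location1, Habitat1:Location1
X_int = np.column_stack([ones, habitat, location, habitat * location])
# y ~ 0 + Habitat * Location: Habitat0, Habitat1, Location1, Habitat1:Location1
X_noint = np.column_stack([1 - habitat, habitat, location, habitat * location])

b1, fitted1 = fit(X_int, y)
b2, fitted2 = fit(X_noint, y)
print(np.allclose(fitted1, fitted2))      # True: identical fitted values
print(np.isclose(b2[1], b1[0] + b1[1]))   # True: Habitat1 = Intercept + Habitat1 offset
print(np.allclose(b1[2:], b2[2:]))        # True: Location1 and interaction unchanged
```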

Also, the output of the above analysis depends on the
order of the variables (symbiont location ~ habitat VS
habitat ~ symbiont location).. Once the effect of one
variable is removed, the effect of the other one is no
longer significant, likely because both are
correlated. Is there an objective way to decide which
one best explains body size? Perhaps by considering
the amount of variance explained by each variable
individually?
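The idea of comparing the variance explained by each variable on its own can be sketched as follows. The assumptions are heavy: ordinary least squares, so phylogeny is ignored, and hypothetical data. With correlated predictors the individual R² values overlap, so they do not add up to the R² of the joint model. This shared variance is also what makes the sequential tests order-dependent, so individual R² values alone cannot attribute it to either variable.

```python
import numpy as np

# Hypothetical data with correlated binary predictors.
habitat  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], dtype=float)
location = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0], dtype=float)
log_size = np.array([3.0, 3.2, 2.9, 3.1, 3.6, 3.9, 4.1, 3.8, 4.0, 3.5])

def r_squared(y, *predictors):
    """R^2 of y ~ 1 + predictors fitted by ordinary least squares."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = ((y - X @ beta) ** 2).sum()
    return 1 - rss / ((y - y.mean()) ** 2).sum()

r2_h = r_squared(log_size, habitat)
r2_l = r_squared(log_size, location)
r2_both = r_squared(log_size, habitat, location)
print(r2_h, r2_l, r2_both)  # here r2_h + r2_l exceeds r2_both: shared variance
```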

Apologies for these very naive questions.. I guess the
answers are obvious to most of you but...

Thanks in advance for your comments.

Regards

Julien Lorion
PhD, Post-doctoral fellow of the Japan Society for the
Promotion of Science
Japan Agency for Marine-Earth Science and Technology
(JAMSTEC)
Marine Ecosystems Research Department
2-15 Natsushima, Yokosuka 237-0061 Japan
Phone: +81-46-867-9570, Fax: +81-46-867-9525


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



-- 
Paolo Piras
Center for Evolutionary Ecology
 and
Dipartimento di Scienze Geologiche, Università Roma Tre
Largo San Leonardo Murialdo, 1, 00146 Roma
Tel: +390657338000
email: ppi...@uniroma3.it