[R-sig-phylo] PGLS, categorical data and regression through origin
Dear colleagues,

I am testing the impact of categorical binary characters (habitat and presence/absence of symbionts) on a continuous variable (log of body size) using PGLS. I am not sure whether I should remove the intercept from the formula, or how to interpret biologically the absence of an intercept for categorical variables. All the papers I found on regression through the origin were about PIC and continuous characters.

It does not change my conclusions when I test each variable individually (BOTH have a HIGHLY significant impact on body size), but it does when I test them simultaneously:

With intercept:

Coefficients:
                    Estimate   Std. Error  t value  Pr(>|t|)
(Intercept)         3.252335   0.731056    4.4488   5.115e-05 ***
Habitat1            0.706823   0.434013    1.6286   0.1099
Location1           0.598868   0.810679    0.7387   0.4637
Habitat1:Location1 -0.078772   0.905514   -0.0870   0.9310
F-statistic: 3.744 on 4 and 48 DF, p-value: 0.009906

Without intercept:

Coefficients:
                    Estimate   Std. Error  t value  Pr(>|t|)
Habitat0            3.252335   0.731056    4.4488   5.115e-05 ***
Habitat1            3.959158   0.782851    5.0574   6.629e-06 ***
Location1           0.598868   0.810679    0.7387   0.4637
Habitat1:Location1 -0.078772   0.905514   -0.0870   0.9310
F-statistic: 3.744 on 4 and 48 DF, p-value: 0.009906

Also, the output of the above analysis depends on the order of variables (symbiont location ~ habitat VS habitat ~ symbiont location). Once the effect of one variable is removed, the effect of the other one is no longer significant, likely because both are correlated. Is there an objective way to decide which one best explains body size? Perhaps by considering the amount of variance explained by each variable individually?

Apologies for these very naive questions. I guess the answers are obvious to most of you, but... Thanks in advance for your comments.
Regards,
Julien Lorion
PhD, Post-doctoral fellow of the Japan Society for the Promotion of Science
Japan Agency for Marine-Earth Science and Technology (JAMSTEC)
Marine Ecosystems Research Department
2-15 Natsushima, Yokosuka 237-0061 Japan
Phone: +81-46-867-9570, Fax: +81-46-867-9525

[[alternative HTML version deleted]]
___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
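On the intercept question in the message above: when the first term of the formula is a factor, dropping the intercept does not force the fit "through the origin" the way it does for a continuous predictor; it only changes how the same model is parametrized. The two outputs quoted above already show this, since Habitat1 without intercept (3.959158) equals the intercept plus Habitat1 (3.252335 + 0.706823). Below is a minimal sketch of the same thing, assuming ordinary lm() on made-up data as a stand-in for a PGLS fit. The data frame, group sizes and effect size are hypothetical, and the claim that the same design-matrix argument carries over to a GLS fit is an assumption that is not checked here.

```r
# Hypothetical, unbalanced 2 x 2 design (group sizes are made up)
cells <- expand.grid(habitat = factor(0:1), location = factor(0:1))
dat <- cells[rep(1:4, times = c(20, 15, 10, 15)), ]

# Made-up response: habitat 1 is larger by 0.7 on the log scale
set.seed(1)
dat$logsize <- 3 + 0.7 * (dat$habitat == "1") + rnorm(nrow(dat))

# Same model, two parametrizations
fit_int   <- lm(logsize ~ habitat * location, data = dat)      # with intercept
fit_noint <- lm(logsize ~ habitat * location - 1, data = dat)  # intercept removed

b  <- coef(fit_int)
b0 <- coef(fit_noint)

# The fitted values are identical, so the fit to the data is unchanged
same_fit <- all.equal(unname(fitted(fit_int)), unname(fitted(fit_noint)))

# The no-intercept "habitat1" is just intercept + habitat1 from the other fit
sum_check <- all.equal(unname(b0["habitat1"]),
                       unname(b["(Intercept)"] + b["habitat1"]))

print(names(b0))
print(same_fit)
print(sum_check)
```

One consequence: in the no-intercept table, the t-test on Habitat1 asks whether the mean of that habitat group (at the baseline location) differs from zero, not whether the two habitats differ, so its small p-value is not by itself evidence of a habitat effect.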
Re: [R-sig-phylo] PGLS, categorical data and regression through origin
Dear Julien,

Maybe I don't understand your model, but if it has one continuous dependent variable and one categorical (binary) independent variable, it looks like an ANOVA model. In that case phy.anova() [or phy.manova() if you have >1 dependents] in the R package geiger does it. If the model is different, please explain it in more detail.

Inverting the dependent-independent relationship depends on the hypothesis you are testing. When the categorical variable becomes the dependent one, you need to apply a phylogenetic logistic regression (in the binary case) or a multinomial logistic regression (I think MCMCglmm does it).

If you have more than one factor variable, applying comparative methods may be more complicated, but a trick could be useful. Say you have a two-level and a four-level factor variable. You can test your CONTINUOUS dependent variable, pairwise, against a new factor whose levels code every combination of the levels of the two factor variables that actually occurs. The number of levels of the new factor will not necessarily equal the number of levels of the first factor times the number of levels of the second: it depends on the real data.

But you wrote: symbiont location ~ habitat VS habitat ~ symbiont location. Here everything is categorical, isn't it? Where is the body size?

Best,
Paolo

--
Paolo Piras
Center for Evolutionary Ecology and Dipartimento di Scienze Geologiche, Università Roma Tre
Largo San Leonardo Murialdo, 1, 00146 Roma
Tel: +390657338000
email: ppi...@uniroma3.it
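The combined-factor trick described in Paolo's reply can be sketched in base R with interaction(). This is only an illustration on made-up data: the habitat and symbiont labels are hypothetical, and the resulting factor would still have to be passed to whichever phylogenetic test is actually used, which is not shown here.

```r
# Hypothetical per-species data: a four-level and a two-level factor
habitat  <- factor(c("vent", "vent", "seep", "seep", "wood", "wood", "bone", "vent"))
symbiont <- factor(c("yes",  "yes",  "yes",  "no",   "no",   "no",   "no",   "yes"))

# One new factor with a level for every combination that actually occurs;
# drop = TRUE discards combinations that never appear in the data
combo <- interaction(habitat, symbiont, drop = TRUE)

n_possible <- nlevels(habitat) * nlevels(symbiont)  # 4 * 2 combinations in principle
n_observed <- nlevels(combo)                        # only those present in the data

print(levels(combo))
print(c(possible = n_possible, observed = n_observed))
```

With these made-up values only five of the eight possible combinations occur, which is the point made above about the number of levels depending on the real data.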
Re: [R-sig-phylo] Question about Kcalc with >1 data points per species
Hi Jonathan.

The thing to do in this case is to estimate K with within-species variability. This is described in Ives et al. (2007; Syst. Biol.) and implemented in the phytools function phylosig. This will give an unbiased estimate of K. (K estimated when within-species variability is ignored is downwardly biased.)

To do this, you need the species means for each species and a vector of the standard errors of the means. If we cannot estimate the standard error of the mean for some species (because n=1), then we can just use the mean variance. To get the means and standard errors (assuming your raw data is in a vector, x, with species names), we can just do the following:

# get the mean by species
temp<-aggregate(x,by=list(names(x)),mean)
xbar<-temp[,2]; names(xbar)<-temp[,1]
# get the variance by species
temp<-aggregate(x,by=list(names(x)),var)
xvar<-temp[,2]; names(xvar)<-temp[,1]
# replace NA with mean variance
xvar[is.na(xvar)]<-mean(xvar,na.rm=TRUE)
# get the N per species (from the same vector x)
n<-as.vector(table(names(x)))
# compute the standard errors
se<-sqrt(xvar/n)
# compute K
K<-phylosig(tree,xbar,se=se)

Hopefully that works for you.

All the best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 6/6/2012 6:12 PM, Jonathan Benstead wrote:
> Dear r-sig-phylo members -
>
> Colleagues and I came across an old post about calculating the Blomberg K statistic in studies where there is more than one data point for some species, by using a tree that includes zero-branch lengths for those multiple studies. This seems ideal for our data, which has several species with up to 14 data points for trait values. We have prepared a tree with the necessary zero-branch lengths and our tip labels match our row labels. However, when I run the code I get the following error message:
>
> Error in Initialize.corSymm(X[[1L]], ...) : Initial values for corSymm must be between -1 and 1
>
> I found two references to this problem among old posts, but nothing that helped solve the problem, except that it might be something to do with the tree. As someone else has pointed out, the problem goes away if we change the zero branch lengths to a positive number, but we don't want to give these branches an arbitrary length. Is there another way to do this? Here's our code:
>
> library(ape)
> library(picante)
> Fish = read.table("Fish.csv", header = TRUE, sep = ",")
> AuthP2<-as.matrix(as.matrix(Fish)[,1])
> tree1 = read.nexus("consensus_constraint_polytomies.nex")
> tree2 = drop.tip(tree1, "Erpetoichthys_calabaricus_AY442348_")
> names(AuthP2)<-tree2$tip.label
> Kcalc(AuthP2[tree2$tip.label], tree2)
>
> Any help getting this analysis to work would be greatly appreciated.
>
> Jon
>
> Jon Benstead
> Associate Professor
> Aquatic Biology Program
> University of Alabama, Box 870206
> 1124 Bevill Building, 201 7th Ave.
> Tuscaloosa, Alabama 35487-0206
> Lab homepage: http://bama.ua.edu/~jbenstead
> Iceland blog: http://icelandstreams.blogspot.com/
> Tel: 205-348-9034 Fax: 205-348-1403
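The per-species steps in Liam's reply can be tried on a toy vector before running them on real data. The sketch below uses made-up values and hypothetical species names, and swaps in split() with sapply() as a compact alternative to the aggregate() calls. The final phylosig() call is left as a comment because it needs phytools and a tree whose tip labels match the names of xbar.

```r
# Hypothetical raw data: repeated measurements, named by species
x <- c(sp_a = 1, sp_a = 3, sp_b = 2, sp_c = 4, sp_c = 6, sp_c = 8)

# Group the raw values by species name (groups come out in sorted name order)
by_species <- split(unname(x), names(x))

xbar <- sapply(by_species, mean)    # species means
xvar <- sapply(by_species, var)     # species variances; NA where n = 1
n    <- sapply(by_species, length)  # sample size per species

# Species with a single observation get the mean of the estimable variances
xvar[is.na(xvar)] <- mean(xvar, na.rm = TRUE)

# Standard error of each species mean
se <- sqrt(xvar / n)

print(rbind(xbar = xbar, n = n, se = se))

# With phytools loaded and a matching tree (not run here):
# K <- phylosig(tree, xbar, se = se)
```

Because all three summaries come from the same split, xbar, n and se share one species order, which avoids the mismatch that can arise when the counts are taken from a different vector than the means.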