Re: [R-sig-phylo] issue with reading matrix in R package DispRity
Hi Oliver, You have semicolons ; in your cells. csv is "comma separated format". It is using "," as the delimiter. Probably what is happening is the ";" is interpreted as another delimiter in R, so you have a 2 column vector in each cell. I would guess that htis will be read in as a list. Why do you have two values in each cell? If they are important I would suggest splitting them into two columns so that you only have one value per cell. Otherwise you can read in the list, and flatten it with code. But it would probably be easier to just produce the .csv in the correct way. An easy way to do this is to read in your .csv file with a word processor and just do a search on ";" and replace with ",". Then you should get 6 columns instead of three columns with two values each. Marguerite Marguerite On Mon, Dec 6, 2021 at 2:08 PM Oliver Betz wrote: > > Dear all: > > I have an issue in correctly transferring the attached data frame to a > matrix to be processed in the R package DispRity. > > I am running the following script that (1) reads in the attached > dataframe from Excel, (2) tansfers it to a matrix X and (3) uses the > column "subgroup" to define the two groups I want to work on. > > Her is the script I am using: > > #___ > > library (dispRity) > > #(1) Reading dataframe > Data<-read.csv2("test.csv", row.names=1, header = T, > fileEncoding="UTF-8-BOM") > > #(2) converts dataframe into a matrix > X<-data.matrix(Data) > > #(3)define subsets > custom.subsets(X[, c(1,2)], group = as.factor(X[, 3])) > > #___ > > However, once I have run the third line, I am getting an error message > such as: > > Error: group argument must be either a "list", a "matrix", a > "data.frame" or a "phylo" object. > > > > If I change the third line into: custom.subsets(dat[, c(2,3)], group = > as.factor(dat[, 4])), the error message is as follows: > > Error in X[, 4] : subscript out of bounds > ___ > > > I will be very happy if anybody can give a hint how to resolve this > problem. > > > My best wishes, > > Oliver Betz > > > > > > > > > > > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > -- Marguerite A. Butler Professor Department of Biology 2538 McCarthy Mall, Edmondson Hall 216 Honolulu, HI 96822 Office: 808-956-4713 Dept: 808-956-8617 Lab: 808-956-5867 FAX: 808-956-4745 http://butlerlab.org http://manoa.hawaii.edu/biology/people/marguerite-butler http://www2.hawaii.edu/~mbutler [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] If my trait X cannot be regressed by body size, how can I rescue residuals corrected by the phylogeny and SE?
iling list - R-sig-phylo@r-project.org > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > -- Marguerite A. Butler Professor Department of Biology 2538 McCarthy Mall, Edmondson Hall 216 Honolulu, HI 96822 Office: 808-956-4713 Dept: 808-956-8617 Lab: 808-956-5867 FAX: 808-956-4745 http://butlerlab.org http://manoa.hawaii.edu/biology/people/marguerite-butler http://www2.hawaii.edu/~mbutler [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] How to sort trait data according to tree
Emmanuel, Yes, that was exactly what I was confused about (even str(cbm) wasnʻt very illuminating). Thank you so much for your explanation. CorBrownian is only storing the formula and the tree, and the data are gathered by gls. I agree this is a very good way to keep everything in its proper order. Thanks again for your kind replies, Marguerite On Sun, May 16, 2021 at 8:58 PM Emmanuel Paradis wrote: > - Le 17 Mai 21, à 13:41, Marguerite Butler a > écrit : > > Thank you very much for the reply Emmanuel! > > OK, yes I just tried and Iʻm surprised that this works: > > > dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles", > "Galago"), X = X, Y = Y) > > cbm <- corBrownian(1, tree, form = ~Species) > > gls(Y ~ X, dat, correlation = cbm) > > How does corBrownian know about Species, when it is only defined as an > element of dat, and not known in the global environment? > > corBrownian() doesn't know where all these variables are located. Indeed, > if you print 'cbm': > > R> cbm > Uninitialized correlation structure of class corBrownian > > It's gls() that looks for them. When you think about it, using "data = > ..." is really a good way to be sure that all variables are ordered > correctly whether they are involved as predictors (in 'model = '), in the > correlation structure, or in the variance function ('weights = '). > > Best, > > Emmanuem > > Anyway, thank you! > Marguerite > > > On Sun, May 16, 2021 at 7:03 PM Emmanuel Paradis > wrote: > >> Hi Marguerite, >> >> The issue is about nlme::gls. The help page ?gls says: >> >> data: an optional data frame containing the variables named in >> ‘model’, ‘correlation’, ‘weights’, and ‘subset’. By default >> the variables are taken from the environment from which ‘gls’ >> is called. >> >> So you make it work by removing "dat$" in the call to corBrownian(): >> >> cbm <- corBrownian(1, tree, form = ~Species) >> >> Best, >> >> Emmanuel >> >> - Le 17 Mai 21, à 6:22, Marguerite Butler mbutler...@gmail.com a >> écrit : >> >> > Aloha all, >> > >> > Does anyone know the answer to this question - how does ape pass the >> > species name information into gls? Seems to be a global environment >> issue >> > when using apeʻs corBrownian and nlmeʻs gls? and a very specific form >> is >> > required, but why? >> > >> > Modified from ape help page for CorClasses: >> > >> > --- >> > >> > library(nlme) >> > library(ape) >> > >> > tree <- read.tree(text = >> > >> "Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);") >> > X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968) >> > Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259) >> > dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles", >> > "Galago"), X = X, Y = Y) >> > >> > gls(Y ~ X, dat, correlation=corBrownian(1, tree.primates, form = >> ~Species)) >> > --- >> > >> > The above works just fine. However, if we try to calculate corBrownian >> > separately, gls cannot understand the corBrownian object: >> > --- >> >> cbm <- corBrownian(1, tree, form = ~dat$Species)# calculate >> correlation >> >> separately >> >> gls(Y ~ X, dat, correlation = cbm) ## does not work, invalid >> type >> >> (list) for dat >> > Error in model.frame.default(formula = ~dat + Species + Y + X, data = >> list( : >> > invalid type (list) for variable 'dat' >> > --- >> > >> > If we put the species information into a vector in the global >> > environment it works. Why canʻt we use dat$Species? >> > --- >> > spp <- dat$Species # save species as vector in global environment >> > cbm <- corBrownian(1, tree, form = ~spp)# calculate correlation >> separately >> > gls(Y ~ X, dat, correlation = cbm) ## works! >> > >> > Thanks, >> > >> > Marguerite >> > >> > >> > >> > On Fri, May 14, 2021 at 10:21 PM Marguerite Butler < >> mbutler...@gmail.com> >> > wrote: >> > >> >> Aloha Oliver, >> >> >> >> From the cor.Brownian help page, the explanation for the form argument >&
Re: [R-sig-phylo] How to sort trait data according to tree
Thank you very much for the reply Emmanuel! OK, yes I just tried and Iʻm surprised that this works: > dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles", "Galago"), X = X, Y = Y) > cbm <- corBrownian(1, tree, form = ~Species) > gls(Y ~ X, dat, correlation = cbm) How does corBrownian know about Species, when it is only defined as an element of dat, and not known in the global environment? Anyway, thank you! Marguerite On Sun, May 16, 2021 at 7:03 PM Emmanuel Paradis wrote: > Hi Marguerite, > > The issue is about nlme::gls. The help page ?gls says: > > data: an optional data frame containing the variables named in > ‘model’, ‘correlation’, ‘weights’, and ‘subset’. By default > the variables are taken from the environment from which ‘gls’ > is called. > > So you make it work by removing "dat$" in the call to corBrownian(): > > cbm <- corBrownian(1, tree, form = ~Species) > > Best, > > Emmanuel > > - Le 17 Mai 21, à 6:22, Marguerite Butler mbutler...@gmail.com a > écrit : > > > Aloha all, > > > > Does anyone know the answer to this question - how does ape pass the > > species name information into gls? Seems to be a global environment issue > > when using apeʻs corBrownian and nlmeʻs gls? and a very specific form is > > required, but why? > > > > Modified from ape help page for CorClasses: > > > > --- > > > > library(nlme) > > library(ape) > > > > tree <- read.tree(text = > > > "Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);") > > X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968) > > Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259) > > dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles", > > "Galago"), X = X, Y = Y) > > > > gls(Y ~ X, dat, correlation=corBrownian(1, tree.primates, form = > ~Species)) > > --- > > > > The above works just fine. However, if we try to calculate corBrownian > > separately, gls cannot understand the corBrownian object: > > --- > >> cbm <- corBrownian(1, tree, form = ~dat$Species)# calculate > correlation > >> separately > >> gls(Y ~ X, dat, correlation = cbm) ## does not work, invalid > type > >> (list) for dat > > Error in model.frame.default(formula = ~dat + Species + Y + X, data = > list( : > > invalid type (list) for variable 'dat' > > --- > > > > If we put the species information into a vector in the global > > environment it works. Why canʻt we use dat$Species? > > --- > > spp <- dat$Species # save species as vector in global environment > > cbm <- corBrownian(1, tree, form = ~spp)# calculate correlation > separately > > gls(Y ~ X, dat, correlation = cbm) ## works! > > > > Thanks, > > > > Marguerite > > > > > > > > On Fri, May 14, 2021 at 10:21 PM Marguerite Butler > > > wrote: > > > >> Aloha Oliver, > >> > >> From the cor.Brownian help page, the explanation for the form argument > is > >> this: > >> > >> a one sided formula of the form ~ t, or ~ t | g, specifying the taxa > >> covariate t and, optionally, a grouping factor g. A covariate for this > >> correlation structure must be character valued, with entries matching > the > >> tip labels in the phylogenetic tree. When a grouping factor is present > in > >> form, the correlation structure is assumed to apply only to observations > >> within the same grouping level; observations with different grouping > levels > >> are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to > using > >> the order of the observations in the data as a covariate, and no groups. > >> > >> This means that "t" should contain the species names or tip labels, > >> spelled exactly as they are in your phylogeny. So for example if your > data > >> SteninaeData > >> has a column for species names called "species" then you would specify: > >> > >> cor.BM <- corBrownian(phy=SteninaeTree, form = ~dat$species) > >> > >> and then proceed with your regression model as above. > >> > >> > >> You can also follow the example on the corClasses page: > >> http://127.0.0.1:27386/library/ape/html/corClasses.html > >> > >> Slightly modified here: > >> &
Re: [R-sig-phylo] How to sort trait data according to tree
Aloha all, Does anyone know the answer to this question - how does ape pass the species name information into gls? Seems to be a global environment issue when using apeʻs corBrownian and nlmeʻs gls? and a very specific form is required, but why? Modified from ape help page for CorClasses: --- library(nlme) library(ape) tree <- read.tree(text = "Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);") X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968) Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259) dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles", "Galago"), X = X, Y = Y) gls(Y ~ X, dat, correlation=corBrownian(1, tree.primates, form = ~Species)) --- The above works just fine. However, if we try to calculate corBrownian separately, gls cannot understand the corBrownian object: --- > cbm <- corBrownian(1, tree, form = ~dat$Species)# calculate correlation > separately > gls(Y ~ X, dat, correlation = cbm) ## does not work, invalid type > (list) for dat Error in model.frame.default(formula = ~dat + Species + Y + X, data = list( : invalid type (list) for variable 'dat' --- If we put the species information into a vector in the global environment it works. Why canʻt we use dat$Species? --- spp <- dat$Species # save species as vector in global environment cbm <- corBrownian(1, tree, form = ~spp)# calculate correlation separately gls(Y ~ X, dat, correlation = cbm) ## works! Thanks, Marguerite On Fri, May 14, 2021 at 10:21 PM Marguerite Butler wrote: > Aloha Oliver, > > From the cor.Brownian help page, the explanation for the form argument is > this: > > a one sided formula of the form ~ t, or ~ t | g, specifying the taxa > covariate t and, optionally, a grouping factor g. A covariate for this > correlation structure must be character valued, with entries matching the > tip labels in the phylogenetic tree. When a grouping factor is present in > form, the correlation structure is assumed to apply only to observations > within the same grouping level; observations with different grouping levels > are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using > the order of the observations in the data as a covariate, and no groups. > > This means that "t" should contain the species names or tip labels, > spelled exactly as they are in your phylogeny. So for example if your data > SteninaeData > has a column for species names called "species" then you would specify: > > cor.BM <- corBrownian(phy=SteninaeTree, form = ~dat$species) > > and then proceed with your regression model as above. > > > You can also follow the example on the corClasses page: > http://127.0.0.1:27386/library/ape/html/corClasses.html > > Slightly modified here: > > --- > > library(nlme) > > library(ape) > > tree <- > read.tree(text="Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);" > > X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968) # a vector > > Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259) # a vector > > Species <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago") # a vector > > dat <- data.frame(Species = Species, X = X, Y = Y)# make a dataframe > with Species, X, Y, can also read in .csv > cbm <- corBrownian(1, tree, form = ~dat$Species) > > m1 <- gls(Y ~ X, dat, correlation=cbm) > > m1 > > --- > > hth, > > Marguerite > > On Fri, May 14, 2021 at 9:46 PM Oliver Betz > wrote: > >> Dear all: >> >> I am currently facing the following problem: >> >> Unsing ape, nlme por phylolm for PGLS I was wondering how to sort my >> trait data (the ones in my csv file) according to the tree, so that >> both are in the same order. I learned how to check whether the data & >> the tree coincide in both number of species and names, but I do not >> nderstand how to perform the mentioned sorting task. >> >> Once I am running the PGLS commands such as >> cor.BM <- corBrownian(phy=SteninaeTree) >> pgls <- gls(LOG_rel_Attachment_force_smooth ~ >> LOG_rel_Anzahl_Hafthaare_4.Vordertarsus, correlation = cor.BM, data = >> SteninaeData, method = "ML")" >> >> I regularly receive the follwing warning note: >> "In Initialize.corPhyl(X[[i]], ...) : >> No covariate specified, species will be taken as ordered in the data >> frame. To avoid this message, specify a covariate containing the >> species names with the 'form' argument." >> >> Since this is just a
Re: [R-sig-phylo] How to sort trait data according to tree
Aloha Oliver, >From the cor.Brownian help page, the explanation for the form argument is this: a one sided formula of the form ~ t, or ~ t | g, specifying the taxa covariate t and, optionally, a grouping factor g. A covariate for this correlation structure must be character valued, with entries matching the tip labels in the phylogenetic tree. When a grouping factor is present in form, the correlation structure is assumed to apply only to observations within the same grouping level; observations with different grouping levels are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using the order of the observations in the data as a covariate, and no groups. This means that "t" should contain the species names or tip labels, spelled exactly as they are in your phylogeny. So for example if your data SteninaeData has a column for species names called "species" then you would specify: cor.BM <- corBrownian(phy=SteninaeTree, form = ~dat$species) and then proceed with your regression model as above. You can also follow the example on the corClasses page: http://127.0.0.1:27386/library/ape/html/corClasses.html Slightly modified here: --- library(nlme) library(ape) tree <- read.tree(text="Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);" X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968) # a vector Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259) # a vector Species <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago") # a vector dat <- data.frame(Species = Species, X = X, Y = Y)# make a dataframe with Species, X, Y, can also read in .csv cbm <- corBrownian(1, tree, form = ~dat$Species) m1 <- gls(Y ~ X, dat, correlation=cbm) m1 --- hth, Marguerite On Fri, May 14, 2021 at 9:46 PM Oliver Betz wrote: > Dear all: > > I am currently facing the following problem: > > Unsing ape, nlme por phylolm for PGLS I was wondering how to sort my > trait data (the ones in my csv file) according to the tree, so that > both are in the same order. I learned how to check whether the data & > the tree coincide in both number of species and names, but I do not > nderstand how to perform the mentioned sorting task. > > Once I am running the PGLS commands such as > cor.BM <- corBrownian(phy=SteninaeTree) > pgls <- gls(LOG_rel_Attachment_force_smooth ~ > LOG_rel_Anzahl_Hafthaare_4.Vordertarsus, correlation = cor.BM, data = > SteninaeData, method = "ML")" > > I regularly receive the follwing warning note: > "In Initialize.corPhyl(X[[i]], ...) : > No covariate specified, species will be taken as ordered in the data > frame. To avoid this message, specify a covariate containing the > species names with the 'form' argument." > > Since this is just a warning message, I usually was going on with the > "summary(pgls)" command, where I was getting my results. > > > Anyhow, I was wondering whether it is OK to just ignore this warning > or whether it means that I will have to sort the sequence of the > sepcies trait data in my data file (csv) exactly according to the > sequence of the species as they occur at the tips of my tree file. > Since the spelling of the species names is exactly the same in both > the tree file and the data file, I thought that such sorting is > automatically done by the algorithm? > > I will be glad if you have any hint how ro resove this. > > My best wishes, > > Oliver Betz > (University of Tübingen, Gernmany) > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > -- ________ Marguerite A. Butler Professor Department of Biology 2538 McCarthy Mall, Edmondson Hall 216 Honolulu, HI 96822 Office: 808-956-4713 Dept: 808-956-8617 Lab: 808-956-5867 FAX: 808-956-4745 http://butlerlab.org http://manoa.hawaii.edu/biology/people/marguerite-butler http://www2.hawaii.edu/~mbutler [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] units of sigsq
Aloha all, Well if all youʻre doing is plotting data, then sure, you can have a change of axes and something in g will still be in g with log-transformed axes. But does this apply to fitting models of BM or OU? The model is assuming that the "errors" or the small jumps in phenotype comes from a normal distribution. So by fitting a model on log-transformed data will assume that those changes come from a different distribution, on log(g). This has potentially deep implications, suggesting that the magnitude of the changes in the original scale would be larger for large values of g that for small values of g when on a log-scale. That would imply basically that itʻs easier to get a lot larger or a lot smaller in a single step if youʻre already big. This might make sense especially for genome size evolution, for example, where big changes in size arise by chromosomal duplication or transposable element activity, etc. Anyway, log-transformation is commonly applied to biological data. As for the units of time, yes I agree with Joe - it is the units that the depth of the tree is in. If it is time calibrated, then it is time. Otherwise it is generally mutations per branch length. Marguerite On Fri, Mar 19, 2021 at 2:20 PM Joe Felsenstein wrote: > Marguerite asked: > > > First - Joe - what do you mean by log(grams) has no units? The units of > grams is a unit, so log(mass) will have units of log-gm. As log is not the > same as 1/gm, log(gm) cannot be unit-free. > > I looked it up on Wikipedia, and was assured by it that Marguerite is > right, log(weight) has the same units as weight, except you're > supposed to call them log-gm. > > > I do think that unless there is a calibration with time, the tree > branch lengths are not in units of time but in units of base > substitutions per site. > > And I remain confused on what the units are, if you compute a linear > combination such as 2 log(wt) - 3 log(height). Which princip[al, le] > components machinery does. > > Joe > -- > Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu > Department of Genome Sciences and Department of Biology, > University of Washington, Box 355065, Seattle, WA 98195-5065 USA > -- Marguerite A. Butler Professor Department of Biology 2538 McCarthy Mall, Edmondson Hall 216 Honolulu, HI 96822 Office: 808-956-4713 Dept: 808-956-8617 Lab: 808-956-5867 FAX: 808-956-4745 http://butlerlab.org http://manoa.hawaii.edu/biology/people/marguerite-butler http://www2.hawaii.edu/~mbutler [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] units of sigsq
Aloha all, Iʻm still reeling from the Atlanta murders and the rise of hate in general, so I may not be thinking straight, but if weʻre talking about Brownian motion, Iʻm not sure this is quite right. If the trait is log(grams) then the trait is unit-free. > First - Joe - what do you mean by log(grams) has no units? The units of grams is a unit, so log(mass) will have units of log-gm. As log is not the same as 1/gm, log(gm) cannot be unit-free. > The "time" > is probably branch length from a phylogeny. That in turn (from DNA > data) is usually DNA substitutions per site. > > So the units of the standard deviation aresites per substitution. > But this is not the standard deviation, it is its square. > > So (wait for it ...)square sites per square substitution > We worked out the units of BM and OU parameters in Cressler et al (2015). We parameterized it this way. If we have a trait X evolving under a Brownian motion, we can write the change in X per unit time as: dX(t) = sigma*dB(t) Where dX(t) (the change in X) will have the same units as X (in this case log-gm). and dB or a draw from the white noise distribution has units of time to the 1/2 power. Therefore sigma must have units of trait*time^(-1/2), and sigma^2 would have units of trait^2/time, or we will not obtain the correct units for X. Ted - yes, I agree with Joe - the matrix version of sigma and sigma^2 on a element-by-element basis will have the same units as the univariate case, except that you will have to substitute the units of each trait, as appropriate, if they are measured in different units. Marguerite > So (wait for it ...)square sites per square substitution > > Now that is pretty weird. But years ago people pointed out to me that > quantitative geneticists were accustomed to inferring variance > components of crop yield. The yield might be in bushels per acre. > So the units of its variance was: square bushels per square acre. > Don't even try to think about how you square a bushel, or how many > dimensions you have to go into to square an acre. Actually, you can > think about them: a bushel is three-dimensional volume, and an acre is > two dimensional area. So crop yield has units of meters, and > variance of crop yield should have units of square meters. > > That way lies madness ... > > Joe > - > Joe Felsenstein felse...@gmail.com, j...@gs.washington.edu > Department of Genome Sciences and Department of Biology, > University of Washington, Box 355065, Seattle, WA 98195-5065 USA > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > -- Marguerite A. Butler Professor Department of Biology 2538 McCarthy Mall, Edmondson Hall 216 Honolulu, HI 96822 Office: 808-956-4713 Dept: 808-956-8617 Lab: 808-956-5867 FAX: 808-956-4745 http://butlerlab.org http://manoa.hawaii.edu/biology/people/marguerite-butler http://www2.hawaii.edu/~mbutler [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] model averaging for discrete character evolution
). > >> > >> Best, > >> Jake > >> > >> [[alternative HTML version deleted]] > >> > >> ___ > >> R-sig-phylo mailing list - R-sig-phylo@r-project.org R-sig-phylo@r-project.org> > >> > https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylodata=02%7C01%7Cliam.revell%40umb.edu%7Ce85015cb8c314c684bdc08d71b4c8fee%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637007887976647805sdata=F8qdVoT17J%2BuC6wgvFktdI1id%2Bf37hDLe1W17z6OhTw%3Dreserved=0 > < > https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylodata=02%7C01%7Cliam.revell%40umb.edu%7Ce85015cb8c314c684bdc08d71b4c8fee%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637007887976647805sdata=F8qdVoT17J%2BuC6wgvFktdI1id%2Bf37hDLe1W17z6OhTw%3Dreserved=0 > > > >> Searchable archive at > https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2Fdata=02%7C01%7Cliam.revell%40umb.edu%7Ce85015cb8c314c684bdc08d71b4c8fee%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637007887976657803sdata=ogCaep2DN92bk6%2FZwOOcpmHaCPQwX7BzNgKo2UMcWtw%3Dreserved=0 > < > https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2Fdata=02%7C01%7Cliam.revell%40umb.edu%7Ce85015cb8c314c684bdc08d71b4c8fee%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637007887976657803sdata=ogCaep2DN92bk6%2FZwOOcpmHaCPQwX7BzNgKo2UMcWtw%3Dreserved=0 > > > >> > > > [[alternative HTML version deleted]] > > _______ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > -- Marguerite A. Butler Professor Department of Biology 2538 McCarthy Mall, Edmondson Hall 216 Honolulu, HI 96822 Office: 808-956-4713 Dept: 808-956-8617 Lab: 808-956-5867 FAX: 808-956-4745 http://butlerlab.org http://manoa.hawaii.edu/biology/people/marguerite-butler http://www2.hawaii.edu/~mbutler [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] OU for non-ultrametric trees
Aloha Danielle, There is no reason at all (mathematically) why you canʻt run non-ultrametric trees. ouch has ever been limited this way in its functions, and it also does not require bifurcating trees. Our original comment was that in most cases people will want to interpret their results in units of time, which is implied by an ultra metric tree. However, if there is some other metric that is more useful for the problem at hand, such as units of mutational change, that is fine too. This may be more relevant for virus evolution. The important thing is that the branch lengths have some meaning, and users should put some thought into the interpretation of the model output. Marguerite On Sun, Dec 16, 2018 at 3:09 AM Danielle Miller wrote: > First, I would like to thank all for your replies. > > Second, I have a tree constructed of many viral genome sequences. I use > midpoint rooting in order to run the model on a rooted tree (not using > outgroup) and do not have three ways splits. > I’ve probably read this in a paper mentioning that in order to fit OU one > needs an ultrametric tree (pre 2014). > > Looking at the models equations I did not fully understand way this is a > valid assumption, and wanted to make sure I’m not missing anything > important. > > Once again, many thanks, > Danelle > > > On 12 Dec 2018, at 23:18, David Bapst wrote: > > > > Danielle, > > > > Apologies for the late comment on this, but what do you mean by > non-rooted? Do you mean the tree is rooted, but has a three-way split with > the outgroup at the root node? You shouldn't do OU on a tree you can't > assign a root to. > > > > I'm also just a bit curious what gave you the sense that OU was > inappropriate for non-ultrametric trees. To my knowledge, this was only an > issue in one case, as highlighted by Slater (2014), that one particularly > (popular) algorithm for fitting evolutionary models doesn't work for OU on > non-ultrametric trees. I don't think this is as much of a concern in 2018, > as I think most/all packages that were doing such have fixed their code so > an alternative algorithm is used when a non-ultrametric tree is given (as > with a tree containing fossil taxa). > > > > Cheers, > > -Dave > > > > > > On Thu, Dec 6, 2018 at 6:35 AM Danielle Miller <mailto:danimiller...@gmail.com>> wrote: > > Hi all, > > > > I have a non-rooted non-ultrametric tree and a corresponding set of a > single trait values for each one of the tips. > > I’m interested whether it follows a BM model or an OU. > > > > Reading previous comments in the archive, I understood that running an > OU process is inadequate in a case of non-ultrametric trees, however I did > not fully understand why. > > As I use ouch package, what are the consequences of running an unrooted > tree? And a non-ultrametric one? > > > > What would be the best way assessing this issue? Diversitree should be > more reliable? > > > > I will appreciate any explanation and help, > > Danielle Miller > > ___ > > R-sig-phylo mailing list - R-sig-phylo@r-project.org R-sig-phylo@r-project.org> > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo < > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> > > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ < > http://www.mail-archive.com/r-sig-phylo@r-project.org/> > > > > > > -- > > David W. Bapst, PhD > > Asst Research Professor, Geology & Geophysics, Texas A & M University > > Postdoc, Ecology & Evolutionary Biology, Univ of Tenn Knoxville > > https://github.com/dwbapst/paleotree < > https://github.com/dwbapst/paleotree> > > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > -- Marguerite A. Butler Professor Department of Biology 2538 McCarthy Mall, Edmondson Hall 216 Honolulu, HI 96822 Office: 808-956-4713 Dept: 808-956-8617 Lab: 808-956-5867 FAX: 808-956-4745 http://butlerlab.org http://manoa.hawaii.edu/biology/people/marguerite-butler http://www2.hawaii.edu/~mbutler [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] estimate ancestral state with OUwie models
Aloha all, There is no requirement for an ultra metric tree in the formulae reported in Butler-King 2004. Interested investigators should in particular read the supplementary materials where the mathematical details are worked out. We do generally use ultrametric trees because as comparative biologists, it is more straightforward to think about evolution in units of time rather than in terms of mutational units, etc. However this is by choice, not any methodological limitation. Once the model parameters are found, the phylogenetic variance-covariance matrix defined by the alpha, thetas, and sigmas can be used to compute ancestral states using a weighted least squares reconstruction method (instead of the typical BM var-cov matrix). The mapping of the alphas, thetas, and sigmas onto the tree are incorporated into this V-COV matrix, so that accounts for the OU model. NOTES: 1) without knowing why you are doing this, I do feel compelled to warn you that it is unclear why one would want to estimate ancestral states for poorly-fitting models. Be careful! 2) I hope you realize that ancestral states are in general poorly estimated, even assuming the “correct” model. This is because there is less and less information to anchor the values as you get farther from the tips, similar to the root estimation problem described below. This issue was clearly exposed in Schluter et al 1997 (and less famously so in Butler and Losos 1997). These depressing results were among the motivations for developing model-fit approaches in the first place. 3) In 2008/2009 the algorithms in OUCH, SLOUCH, and possibly other methods have changed in the estimation of the value of the root state (X0) which is an internal calculation in fitting the model. Ho and Ane 2013, and Hansen et al 2008 both reported that the root state X0 is ill-defined (unless there are fossil data to anchor the value). This makes sense intuitively, as all of the information is from the tips, and the root is very far down the tree. A reasonable assumption is that it is distributed according to the stationary distribution of the OU process (X0 ~ N(theta(0), sigma^2/2*alpha) and this assumption is what these methods now employ. 4) Whatever you end up doing, do check for the robustness of your results with parametric bootstrap on your fitted models (a la Boettinger et al 2012). As many investigators have reported, these parameters can have large confidence intervals, and can covary with one another (being on a likelihood ridge, etc.). But do note that even when parameters may not be uniquely identifiable, it may still be possible to have robust model selection (see Cressler et al 2015). So perhaps you want to fit ancestral states to see if the different models give you the same states? IDK? So in short, yes, you can do it, with any number of methods. But why? If you can answer your biological question with methods that do not involve estimation of a parameter that is inherently fraught with error, it might be better to go another way. Bottom line - use caution and be thoughtful! I am sure if I have made any errors Aaron, Clay, or Thomas will help. Hope this helps Marguerite Schluter, D., T. Price, A. O. Mooers, and D. Ludwig. 1997. Likelihood of ancestor states in adaptive radiation. Evolution 51:1699–1711. Butler, M. A., and J. B. Losos. 1997. Testing for unequal amounts of evolution in a continuous character on dif- ferent branches of a phylogenetic tree using linear and squared-change parsimony: an example using Lesser Antillean Anolis lizards. Evolution 51:1623–1635. Hansen T.F., Pienaar J., Orzack S.H. 2008. A comparative method for studying adaptation to a randomly evolving environment. Evolution 62:1965–1977. Ho L.S.T., Ané C.. 2014. Intrinsic inference difficulties for trait evolution with Ornstein-Uhlenbeck models. Methods Ecol. Evol. 2:1133–1146. Cressler C., Butler M.A., and King A. A. (2015) Detecting adaptive evolution in phylogenetic comparative analysis using the Ornstein-Uhlenbeck model. Sys. Bio. 64(6):953-968. DOI: 10.1093/sysbio/syv043 Boettiger C., Coop G., Ralph P. 2012. Is your phylogeny informative? Measuring the power of comparative methods. Evolution 66: 2240–2251. Marguerite A. Butler Professor Department of Biology 2538 McCarthy Mall, Edmondson Hall 216 Honolulu, HI 96822 Office: 808-956-4713 Dept: 808-956-8617 Lab: 808-956-5867 FAX: 808-956-4745 http://butlerlab.org http://manoa.hawaii.edu/biology/people/marguerite-butler http://www2.hawaii.edu/~mbutler > On Jun 11, 2018, at 7:33 PM, Simone Blomberg wrote: > > This sounded wrong to me, as the OU process should be agnostic to the > dataset: There are no restrictions inherent in the OU process that apply > particularly to phylogenetic data, whether the tree is ultrametric or not. I > re-read Slater 2014 and it is clear that you can
Re: [R-sig-phylo] Interpretation of standard errors of parameter estimates in OUwie models
h errors it limit the biological inferences I > >> can make? I'm more interested in the relative thetas across regimes than on > >> the exact values (testing the prediction that birds in darker habitats tend > >> to adapt to darker plumages than birds in more illuminated habitats). > >> > >> I have attached a table averaging parameter estimates and errors from > >> models fitted across a posterior distribution of 100 simmaps for four > >> traits; and one exemplar fitted model from one trait in one of those > >> simmaps. > >> > >> Thanks a lot for any help, > >> > >> > >> *--* > >> *Rafael Sobral Marcondes* > >> PhD Candidate (Systematics, Ecology and Evolution/Ornithology) > >> > >> Museum of Natural Science <http://sites01.lsu.edu/wp/mns/ > >> <http://sites01.lsu.edu/wp/mns/>> > >> Louisiana State University > >> 119 Foster Hall > >> Baton Rouge, LA 70803, USA > >> > >> Twitter: @brown_birds <https://twitter.com/brown_birds > >> <https://twitter.com/brown_birds>> > >> > >> > >> ___ > >> R-sig-phylo mailing list - R-sig-phylo@r-project.org > >> <mailto:R-sig-phylo@r-project.org> > >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > >> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> > >> Searchable archive at http://www.mail-archive.com/r- > >> <http://www.mail-archive.com/r-> > >> sig-ph...@r-project.org/ <http://sig-ph...@r-project.org/> > >> > >> > > > > > > -- > > William Gearty > > PhD Candidate, Paleobiology > > Department of Geological Sciences > > Stanford School of Earth, Energy & Environmental Sciences > > williamgearty.com <http://williamgearty.com/> > > > > [[alternative HTML version deleted]] > > > > ___ > > R-sig-phylo mailing list - R-sig-phylo@r-project.org > > <mailto:R-sig-phylo@r-project.org> > > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > > <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> > > Searchable archive at > > http://www.mail-archive.com/r-sig-phylo@r-project.org/ > > <http://www.mail-archive.com/r-sig-phylo@r-project.org/> > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > <mailto:R-sig-phylo@r-project.org> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ > <http://www.mail-archive.com/r-sig-phylo@r-project.org/> > > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > <mailto:R-sig-phylo@r-project.org> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo> > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ > <http://www.mail-archive.com/r-sig-phylo@r-project.org/> Marguerite A. Butler Associate Professor Department of Biology 2538 McCarthy Mall, Edmondson Hall 216 Honolulu, HI 96822 Office: 808-956-4713 Dept: 808-956-8617 Lab: 808-956-5867 FAX: 808-956-4745 http://butlerlab.org http://manoa.hawaii.edu/biology/people/marguerite-butler http://www2.hawaii.edu/~mbutler [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] compar.ou
Hi Sandra and others, You can also assess confidence using parametric bootstrap, a procedure which we generally recommend for all users. ouch has built-in facilities to do so (the bootstrap() and simulate() functions in addition to update() ). I think there are examples in my tutorial. If not, let me know and I can send you another. Marguerite On Oct 29, 2013, at 6:08 AM, Cecile Ane a...@stat.wisc.edu wrote: FYI, we have some theory to explain why alpha has large standard errors and in which conditions. As Brian says, it comes with a flat likelihood with respect with alpha. http://dx.doi.org/10.1214/13-AOS1105 or http://www.stat.wisc.edu/~ane/publis/2013HoAne_AoS.pdf On 10/29/2013 09:46 AM, Brian O'Meara wrote: In at least the OUwie paper we spent a lot of time doing simulations to determine this empirically (this may have been examined in other papers, too, though none come to mind). Alpha can be estimated, but sometimes with scarily large standard errors (but not always). This property should hold for related methods (it's a property of the likelihood surface for these models). You can just plot the alpha values vs likelihood to get a sense of this surface for your dataset (ideally, estimating the other parameters for each fixed alpha, which is possible, though the kludge of holding the other parameters at their MLE and just varying alpha will give you a sense of the surface, though it's a bit non-conservative). We had code in OUwie to do a contour plot of the likelihood surface but took it out (a small licensing difference between akima (university license) and OUwie (GPL) led to OUwie being pulled from CRAN by the CRAN maintainers until we resolved the difference). Brian ___ Brian O'Meara Assistant Professor Dept. of Ecology Evolutionary Biology U. of Tennessee, Knoxville http://www.brianomeara.info Students wanted: Applications due Dec. 15, annually Postdoc collaborators wanted: Check NIMBioS' website Calendar: http://www.brianomeara.info/calendars/omeara On Tue, Oct 29, 2013 at 10:21 AM, sandra goutte gou...@mnhn.fr wrote: Thank you Marguerite. Looking at OUwie and OUCH/SLOUCH, i see that alpha is estimated along the other parameters, whereas in Hansen 1997 and other papers it is suggested that this would lead to very large standard errors. Is that problem resolved in these functions? Best, Sandra. 2013/10/26 Marguerite Butler mbutler...@gmail.com Dear Sandra, You might also want to look at the papers that go along with slouch, ouch, and ouwie. Here are some pdfs, along with some tutorials. On my Rcompstart, the ouch stuff starts around page 165 Let me know if you need help with ouch. Marguerite On Oct 24, 2013, at 7:03 AM, sandra goutte gou...@mnhn.fr wrote: Dear list, My aim is to compare the fit of models for which *theta* is allowed to change at different nodes (different combinations of 1 ,2 or 3 nodes). I don't really understand the calculation of the deviance, but if i'm not mistaken the difference between the deviances of 2 models follows a *chi*square distribution with the difference of parameters in the models as degree of freedom. Now, how to compare two models with the same number of optimum changes but at different nodes? I am having a hard time understanding the details of the function with regards to Hansen's (1997) paper. What is the relationship between the deviance and the RSS from Hansen (1997)? Is it possible to calculate the RSS from the output of compar.ou ? I would appreciate any advice or insight on the subject! Sandra. -- PhD Student Muséum National d'Histoire Naturelle Département Systématique et Évolution USM 601 / UMR 7205 Origine, Structure et Évolution de la Biodiversité Reptiles Amphibiens - Case Postale 30 25 rue Cuvier F-75005 Paris Tel : +33 (0) 1 40 79 34 90 Mobile: +33 (0) 6 79 20 29 99 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ Marguerite A. Butler Associate Professor Department of Biology University of Hawaii 2538 McCarthy Mall, Edmondson Hall 216 Honolulu HI 96822 Office: 808-956-4713 Dept: 808-956-8617 Lab: 808-956-5867 FAX: 808-956-9812 http://www.hawaii.edu/zoology/faculty/butler.html http://www2.hawaii.edu/~mbutler http://www.hawaii.edu/zoology/ -- Cecile Ane Departments of Statistics and of Botany University of Wisconsin - Madison www.stat.wisc.edu/~ane/ CALS statistical consulting lab: www.cals.wisc.edu/calslab/stat_consulting.php ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman
Re: [R-sig-phylo] Question on ace ML reconstruction of discrete binary character
Oops. Sorry the citation is Schluter, Price, Mooers, Ludwig 1997. Likelihood of ancestor states in adaptive radiation. Evolution 51:1699-1711. This issue has been known for a long time. On Jul 30, 2013, at 6:51 AM, Marguerite Butler mbutler...@gmail.com wrote: Hi Tom, One thing to keep in mind is the information content of the data relative to what you are trying to infer. Basically, you have data only at the tips, but are trying to infer the state of the root deep in the tree. So therefore there is actually very little information being brought to bear on this problem. In this case, whatever answer you get will very strongly reflect the assumptions of the model that you apply and the structure of the tree. Put another way, if you were to construct error bars around this character state estimate, you would see that they are huge (See Moers et al. 1997 in Evolution). It sounds like you are expecting a linear parsimony reconstruction. Why not just use that? Your character does not change very much on the tree. This is basically what your ML answer is telling you anyway, more than 50% chance of red at the base. Marguerite On Jul 30, 2013, at 4:38 AM, Liam J. Revell liam.rev...@umb.edu wrote: Hi Tom. There is no reason to expect that the marginal ancestral state reconstructions at the root node (empirical Bayesian posterior probabilities) should match your tip frequencies or prior probabilities. Imagine the following scenario: you have one diverse clade comprising 50% of extant taxa that all diverged recently from a common ancestor share state B; whereas state A is found in all the other tips of the tree, some of which are in clades originating near the root. We would not expect posterior probabilities at the root node to mach the empirical frequencies of our state at the tips (50:50). In fact, we might expect that our reconstructed state at the root of the tree would be strongly A. In your specific case, state 0 is found in three clades that originate nearer to the root, whereas more nested clades are exclusively in state 1. This is why - in spite of its relative rarity across the tips of the tree - there is still some reasonable (PP~0.3) posterior probability under the model that the root is in state 0. This is not an error that needs to be corrected - it is just what your data, model, and tree tell us about the ancestral node of the phylogeny. All the best, Liam Liam J. Revell, Assistant Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 7/30/2013 6:10 AM, Tom Wenseleers wrote: Dear all, Many thanks for all your advice so far. I have now moved to using rayDISC in package corHMM to reconstruct marginal maximum likelihood ancestral state reconstructions, using the method of Maddison et al (2007) and FitzJohn et al (2009) to fix the prior probabilities at the root (setting it to the observed frequency at the tips doesn't change much). The code I have is library(ape) library(corHMM) tree=read.tree(http://www.kuleuven.be/bio/ento/temp/tree.tre;) data=read.csv(file=http://www.kuleuven.be/bio/ento/temp/data.csv;) rownames(data)=data[,1] ASR=rayDISC(tree,data,ntraits=1,charnum=1,model=ER,node.states=marginal,root.p=maddfitz) plot(tree, cex=0.6, show.tip.label=TRUE, ljoin=2,lend=2,label.offset=0.02) nodelabels(pie=ASR$states,piecol=c(white,red), cex=0.45) tiplabels(pch = 22, bg = ifelse(data[tree$tip.label, ][,2],red,white), col=black,adj = c(0.51, 0.5), cex = 0.6) I still get unusually low marginal ML values for the trait being 1 at the basal nodes though (ca. 0.7, which is very low considering that 89% of my species have the trait). Would anyone be able to offer advice on why one could get the reconstructed root ML value to be so much lower than the actual observed frequency of the trait at the tips, and what could be a solution to obtaining a more realistic ML reconstruction? (I also tried diversitree and phangorn, but they all give similar results) Cheers, Tom -Original Message- From: Jack Viljoen [mailto:javilj...@gmail.com] Sent: 30 July 2013 10:23 To: Tom Wenseleers Subject: Re: [R-sig-phylo] Question on ace ML reconstruction of discrete binary character Hello, Tom. I was just wondering if the higher uncertainty at the basal nodes isn't expected, particularly given the long branches descended from them? Since this is an ML estimate and not a Bayesian one, surely the concept of priors does not apply? My understanding is that ace() actually only estimates the root node and that other methods are required to properly estimate the states at the other nodes. I'm basing this on these posts from Liam Revell earlier this year: http://blog.phytools.org/2013/03/conditional-scaled-likelihoods-in-ace.html http://blog.phytools.org
Re: [R-sig-phylo] on evolutionary models
Dear Pas, I am wondering what the general aim of your study is? I agree with Ted, that you should be careful about your assumptions and the tools you are using. Ted's point that K is a descriptive statistic is an important one. K describes the expected divergence among tip traits assuming a model of Brownian motion, in particular whether it is more or less than expected via BM. So if you find that BM doesn't fit, and then try to use K on it, what exactly are you hoping to understand? This is a bit like the tail wagging the dog. In general, I really don't like the term phylogenetic signal because BM is the reference model, it puts all evolutionary comparative analysis in a BM-centric perspective. That is, the null expectation is that trait diversity is produced by a BM process, and so we have to look at everything else for any ecological adaptation or other evolutionary influences. In practice, we can find many examples where alternative models explain the data much better, so BM doesn't fit, but that does not mean that there is no phylogenetic signal (whatever that means). Evolutionary history may have had a huge influence, but it may be via an adaptive or other model where adaptation and the history of selection has also been influential. It forces artificial distinctions and frames of reference. Anyway, what are you using to assess performance? Please be sure to use appropriate statistics to assess model fit, and use descriptive statistics for description. But please do make sure that the interpretation makes sense. Best of luck, Marguerite On Jan 24, 2013, at 11:03 AM, Paolo Piras paolo.pi...@uniroma3.it wrote: Hi pas, Besides the other arguments, I think that one should test for phylogenetic signal AFTER trasforming the tree according to the best mode of evolution IF it is not brownian. Maybe someone else can add some more on this. best paolo Da: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] per conto di Theodore Garland Jr [theodore.garl...@ucr.edu] Inviato: giovedì 24 gennaio 2013 21.30 A: pasquale.r...@libero.it; r-sig-phylo@r-project.org Mailing-list Oggetto: Re: [R-sig-phylo] R: Re: From ClustalW2 Tree to Heat Map in R Hi Pas, K is just a descriptive statistic for tip data on a tree with some specified branch lengths. Obviously, if you change the branch lengths (or the topology or the tip data), then K will be different. How much different depends on the details. It is likely that certain branch-length transformations (whether purely statistical or designed to mimic something like an OU process) will tend to make K either larger or smaller, at least when checked cross a large number of examples. Somebody would need to do a lot of simulations to sort this out. Some related relevant things are here: Revell, L. J., L. J. Harmon, and D. C. Collar. 2008. Phylogenetic signal, evolutionary process, and rate. Syst. Biol. 57:591-601. In the original Blomberg et al. (2003) paper we avoided this issue entirely by only reporting K using whatever tree the original paper used. We wanted to be able to compare K values among traits and studies in some sort of simple and fair way. If lambda and OU transforms are leading to very different K values (how different?) that does not mean one K value is better than the other. But it does mean you would want to be careful if you tried to compare your K value with other studies that did not use lambda and/or OU transforms. Cheers, Ted Theodore Garland, Jr. Professor Department of Biology University of California, Riverside Riverside, CA 92521 Office Phone: (951) 827-3524 Wet Lab Phone: (951) 827-5724 Dry Lab Phone: (951) 827-4026 Home Phone: (951) 328-0820 Facsimile: (951) 827-4286 = Dept. office (not confidential) Email: tgarl...@ucr.edu http://www.biology.ucr.edu/people/faculty/Garland.html http://scholar.google.com/citations?hl=enuser=iSSbrhwJ Experimental Evolution: Concepts, Methods, and Applications of Selection Experiments. 2009. Edited by Theodore Garland, Jr. and Michael R. Rose http://www.ucpress.edu/book.php?isbn=9780520261808 (PDFs of chapters are available from me or from the individual authors) From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] on behalf of pasquale.r...@libero.it [pasquale.r...@libero.it] Sent: Thursday, January 24, 2013 12:14 PM To: r-sig-phylo@r-project.org Mailing-list Subject: [R-sig-phylo] R: Re: From ClustalW2 Tree to Heat Map in R Hi Gents and Lads, I have a very rapid question with a perhaps not-so-obvious reply. I'm in the process of testing a number of evolutionary models and estimating phylogenetic signal on a certain univariate data set. So something very basic and very simple. The point is, I found BM performs poorly as compared to OU (single
Re: [R-sig-phylo] Ouch v.2.8-2 on the same tree gives different results (maybe)
Dear Elena, If I understand you correctly, you are trying to fit a unique regime to each branch of the tree? This is not advisable, as there are more branches than there are more branches than there are terminal taxa (data points). So you are trying to fit many more parameters than you have data points. In a hansen model, there are parameters for the strength of selection (alpha) the magnitude of drift (sigma) and each optima (4 for a 3 taxon tree). That is 6 parameters for three data points. It is not at all surprising that you get non convergence or different results, as the problem is not identifiable. We recommend using the minimum number of selective regimes to describe the major features of the data. Generally there are only a few regimes for many species, with many branches of the tree in the same regime. Also you should avoid tinkering with the inner workings of S4 objects. You should use them via their accessors as given in the help pages. Please try the examples at the bottom of the included datasets: require(ouch) data(bimac) ?bimac data(anolis.ssd) ?anolis.ssd Marguerite On Sep 14, 2012, at 5:51 AM, Elena Grassi wrote: Hello, I'm a newbie in phylogenetic analysis so excuse me in advance for incorrect terms and so on (or if my question is too silly). I've recently started to work on a project for which I need to use Ornstein-Uhlenbeck models and I decided to use R and ouch since the documentation was great. For the first experiments I have been working on some toy trees but I'm getting puzzling results: the same tree, simply defined with different orders of the string which I use to represent it in the Newick format, when analyzed with the Hansen functions gives different results. I have tried different combinations and in some cases a process converges while the other one does not, in other cases what change are the chosen optimal sigma (and thus the loglikelihoods and so on). This happens when I use a regime with different levels for every node in the tree but not when I use a single regime. I'm pasting here an example of this phenomenon. It could be a stupid error on my side but I'm not able to spot it right now. Thanks, Elena Grassi library(ouch) library(ape) t1 - read.tree(text=(E:10,(D:5,C:5):5);) t2 - read.tree(text=((C:5,D:5):5,E:10);) out1 - ape2ouch(t1) out2 - ape2ouch(t2) regime1 - out@nodes regime1 - out1@nodes regime2 - rep(global, length(outree@nodes)) regime2 - rep(global, length(out2@nodes)) expr1 - c(NA, NA, 3, 5, 10) expr2 - c(NA, NA, 10, 5, 3) models_data1 - data.frame(expr1, regime1, regime2, row.names=out1@nodes) models_data2 - data.frame(expr2, regime1, regime2, row.names=out2@nodes) m1 - hansen(models_data1[expr1], out1, models_data1[regime1], 1, 1) m2 - hansen(models_data2[expr2], out2, models_data2[regime1], 1, 1) unsuccessful convergence, code 10, see documentation for `optim' Warning message: In hansen(models_data2[expr2], out2, models_data2[regime1], : unsuccessful convergence m1 call: hansen(data = models_data1[expr1], tree = out1, regimes = models_data1[regime1], sqrt.alpha = 1, sigma = 1) nodes ancestors times labels regime1 expr1 1 1 NA 0.0 1NA 2 2 1 0.5 2NA 3 3 2 1.0 C 3 3 4 4 2 1.0 D 4 5 5 5 1 1.0 E 510 alpha: [,1] [1,] 3.734194 sigma squared: [,1] [1,] 9.233622e-30 theta: $expr1 1 2 3 4 5 0.5049171 1.3917440 3.3191037 5.6847686 10.2324134 loglik deviance aic aic.c sic dof 100.5415 -201.0831 -187.0831 -209.4831 -193.39287. m22 - hansen(models_data2[expr2], out2, models_data2[regime2], 1, 1) m11 - hansen(models_data1[expr1], out1, models_data1[regime2], 1, 1) m11@loglik [1] -7.47607 m22@loglik [1] -7.47607 sessionInfo() R version 2.15.1 (2012-06-22) Platform: i486-pc-linux-gnu (32-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=C LC_NAME=C LC_ADDRESS=C [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] ape_3.0-5 ouch_2.8-2subplex_1.1-3 loaded via a namespace (and not attached): [1] gee_4.13-18 grid_2.15.1 lattice_0.20-10 nlme_3.1-104 tools_2.15.1 -- www.biocut.unito.it ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Marguerite A. Butler Associate Professor Department of Biology University of Hawaii 2450 Campus Rd., Dean
Re: [R-sig-phylo] Normality requirement for assessment of lambda with phylosig (phytools) and fitContinuous (geiger)
Hi Nina and everyone, One thing to consider is that not all zero data are the same. Zeros under a model of continuous trait evolution with a gaussian process as assumed under Brownian motion and OU processes would occasionally cross zero, maybe go negative, etc. For example if you were modeling something like the deviation from average height. You may have a lot of individuals that are at zero because they are all average, but their offspring will quickly move off zero as some will be taller, some shorter than average. On the other hand, many of us measure traits which disappear, for example scale counts on fish. Numbers of scale rows vary while there are scales, but once they disappear, those lineages will be at zero, perhaps for a very long time. In this case it is no longer behaving as a gaussian process with small changes expected every time period. We usually think of these more as a threshold trait. Maybe there is some hormone or something else (a hidden variable) underlying the determination of scales or no scales, and once it goes below threshhold the scales disappear. The hidden variable may fit model assumptions, but not the scale counts (what we can see and measure). For example, the scale counts can never go negative. With a value like height, it also never goes negative, but usually we are far away from that zero boundary so we can casually ignore that problem:). Or we can do a log-transform, or something else that transforms the data onto a different scale. Anyway, the a! bsorbing boundary zeros are, I think, an example of what Ted is talking about. So it depends on the nature of your data. On the other hand, for the other variable, if there is just not much variation, but it's not stuck on any particular value (doesn't appear to have any absorbing boundary), I think that's less of a problem. HTH, Marguerite On Apr 25, 2012, at 7:17 AM, Theodore Garland Jr wrote: Read over the Blomberg et al. (2003) paper. K is intended for continuous-valued traits and/or those evolving similar to Brownian motion. You could report it if you wished, but I would add that caveat if you do. The randomization test should be robust in any case. Cheers, Ted From: Nina Hobbhahn [n.hobbh...@gmail.com] Sent: Wednesday, April 25, 2012 9:19 AM To: Theodore Garland Jr Cc: Alejandro Gonzalez; Hunt, Gene; Enrico Rezende; r-sig-phylo@r-project.org Subject: Re: [R-sig-phylo] Normality requirement for assessment of lambda with phylosig (phytools) and fitContinuous (geiger) Thanks all for your helpful contributions! I will use phylosignal. Ted, I'm not sure I understand your last comment, when the data are not though of as continuous-valued and/or evolving similar to Brownian motion. What do you mean by that? Also, are you suggesting that I report the presence/absence of phylogenetic signal, but not the value of the K statistic? Many thanks again, Nina On 2012-04-25, at 5:54 PM, Theodore Garland Jr wrote: However, calculating a K statistic is strange when the data are not thought of as continuous-valued and/or evolving similar to Brownian motion. The randomization test is OK, however. Cheers, Ted From: Alejandro Gonzalez [alejandro.gonza...@ebd.csic.es] Sent: Wednesday, April 25, 2012 8:46 AM To: Theodore Garland Jr Cc: Nina Hobbhahn; r-sig-phylo@r-project.org Subject: Re: [R-sig-phylo] Normality requirement for assessment of lambda with phylosig (phytools) and fitContinuous (geiger) Hello, The library picante in R implements Blomberg et al (2003) K estimate, Liam's phytools package does as well. Phytools has the added advantage, if I remember correctly, of allowing users to estimate K including within species variation. Cheers Alejandro On 25, Apr 2012, at 5:29 PM, Theodore Garland Jr wrote: I would suggest the randomization test in Blomberg et al. (2003). This will give a valid significance test of the null hypothesis of no phylogenetic signal. By itself, it does not give a measure of the strength (or amount) of phylogenetic signal. Not sure if it is implented in r. If not, I can send our Matlab code. Cheers, Ted Theodore Garland, Jr. Professor Department of Biology University of California, Riverside Riverside, CA 92521 Office Phone: (951) 827-3524 Home Phone: (951) 328-0820 Facsimile: (951) 827-4286 = Dept. office (not confidential) Email: tgarl...@ucr.edu http://www.biology.ucr.edu/people/faculty/Garland.html Experimental Evolution: Concepts, Methods, and Applications of Selection Experiments. 2009. Edited by Theodore Garland, Jr. and Michael R. Rose http://www.ucpress.edu/book.php?isbn=9780520261808 (PDFs of chapters are available from me or from the individual authors) From:
Re: [R-sig-phylo] multiple traits measured within species
Dear Kaspar, Are your traits the same? Basically, are the seven color patches on the same individual all one trait, or are they seven traits? Do all species have the seven color patches? Or are the number of color patches variable? (this could also be a character). I think before you attempt a comparative analysis, it is helpful to think about how you would set up a standard statistical analysis, for example some sort of ANOVA. I am not sure I understand your description clearly. It sort of sounds like you are considering the multiple color patches as repeated measures of color patch on the same individual, but then you want to include them all as separate traits in the analysis. You could consider them as repeated measurements of a single color patch trait, in which case you would include the individual and species as factors in your analysis. Or you could consider each patch a different trait, and then compare them across species, so that each color patch is a separate dependent variable (in which case color patch 1 would be the same trait across individuals and species, etc.). Or you could take the mean of all color patches within an individual and use that in an analysis across individuals and species. Do you have any clues from development as to their identity? It seems like you want to consider them all to be replicates of the same measure. But then why measure so many? I assume there is variation among color patches within an individual? Ultimately you will have to decide if 3 color patches is one trait or if it is three traits. After you decide this, it will be much easier to design a comparative analysis. Good luck, Marguerite On Mar 6, 2012, at 2:42 PM, Kaspar Delhey wrote: Hello, I was wondering whether I could get some input into the following analysis: I am interested to test for the association between one dependent variable (continuous, normal) and one factor and a covariate while accounting for phylogenetic relatedness. In my analysis I have 12 species and I have measured the dependent variable and the covariate (which are both colour descriptors) on several patches per species (between 1 and 7 patches per species). The factor (the type of visual sensitivity) does not vary within species. Thus the repeated observations per species do not correspond to different individuals but are different patches of colour in the same individual, measured on the same scale and units (hence directly comparable). Does it make sense to make use of the whole dataset (i.e. without computing species averages) by replacing each tip in the phylogeny by a polytomy including all patches measured in that particular species? For example if I have this tree: (A, (B, C)) and I have measured two patches of colour in sp. A and three in spp B and C I could replace the tips in this tree by: ((A1,A2),((B1,B2,B3),(C1,C2,C3))) In this way I could use a gls approach with nlme and for example corPagel such as: model1-gls(dep.var~factor+covariate, correlation=Pagel, method=REML, data=data) The results that I obtain from such a model make sense and qualitatively agree with the results from a mixed model including species ID as a random factor but ignoring phylogenetic relatedness between species. My question then is whether there is a flaw in this analysis and whether it has been used before in other publications in order to be able to include references to back it up. Thanks in advance for any help. best kaspar -- Kaspar Delhey e-mail: kaspar.del...@monash.edu https://sites.google.com/site/kaspardelhey/ Tel:+61-(0)3-99020377 Bldg.18 School of Biological Sciences Monash University Clayton, 3800 Victoria Australia ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] How to detect phylogenetic signal (lambda) in one unscaled trait?
Hi Folks, After that very long-winded answer I still managed to leave out something. I want to reinforce what Ted and Joe said regarding absolute vs. log scales. Most comparative methods machinery as currently implemented work with absolute scales (the random component of the evolutionary change is modeled as normal and identically distributed through time, and adds up to some final amount of change and is therefore additive). So if you don't log-transform your data, then you are allowing your mice to evolve the same absolute value in size as you are your elephants. Marguerite On Apr 1, 2011, at 7:57 AM, Marguerite Butler wrote: Hi Alberto and Ted, and others, Maybe this is too late to jump into the discussion, but I wanted to add a few comments regarding analysis of the evolution of size and shape. I agree with Ted that there are a number of important considerations: (1) size and its meaning, (2) how you remove it, and (3) the model of size and size evolution. And it is really important to think about the biology as well as the statistical considerations. Too often the biology gets lost. First is what is the meaning of size, and why are you trying to remove it? What is the underlying biological relationship? For example, if you are interested in relative growth, then probably size is best captured by mass. Alternatively, if you are interested in limbed locomotion, then having long legs relative to your body length is more important, and so your size is equal to body length. Or if you are interested in an anti-predator behavior, then being bigger is important because you look scarier. In that case, size might mean mass if you animal gets larger more or less similarly in all dimensions, or it might be relative to a characteristic length (say jaw width or a head size dimension) if your animal's threat display is in a particular orientation. What you choose as your size variable is critical for interpreting what you have left over after adjusting for it (usually some aspect of shape). Also, you should always check to make sure that your shape variables make sense. After you removed size, just look at your shape data. Do your shape variables look ridiculous? If so, then you should reconsider before you do your analysis. Make sure that your shape data actually captures (hopefully accentuates) the biological relationship you're trying to look at, and not obscure it. Also remember that size is actually a hugely important variable(!), important for nearly all aspects of an animal's ecology and physiology. Removing size -- There is a huge literature on size and shape analysis (allometry), especially in the quantitative genetics and morphometrics literature, which seems to get bypassed in the comparative phylogenetics literature, which is a shame. The two papers that I have found most helpful are: Klingenberg, C. P. 1996. Multivariate allometry. Pages 23â 49 in L. F. Marcus, M. Corti, A. Loy, G. Naylor, and D. E. Slice, editors. Advances in morphometrics. Plenum Press, New York, New York, USA. Mosimann, J. 1970. Size allometry: size and shape variables with characterizations of the lognormal and generalized gamma distributions. Journal of the American Statistical Association 65:930â945. Most phylogenetic comparative studies use species-regressions, as Ted mentioned, to remove size. This is a statistically valid approach, but there are other considerations as well. One undesirable consequence is that this method is extremely sensitive to your sample of species. If you remove some of your species or add new ones, then your regression line will change, which means that your size variable (and consequently all of your shapes) are now different. Your shape variables will definitely not be comparable between studies. So the question is -- is it OK that your idea of big and small can change depending on which datapoints are included? An alternative method is to use a size index which is an inherent property of the individual, and use the size index to adjust each individual's measurements to create the shape variables. This is explained in the Mosimann 1970 paper. I typically use a geometric-mean size index, and take the log-ratio of each variable to size: shape1 = log( var1 / sizevar) = log(var1) - log(sizevar). People most often take log-transformed data when working with morphometric data because most often these data follow power law relationships with size (this breaks up multiplicative relationships into additive (linear) ones). Whether or not the original variables scale in a linear isometric relationship with size is not really an issue, the logs take care of linearizing power relationships, and you can always look at the relationships that result to interpret the scaling factors. If you have data on individuals, then the advantage of this method also
Re: [R-sig-phylo] multi-state categorical predictor variables in PGLS
Hi Andrew, Does this sidestep the degrees of freedom problem discussed by Garland et al.? Can anybody point me to references discussing the mechanics of this process and why this is an appropriate thing to do? Others on this list will disagree with me, but it's not a degrees of freedom problem. What phylogeny introduces is not a reduction in the number of degrees of freedom, but rather covariance in the traits at the end of the evolutionary process. The number of species is the number of species. Period. But they are sometimes similar to different degrees because of shared history. Statisticians call this problem correlated errors, in this case, the phylogeny is the error:) that is obscuring the true relationship between ecology and your trait of interest, for example. It is simple to remove the correlated error by accounting for this shared covariance structure, which is what the PGLS code does. It is basically like normalizing variances in univariate statistics by dividing by the variance. In this case the variance is a multivariate covariance matrix, calculated from the phylogeny (expected similarity based on shared ancestry). The dummy code is just a way to do include a categorical (multistate) variable in the ANOVA, and is a standard parametric statistic procedure. We describe how to do effect coding by hand in an old paper (see appendix). Doing it by hand helps to understand what is going on: Butler M.A. Schoener T.W., and Losos J.B. (2000) The relationship between habitat type and sexual size dimorphism in Greater Antillean Anolis lizards. Evolution 54(1):259-272. Anyway, yes, this should take care of the phylogenetic analysis. (even though I wouldn't call it a degrees of freedom problem). Marguerite On Mar 7, 2011, at 7:06 AM, Alejandro Gonzalez V wrote: Hi Andrew, As I understand it, gls() is doing a multiple generalized LS regression with as many dummy variables as there are factor levels. Is this a correct characterization? I think you'd get one dummy variable less than factor levels in your characterization (at least in regards to the number of levels for which parameters are estimated), as the gls sets one of the levels as the point of comparison with all other levels. Thus you'd get n-1 dummy variables for which the parameters are estimated. Having such a low value of alpha the results of the phylogenetic gls should be similar (if not identical) to results not taking phylogeny into account, as this suggests you don't have phylogenetic signal in the residuals of your relationship. There is a good paper on this issue by Liam Revell in Methods in Ecology and Evolution. There is also a function in geiger which allows you to run phylogenetic ANOVAs, but if I am not mistaken, the p-value is estimated based on simulations assuming traits evolve via Brownian motion (is this correct?). I've also seen lambda values below 0 in ape, theoretically lambda is described as being bounded between 0 and 1, but it could take values outside the bounds. I would be interested in hearing the thoughts of others in the list regarding whether lambda values for the Phylogenetic gls should be forced to be bounded between 0 and 1. This would more closely follow what has been proposed in the literature wouldn't it? Cheers Alejandro __ Alejandro Gonzalez Voyer Post-doc Estación Biológica de Doñana (CSIC) Avenida Américo Vespucio s/n 41092 Sevilla Spain Tel: +34- 954 466700, ext 1749 E-mail: alejandro.gonza...@ebd.csic.es Web-site: https://docs.google.com/View?id=dfs328dh_14gwwqsxcg On 7, Mar 2011, at 5:52 PM, Andrew Barr wrote: Hi everyone, I am trying to piece together the current best-practices for phylogenetic ANOVA with multi-state predictors. In my dataset, my four-level factor is non-random with respect to phylogeny. That is, if I know which higher level clade an species belongs to, I can predict with pretty good success which factor level it will be in. My understanding is that this situation likely overinflates my degrees of freedom and makes traditional F-tests inappropriate. I came across this paper (Garland et al 1993. Phylogenetic Analysis of Covariance by Computer Simulation. Systematic Biology 42:265 -292.) where the authors empirically recalculate critical values for F-ratios using computer simulations, tree topology, and a model of character evolution. I also have found that I can use PGLS (with ape and nlme) and specify my model like this. gls(myVar~myFactor,corr=corPagel(val=1,phy=myTree,fixed=F),data=myDF) As I understand it, gls() is doing a multiple generalized LS regression with as many dummy variables as there are factor levels. Is this a correct characterization? Does this sidestep the degrees of freedom problem discussed by Garland et al.? Can anybody point me to references discussing the
Re: [R-sig-phylo] Dealing with Bounded Trait Measures
Hi David, I am not sure what you mean. If you add a tiny value to all of your data, then all you've done is shifted the mean ever so slightly. You then log-transform to put the variables on a scale unbounded by 0, and then you measure phylogenetic signal. The BM process models evolutionary changes as random draws from a normal distribution. So this would be the log-transformed values here. Adding a tiny value to all data doesn't change things very much. Log transformation changes things a lot more. So you're modeling a BM process on the log-transformed values. But there is nothing illegal about that. You do have to think about it, however. This makes a lot of sense to me when I'm modeling body size evolution, because the model would imply that when you're small, evolutionary changes tend to be smaller than when you're big (evolutionary changes on larger animals would tend to have bigger jumps on the original scale). Ignoring any clade-specific biases, evolutionary changes in the body size of mice would tend to be a lot smaller than evolutionary changes in the size of horses, for example. We can imagine that some simple mechanistic reasons such as their subunits are smaller (they have smaller cells, smaller vertebrae, etc., so adding on or taking away would have bigger size effects). This seems biologically reasonable. I am not sure if it would apply to your spine size data, but maybe it does. Of course if your size variation is not that great, then it doesn't much matter (because log-trasformation will then not have much effect on compressing the scale of the values). Anyway, this may be a longer-winded comment than you were asking for. I can't think of any reason why adding tiny values would do anything to the evolutionary simulation. Marguerite On Mar 6, 2011, at 9:53 AM, David Bapst wrote: Marguerite and others- I had missed that Scales et al paper. Thank for pointing it out. I have also had to deal with variables that were bounded by 0 and 1 previously, and the use of the logit is something I'll have to try when I go back to those analyses. Although I am interested in transformations of such traits for comparative analyses in general (measuring correlated trait evolution, model fitting of OU, measurements of rate, etc.), I am mainly interested in measurements of phylogenetic signal. Is there any reason to think the kludge (adding an arbitrarily small value to trait values) would bias a test for phylogenetic signal (i.e. do the trait show a pattern of inheritance)? I'm worried that measurements of signal might be prone to odd results with the kludge, because the data structure would still be quite different from BM expectation. Perhaps, the threshold model might be of a lot of use here; I'm going to have to look into that much more. Thanks for you advice, all! -Dave On Sun, Mar 6, 2011 at 12:50 AM, Marguerite Butler mbut...@hawaii.edu wrote: Hi David, Liam and everyone, Reflecting traits at boundaries or absorbing them is something that can be done, but I guess I'd like to encourage everyone to think carefully about the interpretation of such simulations. What are you trying to model and what does it mean at the end? Doing these boundary manipulations produces odd shaped-distributions of traits at the end of the simulated process. But probably more importantly, does this a good model for the biological process which might be occurring? If so, then great. But I would guess not with traits such as spine length. The zero issue is a hard problem. Is it a combination of a discrete trait (presence/absence) and a continuous trait (length)? Or perhaps a better model would be a threshold trait with underlying continuous variation but below some boundary it will not be expressed? Has anyone developed a model that can be used for such a scenario? In the absence of such a model, I've done things like what Enrico suggests, with a kludge solution like adding a very small value to all values to avoid the zero. This doesn't make much difference to the analysis, and avoids the singularity at log(0). If you have a trait that is a frequency, you can use the logit() function instead of log(), as we did in Scales et al 2009. Scales J.A., King A.A., and Butler M.A. (2009) Running for your life or running for your dinner: What drives fiber-type evolution in lizard locomotor muscles? Am. Nat.173:543-553. Marguerite On Mar 5, 2011, at 11:06 AM, Liam J. Revell wrote: I'm not sure that this is what Dave has in mind, but if anyone is interested in simulating bounded evolution in R, I just added it to my fastBM() function (code here: http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/fastBM/v0.3/fastBM.R). In the process of evolving traits up the tree, I just bounce back any phenotypes that exceed the lower or upper boundary conditions specified by the user (by default they are -Inf and Inf). I think I
Re: [R-sig-phylo] Dealing with Bounded Trait Measures
Hi David, Liam and everyone, Reflecting traits at boundaries or absorbing them is something that can be done, but I guess I'd like to encourage everyone to think carefully about the interpretation of such simulations. What are you trying to model and what does it mean at the end? Doing these boundary manipulations produces odd shaped-distributions of traits at the end of the simulated process. But probably more importantly, does this a good model for the biological process which might be occurring? If so, then great. But I would guess not with traits such as spine length. The zero issue is a hard problem. Is it a combination of a discrete trait (presence/absence) and a continuous trait (length)? Or perhaps a better model would be a threshold trait with underlying continuous variation but below some boundary it will not be expressed? Has anyone developed a model that can be used for such a scenario? In the absence of such a model, I've done things like what Enrico suggests, with a kludge solution like adding a very small value to all values to avoid the zero. This doesn't make much difference to the analysis, and avoids the singularity at log(0). If you have a trait that is a frequency, you can use the logit() function instead of log(), as we did in Scales et al 2009. Scales J.A., King A.A., and Butler M.A. (2009) Running for your life or running for your dinner: What drives fiber-type evolution in lizard locomotor muscles? Am. Nat.173:543-553. Marguerite On Mar 5, 2011, at 11:06 AM, Liam J. Revell wrote: I'm not sure that this is what Dave has in mind, but if anyone is interested in simulating bounded evolution in R, I just added it to my fastBM() function (code here: http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/fastBM/v0.3/fastBM.R). In the process of evolving traits up the tree, I just bounce back any phenotypes that exceed the lower or upper boundary conditions specified by the user (by default they are -Inf and Inf). I think I did this properly. Feedback welcome though. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ (new) email: liam.rev...@umb.edu (new) blog: http://phytools.blogspot.com On 3/5/2011 12:55 PM, tgarl...@ucr.edu wrote: Hello David, Enrico, et al., I may have lost track of what Dave was originally trying to do, and I am not familiar with all of the options presently available in r for simulating continuously valued traits along a specified phylogenetic tree. However, I wanted to point out that MANY possibilities, including trends, the OU process, and actual limits to trait evolution implemented in several ways, are available in our original DOS program PDSIMUL.EXE that accompanies this paper: Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. Phylogenetic analysis of covariance by computer simulation. Systematic Biology 42:265-292. It has been used many times to look at trends, limits, etc., e.g., in these papers: Díaz-Uriarte, R., and T. Garland, Jr. 1996. Testing hypotheses of correlated evolution using phylogenetically independent contrasts: sensitivity to deviations from Brownian motion. Systematic Biology 45:27-47. Laurin, M. 2010. Assessment of the relative merits of a few methods to detect evolutionary trends. Syst. Biol. 59:689-704. Cheers, Ted Theodore Garland, Jr. Professor Department of Biology University of California, Riverside Riverside, CA 92521 Office Phone: (951) 827-3524 Lab Phone: (951) 827-5724 Home Phone: (951) 328-0820 Facsimile: (951) 827-4286 = Dept. office (not confidential) Email: tgarl...@ucr.edu Main Departmental page: http://www.biology.ucr.edu/people/faculty/Garland.html List of all Publications: http://www.biology.ucr.edu/people/faculty/Garland/GarlandPublications.html Garland and Rose, 2009 http://www.ucpress.edu/books/pages/10604.php Original message Date: Sat, 05 Mar 2011 15:36:13 +0100 From: Enrico Rezendeenrico.reze...@uab.cat Subject: Re: [R-sig-phylo] Dealing with Bounded Trait Measures To: David Bapstdwba...@uchicago.edu Cc: R Sig Phylo Listservr-sig-phylo@r-project.org David, on the top of my head, if no species measurement strictly corresponds to zero, you may log-transform the data. You may then simulate Brownian motion in log-transformed values, which will correspond to a boundary of zero in a linear scale (i.e., the more negative the log number, the closer the trait value is to zero - but never zero - in a linear scale). This also explains why you can simulate the evolution of body mass employing Brownian motion in log-transformed units and no species will ever be assigned a body mass of zero. On more speculative grounds, this may simply reflect the fact that many biological processes and their
Re: [R-sig-phylo] Dealing with Bounded Trait Measures
p.s. an absorbing boundary might be a good model for something like the disappearance of a gene or something. When it goes to such a low frequency it goes extinct. Then in order to reevolve, it needs to mutate (basically re-originate, which is a very rare event). But for reversible traits, absorbing boundaries don't seem like a good model to me. I have never been able to think up a good biological mechanism for a reflecting boundary. Sorry for the extra email from the incomplete thought. M On Mar 5, 2011, at 8:50 PM, Marguerite Butler wrote: Hi David, Liam and everyone, Reflecting traits at boundaries or absorbing them is something that can be done, but I guess I'd like to encourage everyone to think carefully about the interpretation of such simulations. What are you trying to model and what does it mean at the end? Doing these boundary manipulations produces odd shaped-distributions of traits at the end of the simulated process. But probably more importantly, does this a good model for the biological process which might be occurring? If so, then great. But I would guess not with traits such as spine length. The zero issue is a hard problem. Is it a combination of a discrete trait (presence/absence) and a continuous trait (length)? Or perhaps a better model would be a threshold trait with underlying continuous variation but below some boundary it will not be expressed? Has anyone developed a model that can be used for such a scenario? In the absence of such a model, I've done things like what Enrico suggests, with a kludge solution like adding a very small value to all values to avoid the zero. This doesn't make much difference to the analysis, and avoids the singularity at log(0). If you have a trait that is a frequency, you can use the logit() function instead of log(), as we did in Scales et al 2009. Scales J.A., King A.A., and Butler M.A. (2009) Running for your life or running for your dinner: What drives fiber-type evolution in lizard locomotor muscles? Am. Nat.173:543-553. Marguerite On Mar 5, 2011, at 11:06 AM, Liam J. Revell wrote: I'm not sure that this is what Dave has in mind, but if anyone is interested in simulating bounded evolution in R, I just added it to my fastBM() function (code here: http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/fastBM/v0.3/fastBM.R). In the process of evolving traits up the tree, I just bounce back any phenotypes that exceed the lower or upper boundary conditions specified by the user (by default they are -Inf and Inf). I think I did this properly. Feedback welcome though. - Liam -- Liam J. Revell University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ (new) email: liam.rev...@umb.edu (new) blog: http://phytools.blogspot.com On 3/5/2011 12:55 PM, tgarl...@ucr.edu wrote: Hello David, Enrico, et al., I may have lost track of what Dave was originally trying to do, and I am not familiar with all of the options presently available in r for simulating continuously valued traits along a specified phylogenetic tree. However, I wanted to point out that MANY possibilities, including trends, the OU process, and actual limits to trait evolution implemented in several ways, are available in our original DOS program PDSIMUL.EXE that accompanies this paper: Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. Phylogenetic analysis of covariance by computer simulation. Systematic Biology 42:265-292. It has been used many times to look at trends, limits, etc., e.g., in these papers: Díaz-Uriarte, R., and T. Garland, Jr. 1996. Testing hypotheses of correlated evolution using phylogenetically independent contrasts: sensitivity to deviations from Brownian motion. Systematic Biology 45:27-47. Laurin, M. 2010. Assessment of the relative merits of a few methods to detect evolutionary trends. Syst. Biol. 59:689-704. Cheers, Ted Theodore Garland, Jr. Professor Department of Biology University of California, Riverside Riverside, CA 92521 Office Phone: (951) 827-3524 Lab Phone: (951) 827-5724 Home Phone: (951) 328-0820 Facsimile: (951) 827-4286 = Dept. office (not confidential) Email: tgarl...@ucr.edu Main Departmental page: http://www.biology.ucr.edu/people/faculty/Garland.html List of all Publications: http://www.biology.ucr.edu/people/faculty/Garland/GarlandPublications.html Garland and Rose, 2009 http://www.ucpress.edu/books/pages/10604.php Original message Date: Sat, 05 Mar 2011 15:36:13 +0100 From: Enrico Rezendeenrico.reze...@uab.cat Subject: Re: [R-sig-phylo] Dealing with Bounded Trait Measures To: David Bapstdwba...@uchicago.edu Cc: R Sig Phylo Listservr-sig-phylo@r-project.org David, on the top of my head, if no species measurement strictly corresponds to zero, you may log-transform the data. You may then simulate
Re: [R-sig-phylo] Identifying nodes in ouch tree
Hi Alejandro, The info you want is given in the example script of the dataset bimac, which is included for ouch. To see the help page: require(ouch) data(bimac) ?bimac Then what you need is: tree - with(bimac,ouchtree(node,ancestor,time/max(time),species)) plot(tree,node.names=TRUE) The option you need is the node.names=TRUE option. Also, it is (sparsely) documented in the ouchtree help page, under the plot method: ?ouchtree Marguerite On Jan 24, 2011, at 7:30 AM, Alejandro Gonzalez wrote: Hello, I want to paint regimes on an ouch tree based on a specific dichotomous trait. I've identified the nodes and branches on the phylogeny where the selection regimes would change based on an ancestral state reconstruction of the character of interest. I wanted to know if there is a way of showing the node numbers of a plotted image of the ouchtree object as can be done in ape for a phylo object using nodelabels(). I ask this because when I transform my phylo object to an ouch object the node numbers no longer correspond. How can one identify which node numbers correspond to specific nodes on the phylogeny? Or do I simply have to make it out from the information available using print(ouchtree object)? Thanks, Alejandro __ Alejandro Gonzalez Voyer Post-doc Estación Biológica de Doñana Consejo Superior de Investigaciones Científicas (CSIC) Av Américo Vespucio s/n 41092 Sevilla Spain Tel: + 34 - 954 466700, ext 1749 E-mail: alejandro.gonza...@ebd.csic.es Web page: https://docs.google.com/View?id=dfs328dh_14gwwqsxcg [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Marguerite A. Butler Associate Professor Department of Zoology University of Hawaii 2538 McCarthy Mall, Edmondson 259 Honolulu, HI 96822 FAX: 808-956-9812 Dept: 808-956-8617 http://www2.hawaii.edu/~mbutler http://www.hawaii.edu/zoology/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Re: [R-sig-phylo] ancestral trait reconstruction and OU model
Dear Fabio, Andrew is correct in his answer regarding the new behavior of OUCH (thanks Andrew). In the new version of OUCH, the hansen fit has been modified so that the ancestral state at the root is no longer estimated, but it is rather assumed to be in one of the regimes in the dataset. In most of our work, the ancestral state at the root was really a nuisance parameter, and in some cases was returning really odd parameter estimates. This was probably because there is very little data to inform the root state, and so fitting a somewhat extraneous parameter seemed to cause a non-identifiability problem. This is probably because there is more than one way to fit (or to explain the data). For example, one could get a good fit by having the ancestral state close to optimum 1 with weak selection, or one could have the ancestral state very far away from optimum 1 with strong selection. Both solutions could result in reasonable fit. If the ancestor is free to vary, it is possible to obtain similar fits for different combinations of parameters. Since most comparative biologists don't have fossil data or any internal data to peg the root down, and since it is reasonable in many situations to assume that the ancestor was in one of the current adaptive regimes (usually you're only analyzing the ingroup), we thought it made the best sense to just include the root as being in one of the selective regimes. If one is unsure which adaptive regime the ancestor should be in, you could test all possible models with the ancestor in each of the selective regimes, to see which provides the best fit. However, if one has fossil data, then there is much better information to anchor the root. Also, you may have good reasons for estimating the root. In this case, do as Andrew says and assign a unique adaptive regime to the root. I believe this will provide the same fit as in the original version of OUCH. In general, we haven't found that estimating the ancestor separately affects model selection at all (i.e., you obtain the same best model no matter whether you estimate the ancestor or not). It seems to affect alpha and sigma only a little, but it can move the other optima around a bit (if you push the ancestor back, it tends to exaggerate the other states as well so that you end up at the proper target -- the data which you are trying to fit). Hope this is not too confusing. Take care, Marguerite Butler On Oct 26, 2009, at 8:53 AM, Andrew Hipp wrote: Dear Fabio, In ouch v2 and up, the root state is not treated as a free parameter and is not returned. One way to think about your question might be to assume that the branches at the base of the tree have their own optimum, and then treat that optimum as the root state. But in doing so, you should consider whether there are likely to have been transitions in selective regime among those branches that might make such an assumption unreasonable, and whether the estimate of alpha under the models you look at suggest that the optimum in those basal branches was reached quickly. To look at the former, you might look at whether alpha and sigma^2 estimates increase markedly when you model the basal branches separately as a way of estimating the ancestral state; if so, I would have doubts about the validity of modeling the root state of your characters that way. Also, some topologies aren't amenable to this treatment (unless you model just basal portions of the relevant branches as being in the root optimum, an option which is not available in ouch but valid nonetheless). To look at the latter, see Hansen (1997) re: the relationship between alpha and the rate at which a character evolves toward its optimum. If you want to look at the ancestral state using the old versions of ouch, you could try one of the 1.x versions of ouch from the ouch archive on cran (http://cran.r-project.org/src/contrib/Archive/ouch/). Be aware, though, that ouch 2.x is written in S4 and differs substantially from ouch 1.x, but if you have your trees and regimes set up already for ouch 2.x, you should be fine with the old versions. Andrew Hipp Andrew L. Hipp, PhD Plant Systematist and Herbarium Curator The Morton Arboretum 4100 Illinois Route 53 Lisle, IL 60532-1293 ah...@mortonarb.org (630) 725-2094 http://redwood.mortonarb.org/lab_pages/hipp Fabio de Andrade Machado f.mach...@usp.br 10/26/2009 3:12 AM Hi Graham, thanks for the reply. I don't know if this a particular problem for my kind of data (various characters), but I'm not receiving any Theta zero on the output. Just thetas for each selective regime. Is this a version problem? I'm with ouch v. 2.5-7 I do realize that in complex regimes the contribution of the terminal taxa to the root state is negligible, but as I'm dealing with morphometric data I have the possibility to cross-check the result with fossil data