[R-sig-phylo] Simulating BM data on phylogeny
Hi all,

I'm observing a curious pattern in continuous trait data simulated on a phylogeny that I cannot explain. I thought I'd throw it out to the group for ideas. My apologies if this was addressed in an earlier thread.

I'm exploring four different R functions for simulating continuous data under Brownian motion: 'rTraitCont' in ape, 'sim.char' in geiger, 'transformPhylo.sim' from motmot, and 'fastBM' from Liam Revell's phytools-beta. For these simulations I generate a tree (in this case a perfectly balanced tree) and simulate 100 data sets on the same phylogeny using a particular initial BM rate parameter (sigma). For each simulation I calculate the variance of the simulated tip data, and then summarize these across the 100 simulations to understand some general properties of the 4 data-generating functions. I'm using a wide range of initial sigmas to see how things fall out (sigma = 0.0001, 0.001, 0.01, ..., 1000).

At sigma = 1.0, the mean of the variance in tip data across simulations is similar for all 4 methods. And as expected, as sigma increases or decreases, the mean variation among tip values also increases or decreases. For each sigma value, mean levels of variation are also similar for 3 of the 4 methods, with rTraitCont having much larger relative levels of variation when sigma > 1, and much smaller relative levels when sigma < 1. (I suspect this might be due to a scaling difference in the functions: sim.char, fastBM, and transformPhylo.sim all use ~sqrt(sigma*branchlengths) when generating random normal data, but I'm not certain what rTraitCont uses.)

However, the most curious finding is that for all methods, as sigma increases, so too does the mean trait value across the tips (and the converse occurs as sigma decreases). This observation is curious to me, as one should not see a predictable shift in the mean under Brownian motion.
I thought this might be due to simulating too few data sets and taking the mean of the means, but the pattern remained when 10,000 simulated data sets were generated (though, obviously, the effect was smaller, with mean values closer to zero). I then repeated the simulations for different numbers of taxa (N = 16, 32, 64, 128) and the pattern was still present, so I think this observation is robust.

Presently I am at a loss to explain this observation, though I have a nagging suspicion that I'm missing something obvious. For instance, it may be related to the central limit theorem, as increasing the number of simulated data sets decreased the magnitude of the mean effect across them. But that doesn't fully explain why the mean deviation from zero increases systematically with an increasing rate parameter (sigma).

Any thoughts on this would be greatly appreciated.

Best,
Dean

--
Dr. Dean C. Adams
Associate Professor
Department of Ecology, Evolution, and Organismal Biology
Department of Statistics
Iowa State University
Ames, Iowa 50011
www.public.iastate.edu/~dcadams/
phone: 515-294-3834

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
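[Editor's sketch, not part of the original post.] The expectation at issue here can be checked without any phylogenetics packages: under BM started at zero, the mean tip value has expectation zero for every rate, while the *spread* of per-simulation means scales with sigma. Below is a self-contained pure-Python illustration; the balanced-tree depth, rate values, and replicate counts are arbitrary choices for the sketch, not those from the thread.

```python
import math
import random

def sim_bm_tips(depth, sigma2, rng, x0=0.0, branch=1.0):
    """Recursively simulate BM down a perfectly balanced bifurcating tree:
    each branch adds an independent Normal(0, sigma2 * branch_length) step."""
    if depth == 0:
        return [x0]
    tips = []
    for _ in range(2):
        child = x0 + rng.gauss(0.0, math.sqrt(sigma2 * branch))
        tips.extend(sim_bm_tips(depth - 1, sigma2, rng, child, branch))
    return tips

rng = random.Random(1)
results = {}
for sigma2 in (0.01, 1.0, 100.0):
    means = []
    for _ in range(2000):
        tips = sim_bm_tips(4, sigma2, rng)   # 16 tips, as in one of the N's above
        means.append(sum(tips) / len(tips))
    grand = sum(means) / len(means)
    sd = math.sqrt(sum((m - grand) ** 2 for m in means) / len(means))
    results[sigma2] = (grand, sd)
    print(f"sigma2={sigma2:>6}: mean of per-sim means={grand:+.4f}, sd of means={sd:.4f}")
```

The grand means hover near zero at every rate, but the standard deviation of the per-simulation means grows as sqrt(sigma2), which is why the *observed* grand mean wanders farther from zero at large rates for a fixed number of replicates.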
Re: [R-sig-phylo] Simulating BM data on phylogeny
I just tried this using the code below, and I can't see a trend. I calculated the mean tip value across the species in each simulation, and made a box plot over 1000 simulations. Although sim.char() doesn't do this, one can simply draw these numbers from a multivariate normal distribution - and so their expected mean is zero.

rr <- c(0.001, 0.01, 0.1, 1, 10, 100, 1000)
phy <- birthdeath.tree(b=1, d=0, taxa.stop=20)
allMeans <- matrix(nrow=1000, ncol=length(rr))
for(i in 1:length(rr)) {
    ss <- sim.char(phy, as.matrix(rr[i]), nsims=1000)
    allMeans[,i] <- apply(ss, 3, FUN=mean)
}
meanmeans <- apply(allMeans, 2, FUN=mean)
cc <- as.numeric(allMeans)
xx <- rep(rr, each=1000)
plot(as.factor(xx), cc)

On May 20, 2011, at 12:32 PM, Dean Adams wrote:

> [quoted message trimmed]

Luke Harmon
Assistant Professor
Biological Sciences
University of Idaho
208-885-0346
lu...@uidaho.edu
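[Editor's sketch, not part of the original post.] The multivariate-normal point above - tip values under BM are one draw from MVN(0, sigma2 * C), where C holds shared root-to-tip path lengths - can be illustrated in a few lines of plain Python. The 4-taxon tree, its unit branch lengths, and the rate are invented for the example.

```python
import math
import random

# Hypothetical 4-taxon balanced tree with unit branch lengths:
# C[i][j] = branch length shared from the root by tips i and j.
C = [[2.0, 1.0, 0.0, 0.0],
     [1.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 2.0, 1.0],
     [0.0, 0.0, 1.0, 2.0]]

def cholesky(A):
    """Lower-triangular L with L L^T = A (plain Cholesky decomposition)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def rmvnorm_bm(L, sigma2, rng):
    """One draw from MVN(0, sigma2 * C) via x = sqrt(sigma2) * L z."""
    z = [rng.gauss(0.0, 1.0) for _ in L]
    s = math.sqrt(sigma2)
    return [s * sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]

rng = random.Random(42)
L = cholesky(C)
tip_means = [sum(rmvnorm_bm(L, 10.0, rng)) / len(C) for _ in range(5000)]
grand_mean = sum(tip_means) / len(tip_means)
print(round(grand_mean, 3))   # close to 0, as the MVN argument predicts
```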
[R-sig-phylo] Running fitDiscrete using try and if
Hi,

I am trying to run simulations of evolution of discrete traits using fitDiscrete() in the package geiger. When the simulated data have very few changes in character state (or none) I get an error message:

Error in fitDiscrete(phy, fcare[, i], model = ER, data.names = row.names(fcare), :
  ERROR: No solution found. Does your tree contain zero-length tip branches?

The tree has no zero-length branches, and it works fine with data simulated under somewhat higher rates of change on the same phylogeny. My problem is that I would like to use a loop to run several simulated data sets, but the loop stops whenever fitDiscrete() finds itself with one of the problematic data sets involving very few or no changes in character state, and there seems to be no way of extracting the parameter values from previous runs of the loop.

I was wondering if there was a way of creating a loop using try and if, to avoid the loop crashing when the function tries to estimate the transition rate for data simulated under low rates of change. In such cases I would like it to simply skip to the next column of simulated data in the loop.

Cheers
Alejandro

__
Alejandro Gonzalez Voyer
Post-doc
Estación Biológica de Doñana
Consejo Superior de Investigaciones Científicas (CSIC)
Av Américo Vespucio s/n
41092 Sevilla
Spain
Tel: + 34 - 954 466700, ext 1749
E-mail: alejandro.gonza...@ebd.csic.es
Web page: https://docs.google.com/View?id=dfs328dh_14gwwqsxcg
Re: [R-sig-phylo] Simulating BM data on phylogeny
Hi Luke,

The bit where I noticed it would be in the 'meanmeans' in your code. The deviation from zero is ~6e-5 for rate = 0.001, and increases to 1.1e-1 as the rate goes to 1000. True, it is slight, but this pattern mimics what I observed.

meanmeans
[1] -6.856330e-05 -7.238577e-05  2.534556e-03 -2.602753e-03  7.473694e-03
[6] -1.020982e-02  1.116674e-01

But it is quite true that the distribution of individual simulation means appears centered around zero in all cases (and relative to the distribution of individual outcomes, the 'meanmeans' trend is very small).

Thanks,
Dean

On 5/20/2011 3:44 PM, Luke Harmon wrote:

> [quoted message trimmed]
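[Editor's sketch, not part of the original post.] The pattern in 'meanmeans' has a simple explanation: each entry is a Monte Carlo average whose standard error is proportional to sigma, so its typical deviation from zero grows with the rate even though its expectation is exactly zero. With a shared random seed the proportionality becomes exact, since Normal(0, sigma^2) draws are just sigma times the same standard-normal draws. The function and its parameters below are invented for illustration.

```python
import random

def mean_of_means(sigma, seed, nsim=1000, ntips=20):
    """Average, over nsim 'simulations', of the mean of ntips iid
    Normal(0, sigma^2) draws - a stand-in for one meanmeans entry."""
    rng = random.Random(seed)
    return sum(sum(rng.gauss(0.0, sigma) for _ in range(ntips)) / ntips
               for _ in range(nsim)) / nsim

# Same seed => identical underlying standard-normal draws, so the
# Monte Carlo deviation from zero scales exactly with sigma:
dev1 = mean_of_means(1.0, seed=7)
dev1000 = mean_of_means(1000.0, seed=7)
print(dev1, dev1000, dev1000 / dev1)
```

The third printed value is 1000 (up to floating-point rounding): the drift Dean sees is sampling noise amplified by the rate, not a trend in the process.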
Re: [R-sig-phylo] Simulating BM data on phylogeny
Ted wrote:

> Based on zillions of BM simulations we have done with DOS PDSIMUL, you should never see a shift in mean value (versus starting value at root of tree), unless you are intentionally modeling a trend. Something must be wrong.

in response to Dean Adams:

> For these simulations I generate a tree (in this case a perfectly-balanced tree) and simulate 100 data sets on the same phylogeny using a particular initial BM rate parameter (sigma). ... However, the most curious finding is that for all methods, as sigma increases, so too does the mean trait value across the tips (and the converse occurs as sigma decreases). This observation is curious to me, as one should not see a predictable shift in the mean under Brownian motion.

Is it possible that the simulation is doing Brownian motion on some other scale, such as a log scale? If one does BM on the logs and then looks at the original phenotype scale, you *would* expect that as the variance among species increases, so does the mean of the species means.

Joe

--
Joe Felsenstein
j...@gs.washington.edu
Dept of Genome Sciences and Dept of Biology, Univ. of Washington, Box 5065, Seattle WA 98195-5065
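[Editor's sketch, not part of the original post.] Joe's log-scale point is the lognormal mean: if x is BM on the log scale with x ~ Normal(0, s^2) at a tip, then on the original scale E[exp(x)] = exp(s^2/2), which increases with the rate. A quick stdlib-Python check (sample sizes and sd values are arbitrary choices):

```python
import math
import random

rng = random.Random(3)
obs = {}
for s in (0.1, 0.5, 1.0):                         # sd of the log-scale trait
    draws = [math.exp(rng.gauss(0.0, s)) for _ in range(100000)]
    obs[s] = sum(draws) / len(draws)
    # theory: lognormal mean exp(s^2 / 2) grows with the variance
    print(f"sd={s}: observed mean {obs[s]:.4f}, theory {math.exp(s * s / 2):.4f}")
```

So a simulator (or a downstream transformation) working on a log scale would indeed produce exactly the rate-dependent shift in the mean that Dean describes.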
Re: [R-sig-phylo] Running fitDiscrete using try and if
Yes, fitDiscrete may be having trouble with the bounds of the rate parameter's search space. If we denote the (possibly) offending call as:

fdResult <- fitDiscrete(xxx)

then you can do this:

te <- try(fdResult <- fitDiscrete(xxx))
if(class(te) == "try-error") {
    # do something if there is an error - maybe print a warning?
}

On May 20, 2011, at 2:07 PM, Alejandro Gonzalez wrote:

> [quoted message trimmed]

Luke Harmon
Assistant Professor
Biological Sciences
University of Idaho
208-885-0346
lu...@uidaho.edu

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
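[Editor's sketch, not part of the original post.] The same "catch the error, record a placeholder, keep looping" pattern, shown in Python for illustration. `fit_rate` and its return value are invented stand-ins mimicking a fitter that fails on invariant characters; only the control flow corresponds to the R try() idiom above.

```python
def fit_rate(states):
    """Hypothetical stand-in for a rate-fitting routine: raises when the
    simulated character never changes state, mimicking fitDiscrete's error."""
    if len(set(states)) < 2:
        raise RuntimeError("No solution found")
    return states.count(1) / len(states)     # placeholder 'estimate'

datasets = [[0] * 20,                # no changes -> would crash a bare loop
            [0] * 12 + [1] * 8,      # variable character -> fits fine
            [1] * 20]                # no changes again
results = []
for d in datasets:
    try:
        results.append(fit_rate(d))
    except RuntimeError:
        results.append(None)         # record the failure and move on
print(results)   # → [None, 0.4, None]
```

Earlier estimates survive the failing iterations, and the `None` entries mark exactly which simulated data sets were skipped.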