[R-sig-phylo] Simulating BM data on phylogeny

2011-05-20 Thread Dean Adams

Hi all,

I'm observing a curious pattern in continuous trait data simulated on a 
phylogeny that I cannot explain. I thought I'd throw it out to the group 
for ideas. My apologies if this was addressed in an earlier thread.


I'm exploring four different R functions for simulating continuous data 
under Brownian Motion: 'rTraitCont' in Ape, 'sim.char' in Geiger, 
'transformPhylo.sim' from MotMot, and 'fastBM' from Liam Revell's 
PhyTools-beta.  For these simulations I generate a tree (in this case a 
perfectly-balanced tree) and simulate 100 data sets on the same 
phylogeny using a particular initial BM rate parameter (sigma).  For 
each simulation, I calculate the mean and variance of the simulated tip 
data, and then summarize these across the 100 simulations to understand some 
general properties of the 4 data-generating functions.  I'm using a wide 
range of initial sigmas to see how things fall out (sigma = 0.0001, 0.001, 
0.01, ..., 1000).
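
For concreteness, the basic recipe looks roughly like this (sketched here 
with rTraitCont from ape; the other three functions are called analogously, 
and the tree size and sigma value are only illustrative):

# Sketch of the simulation setup: one balanced tree, many replicate data
# sets, then the mean and variance of the tip values for each replicate.
library(ape)
tree  <- compute.brlen(stree(64, type = "balanced"), 1)  # balanced tree, unit branch lengths
sigma <- 0.01                                # value passed to the simulator
sims  <- replicate(100, rTraitCont(tree, model = "BM", sigma = sigma))  # 64 tips x 100 data sets
tip.means <- colMeans(sims)                  # mean of tip values, per data set
tip.vars  <- apply(sims, 2, var)             # variance of tip values, per data set
c(mean(tip.means), mean(tip.vars))           # summaries across the 100 simulations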


At sigma = 1.0, the mean of the variance in tip data across simulations 
is similar for all 4 methods.  And as expected, as sigma increases or 
decreases, the mean variation among tip values also increases or 
decreases.  For each sigma value, mean levels of variation are also 
similar for 3 of the 4 methods, with rTraitCont having much larger 
relative levels of variation when sigma > 1, and much smaller relative 
levels of variation when sigma < 1.  (I suspect this is due to a 
scaling difference among the functions: sim.char, fastBM, and 
transformPhylo.sim all use ~sqrt(sigma*branchlengths) when generating 
random normal data, but I'm not certain what rTraitCont uses.)
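
To illustrate the kind of difference I have in mind (this is only a guess at 
the parameterization, not something I have verified in the rTraitCont source), 
here is how the two conventions diverge on a single branch:

# Suspected scaling difference (an assumption, not verified): sigma treated
# as a rate (variance per unit time) versus as an SD per unit time.
sigma <- 100
bl    <- 2                                   # branch length
nrep  <- 1e5
d.rate <- rnorm(nrep, 0, sqrt(sigma * bl))   # sd = sqrt(sigma * bl), as in sim.char/fastBM
d.sd   <- rnorm(nrep, 0, sigma * sqrt(bl))   # sd = sigma * sqrt(bl), the suspected alternative
var(d.rate)   # ~ sigma * bl   = 200
var(d.sd)     # ~ sigma^2 * bl = 20000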


However, the most curious finding is that for all methods, as sigma 
increases, so too does the mean trait value across the tips (and the 
converse occurs as sigma decreases).  This observation is curious to me, 
as one should not see a predictable shift in the mean under Brownian 
motion.  I thought this might be due to simulating too few data sets and 
taking the mean of the means, but the pattern remained when 10,000 
simulated data sets were generated (though obviously, the effect was 
smaller, with mean values closer to zero). I then repeated the 
simulations for different numbers of taxa (N=16, 32, 64, 128) and the 
pattern was still present.  So I think this observation is robust.


Presently, I am at a loss to explain this observation, though I have a 
nagging suspicion that I'm missing something obvious. For instance, it 
may be related to the Central limit theorem, as increasing the number of 
simulated data sets decreased the magnitude of the mean effect across 
them. But that doesn't fully explain why the mean deviation from zero 
increases systematically with an increasing rate parameter (sigma).
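
To put a rough number on that intuition (treating sigma as the rate, as in 
three of the four functions, and reusing the balanced tree from the sketch 
above), the grand mean across simulations is itself a random quantity whose 
spread shrinks with the number of simulations but grows with sigma, though I 
have not convinced myself this is the whole story:

# Back-of-envelope check: under BM the tip mean of one data set has
# SD sqrt(sigma * mean(C)), and averaging over nsim data sets shrinks
# that by a further factor of sqrt(nsim).
C    <- vcv(tree)                    # shared branch lengths among tips
nsim <- 100
for (sigma in c(0.01, 1, 100))
    cat("sigma =", sigma, " SD of grand mean =", sqrt(sigma * mean(C) / nsim), "\n")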


Any thoughts on this would be greatly appreciated.

Best,

Dean

--
Dr. Dean C. Adams
Associate Professor
Department of Ecology, Evolution, and Organismal Biology
Department of Statistics
Iowa State University
Ames, Iowa
50011
www.public.iastate.edu/~dcadams/
phone: 515-294-3834

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Simulating BM data on phylogeny

2011-05-20 Thread Luke Harmon
I just tried this using the code below, and I can't see a trend. I calculated 
the mean tip value across the species in each simulation, and made a box plot 
of those means for each rate. Although sim.char() doesn't do it this way, one 
can simply draw these numbers from a multivariate normal distribution, and so 
their expected mean is zero.

library(geiger)   # provides birthdeath.tree() and sim.char()

rr  <- c(0.001, 0.01, 0.1, 1, 10, 100, 1000)
phy <- birthdeath.tree(b=1, d=0, taxa.stop=20)

nsims <- 1000     # simulations per rate (count here is illustrative)
allMeans <- matrix(nrow=nsims, ncol=length(rr))
for (i in 1:length(rr)) {
    ss <- sim.char(phy, as.matrix(rr[i]), nsims=nsims)
    allMeans[, i] <- apply(ss, 3, FUN=mean)   # mean tip value in each simulation
}

meanmeans <- apply(allMeans, 2, FUN=mean)     # grand mean for each rate

cc <- as.numeric(allMeans)
xx <- rep(rr, each=nsims)

plot(as.factor(xx), cc)   # box plot of per-simulation means by rate
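
As a cross-check of the multivariate-normal point, the tip values can also be 
drawn directly from the normal distribution implied by the tree (a sketch 
using vcv() from ape and mvrnorm() from MASS, reusing phy, rr, and nsims from 
above):

# Draw tip values directly from MVN(0, rate * C), where C holds the shared
# branch lengths; the expected grand mean is zero for every rate.
library(MASS)                          # for mvrnorm()
C <- vcv(phy)
mvnMeans <- sapply(rr, function(rate) {
    x <- mvrnorm(n = nsims, mu = rep(0, nrow(C)), Sigma = rate * C)
    mean(rowMeans(x))                  # grand mean of the per-simulation tip means
})
round(mvnMeans, 5)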


Luke Harmon
Assistant Professor
Biological Sciences
University of Idaho
208-885-0346
lu...@uidaho.edu






Re: [R-sig-phylo] Simulating BM data on phylogeny

2011-05-20 Thread Dean Adams
Hi Luke,

The place where I noticed it is the 'meanmeans' in your code.  Deviation 
from zero is ~6e-5 for rate = 0.001, and increases to 1.1e-1 as the rate 
goes to 1000.  True, it is slight, but this pattern mimics what I 
observed.

meanmeans
[1] -6.856330e-05 -7.238577e-05  2.534556e-03 -2.602753e-03  7.473694e-03
[6] -1.020982e-02  1.116674e-01

But it is quite true that the distribution of individual simulation 
means appears centered around zero for all cases (and relative to the 
distribution of individual outcomes, the 'meanmeans' trend is very small).
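
For context, a rough sketch of the expected Monte Carlo error of those 
'meanmeans' (using your phy, rr, and number of simulations) would be:

# Expected sampling SD of meanmeans: one simulation's tip mean has variance
# rate * mean(C), and averaging nsims independent simulations divides that
# by nsims.
C  <- vcv(phy)
se <- sqrt(rr * mean(C) / nsims)
cbind(rate = rr, meanmeans = meanmeans, expected.se = se)

If those standard errors come out larger than the deviations above, the 
apparent trend is just Monte Carlo noise.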

Thanks,

Dean

-- 
Dr. Dean C. Adams
Associate Professor
Department of Ecology, Evolution, and Organismal Biology
Department of Statistics
Iowa State University
Ames, Iowa
50011
www.public.iastate.edu/~dcadams/
phone: 515-294-3834



Re: [R-sig-phylo] Simulating BM data on phylogeny

2011-05-20 Thread Joe Felsenstein

Ted wrote --

 Based on zillions of BM simulations we have done with DOS PDSIMUL,  
 you should never see a shift in mean value (versus starting value  
 at root of tree), unless you are intentionally modeling a trend.   
 Something must be wrong.

in response to Dean Adams:

 For these simulations I generate a tree (in this case a
 perfectly-balanced tree) and simulate 100 data sets on the same
 phylogeny using a particular initial BM rate parameter (sigma).
...
 However, the most curious finding is that for all methods, as sigma
 increases, so too does the mean trait value across the tips (and the
 converse occurs as sigma decreases). This observation is curious to
 me, as one should not see a predictable shift in the mean under
 Brownian motion.


Is it possible that the simulation is doing Brownian Motion on some
other scale, such as a log scale?   If one does BM on the logs and then
looks at the original phenotype scale, you *would* expect that as the
variance among species increases, so does the mean of the species
means.
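
A quick illustration with made-up numbers: if the log-scale trait is
N(0, v), the trait on the original scale has expectation exp(v/2), which
grows with the variance.

# If X ~ N(0, v) is the log-scale trait, then E[exp(X)] = exp(v/2),
# so a larger variance on the log scale inflates the mean on the raw scale.
set.seed(1)
for (v in c(0.1, 1, 4)) {
    x <- rnorm(1e5, 0, sqrt(v))
    cat("v =", v, " mean(exp(x)) =", round(mean(exp(x)), 2),
        " exp(v/2) =", round(exp(v/2), 2), "\n")
}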

Joe

Joe Felsenstein  j...@gs.washington.edu
  Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065

