Re: [R-sig-phylo] Model-Selection vs. Finding Models that Fit Well

2011-01-31 Thread Luke Harmon
I agree with Dave here. White noise has two parameters, mean and variance, and 
- to me - is an interesting model to test. But I'm not sure it should be 
considered as a baseline.

One can link Brownian motion and white noise through the Ornstein-Uhlenbeck 
model - BM is OU with alpha (constraint) parameter equal to zero, and WN is OU 
with infinite alpha.
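For concreteness, a minimal sketch of the three fits in R (an illustration, not code from the original message; it assumes a current geiger in which fitContinuous() accepts model = "white" and stores the fit in $opt):

library(ape)
library(geiger)
set.seed(1)
tree <- rcoal(50)                                  # example ultrametric tree
x <- rTraitCont(tree, model = "BM", sigma = 1)     # example trait data
fits <- lapply(c("white", "BM", "OU"), function(m)
  fitContinuous(tree, x, model = m))
setNames(sapply(fits, function(f) f$opt$aic), c("white", "BM", "OU"))
# WN and BM both have two parameters (mean/root state and variance/rate), so
# comparing their AICs amounts to comparing likelihoods; OU adds alpha as a third.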

lh
On Jan 30, 2011, at 3:18 PM, David Bapst wrote:

 Florian-
 Doesn't white noise have two parameters, mean and variance, and thus isn't it
 just as complex as the Brownian motion model? I guess an LRT could be done
 against both.
 
 That said, it isn't clear to me what it means for white noise to fit best;
 since WN is itself interpretable as an evolutionary scenario, it doesn't seem
 a very clean null. If WN fits best in a set of models, the trait examined
 must have exceedingly low phylogenetic signal, and thus it must be evolving
 under one of the scenarios that can produce low signal (OU with high rates
 and strong attraction, and/or evolution where descendant values are not a
 function of ancestral values).
 -Dave
 
 On Fri, Jan 28, 2011 at 5:31 PM, Florian Boucher
 floflobouc...@gmail.com wrote:
 
 Hi David and list,
 
 just a quick comment on one of your questions:
 for quantitative traits on a phylogeny, you can compare your best model to
 the white noise model implemented in geiger, which assumes that your
 traits are drawn from a normal distribution.
 This last model would be the baseline model Ted mentioned in his post.
 
 I hope it helps...
 
 
 Florian Boucher
 PhD student, Laboratoire d'Ecologie Alpine,
 Grenoble, France
 
 2011/1/28 David Bapst dwba...@uchicago.edu
 
 Hello all,
 
 Apologies for leaving the replies to get cold for a week, but now I
 finally have some time to respond.
 
 On Thu, Jan 20, 2011 at 12:17 PM, Brian O'Meara omeara.br...@gmail.com
 wrote:
 I think considering model adequacy is something that would be useful to do
 and is not done much now. One general way to do this is to simulate under
 your chosen model and see if the real data look very different from the
 simulated data. For example, I might try a one-rate vs. a two-rate Brownian
 motion model and find the latter fits better. If the actual true model is an
 OU model with two very distinct peaks and strong selection, which is not in
 my model set, I'll find that my simulated data under the two-rate Brownian
 model may look very different from my actual data, which will be fairly
 bimodal. Aha, my model is inadequate. [but then what -- keep adding new
 models, just report that your model is inadequate, ...?]
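A minimal parametric-bootstrap sketch of this adequacy check (an illustration, not Brian's code; it assumes geiger's fitContinuous() returns the fitted BM rate as $opt$sigsq):

library(ape)
library(geiger)
set.seed(2)
tree <- rcoal(60)
x <- rTraitCont(tree, model = "BM", sigma = 1)       # stand-in for real data
fit <- fitContinuous(tree, x, model = "BM")          # the chosen model
sig2 <- fit$opt$sigsq                                # fitted BM rate (variance per unit time)
sims <- replicate(999, rTraitCont(tree, model = "BM", sigma = sqrt(sig2)))
stat <- function(z) diff(range(z))                   # any summary statistic of interest
mean(apply(sims, 2, stat) >= stat(x))                # how extreme are the real data?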
 
 Certainly, data exploration is a step that cannot be skipped; I've
 found that Ackerly's traitgrams work well for me for visualizing my
 data, although I know some people who find them simply confusing
 (particularly my traitgrams, as they have fossil taxa all over them).
 
 Of course, you need a method for evaluating how similar the data look.
 There's been some work on this in models for tree inference using posterior
 predictive performance (work by Jonathan Bollback and Jeremy Brown come to
 mind) or using other approaches (some of Peter Waddell's work), but it
 hasn't really taken off yet. It'd be easy to implement such approaches in R
 for comparative methods given capabilities in ape, Geiger, and other
 packages.
 
 I don't entirely follow, but I'll look into the posterior predictive
 performance work you mention. But wouldn't 'how similar the data look' be a
 function of what we want to look at?
 
 On Thu, Jan 20, 2011 at 12:27 PM,  tgarl...@ucr.edu wrote:
 One quick comment. In many cases what you can do is also fit a model with no
 independent variables. It, too, will have a likelihood, and can be used as a
 baseline to argue whether your best model is actually any good. You could
 then do a maximum likelihood ratio test of your best model versus your
 baseline model. If the baseline model does not have a significant lack of
 fit by a LRT (e.g., P not < 0.05), then your best model arguably isn't of
 much use.
 
 Cheers,
 Ted
 
 That seems like a pretty good idea! How would we go about doing that in an
 example case, such as for a continuous trait on a phylogeny?
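One way to do that (a sketch, not from the thread; it relies on the fact that, on an ultrametric tree, geiger's white-noise model corresponds to Pagel's lambda = 0, so the two fits are nested with one degree of freedom):

library(ape)
library(geiger)
set.seed(3)
tree <- rcoal(50)
x <- rTraitCont(tree, model = "BM", sigma = 1)           # stand-in for real data
fit_lambda <- fitContinuous(tree, x, model = "lambda")   # "best" model: 3 parameters
fit_white  <- fitContinuous(tree, x, model = "white")    # baseline: 2 parameters
LR <- 2 * (fit_lambda$opt$lnL - fit_white$opt$lnL)
pchisq(LR, df = 1, lower.tail = FALSE)
# Caveat raised later in this thread: lambda = 0 lies on the boundary of its
# range, so the nominal chi-square p-value is only an approximation.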
 
 On Thu, Jan 20, 2011 at 2:00 PM, Nick Matzke mat...@berkeley.edu
 wrote:
 If one is interested in absolute goodness of fit, rather than model
 comparison (which model fits best, which might not be useful if you are
 worried that all your models are horrible), wouldn't cross-validation be a
 good technique? I.e., leave out one tip, calculate the model and the
 estimated node values from the rest, then put an estimate and uncertainty on
 the tip and see how often it matches the observed value. Repeat for all
 observations...
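A sketch of one such leave-one-out scheme for a continuous trait (an illustration, not Nick's code): under a fitted BM model the tips are multivariate normal with covariance proportional to vcv(tree), so each left-out tip can be predicted from the conditional mean given the remaining tips.

library(ape)
library(geiger)
set.seed(4)
tree <- rcoal(40)
x <- rTraitCont(tree, model = "BM", sigma = 1)           # stand-in for real data
C <- vcv(tree)                                           # phylogenetic covariance structure
x <- x[rownames(C)]                                      # make sure orders match
fit <- fitContinuous(tree, x, model = "BM")
mu <- fit$opt$z0                                         # fitted root state (BM mean)
pred <- sapply(seq_along(x), function(i)
  mu + C[i, -i] %*% solve(C[-i, -i], x[-i] - mu))        # conditional mean for tip i
cor(pred, x)                                             # prediction vs. observation
# The conditional variance, sigma^2 * (C[i,i] - C[i,-i] %*% solve(C[-i,-i], C[-i,i])),
# would give the uncertainty on each held-out tip that Nick mentions.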
 
 Would jack-knifing data like that be appropriate for the continuous
 trait analyses? That would seem to deal with whether we have enough
 data to distinguish the models at hand (which is still important), but
 not 

Re: [R-sig-phylo] Model-Selection vs. Finding Models that Fit Well

2011-01-31 Thread Cecile Ane
I don't find the white noise to be any good evolutionary scenario: it's 
nowhere continuous. It just reduces to the assumption of normal, 
independent observations at the tips. Nothing fancy, then :)

Cecile.

On 01/31/11 11:53, Luke Harmon wrote:

...
--
Cecile Ane
Assistant Professor, University of Wisconsin - Madison
www.stat.wisc.edu/~ane/

CALS statistical consulting lab:
www.cals.wisc.edu/calslab/stat.html



Re: [R-sig-phylo] Model-Selection vs. Finding Models that Fit Well

2011-01-31 Thread tgarland
Hi Joe et al.,
 
One point for clarification and your further thoughts.
The way we parameterize the OU process in Lavin et al. (2008), it is a
value of zero (not infinity) that gives a star phylogeny with
contemporaneous tips.  Sometimes the ML estimate of d (as we call it)
goes to zero, but more often it becomes very small but not zero.  In
either case, it seems to me you can do an LRT versus a star with one
d.f.
 
Cheers,
Ted
 
 Original message 

  Date: Mon, 31 Jan 2011 13:02:38 -0800
  From: Joe Felsenstein j...@gs.washington.edu
  Subject: Re: [R-sig-phylo] Model-Selection vs. Finding Models that
  Fit Well
  To: David Bapst dwba...@uchicago.edu
  Cc: R Sig Phylo Listserv r-sig-phylo@r-project.org, Ted Garland
  tgarl...@ucr.edu

  
   David Bapst and Cecile Ane noted that

    On Mon, Jan 31, 2011 at 12:45 PM, Cecile Ane a...@stat.wisc.edu wrote:
     I don't find the white noise to be any good evolutionary scenario:
     it's nowhere continuous. It just reduces to the assumption of normal,
     independent observations at the tips. Nothing fancy, then :)

    I think it is a bit problematic that WN could fit equally well to a
    situation with zero heritability and an OU1 scenario with infinite
    alpha. I guess one could check if there is any relationship between the
    magnitude of evolutionary change and distance from the mean (as it is
    generally expected under OU that changes are smaller closer to the
    optima). But perhaps even that relationship goes away as alpha
    approaches infinity. It would also be difficult to assess, because
    phylogenetic signal would be too low for ancestral state reconstruction
    to work with confidence.

   Lack of continuity does not worry me: if we have (say) discrete
   generations, as in annual plants, nothing is continuous anyway. And our
   observations are rarely closely enough spaced to be sensitive to that
   issue anyway.

   The problem with WN (i.e., tips i.i.d. and multivariate normal) is that,
   although it connects to OU and to other models, in each case that
   connection is in a limit as some parameter goes off to infinity. In
   those cases I think WN as a null hypothesis is going to be difficult, as
   it will not be contained within the alternative hypotheses in a way that
   allows a likelihood ratio test. And I suspect that it would give trouble
   also in a Bayesian framework, though that is just a suspicion.

   Joe
  
  Joe Felsenstein j...@gs.washington.edu
   Department of Genome Sciences and Department of Biology,
   University of Washington, Box 355065, Seattle, WA 98195-5065 USA




Re: [R-sig-phylo] Model-Selection vs. Finding Models that Fit Well

2011-01-31 Thread Joe Felsenstein

Ted said --

 One point for clarification and your further thoughts.
 The way we parameterize the OU process in Lavin et al. (2008), it is a
 value of zero (not infinity) that gives a star phylogeny with
 contemporaneous tips.  Sometimes the ML estimate of d (as we call it) goes
 to zero, but more often it becomes very small but not zero.  In either case,
 it seems to me you can do an LRT versus a star with one d.f.

Can you?  Is the value of zero contained within the range of possible
values, or at its extreme?  I think for LRT you need contained within.
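A general statistical aside, not from Joe's message: when the null value sits on the boundary of the parameter space, the LRT statistic is commonly approximated by a 50:50 mixture of a point mass at zero and a chi-square with 1 d.f., which amounts to halving the nominal p-value. In R, with logLik_full and logLik_star as hypothetical placeholders for the two fitted log-likelihoods:

LR <- 2 * (logLik_full - logLik_star)          # hypothetical log-likelihoods
0.5 * pchisq(LR, df = 1, lower.tail = FALSE)   # boundary-adjusted p-value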

Joe

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA



Re: [R-sig-phylo] Model-Selection vs. Finding Models that Fit Well

2011-01-31 Thread Simon Blomberg
The link between BM and WN is even closer than that: WN is the derivative of
a BM process. Now, BM is nowhere differentiable, so in the usual sense, WN
doesn't really exist. However, it can be approximated by simulation.
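A minimal simulation sketch of that link (an illustration, not Simon's code): discretize a BM path and look at its increments, which behave like iid normal noise.

set.seed(5)
sigma <- 1
dt <- 0.01
n <- 10000
bm <- cumsum(rnorm(n, mean = 0, sd = sigma * sqrt(dt)))   # discretized BM path
wn <- diff(bm)                                            # increments ~ white noise
var(wn)                                                   # ~ sigma^2 * dt
acf(wn, lag.max = 1, plot = FALSE)$acf[2]                 # ~ 0: no autocorrelation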


Cheers,

Simon.

On 01/02/11 03:53, Luke Harmon wrote:

...

Re: [R-sig-phylo] Simulate discrete characters on tree constrained to phylogenetic signal K

2011-01-31 Thread Scott Chamberlain
Dear Enrico/Emmanuel, 


Actually, I just realized I need continuous traits, not discrete. Sorry about
that. Thanks so much for pointing me towards the rTrait functions (rTraitCont,
in my case).


Would it be right that a one-optimum OU model with rTraitCont would have theta
set to a single value, and a two-optima model would have theta take one of two
possible values on each branch? And so on?
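A minimal sketch of both cases (an illustration, not from the thread; it assumes ape's rTraitCont() accepts a vector with one theta value per branch, as its help page describes -- worth double-checking with ?rTraitCont):

library(ape)
set.seed(6)
tree <- rcoal(50)
## one optimum: a single theta
x1 <- rTraitCont(tree, model = "OU", sigma = 0.1, alpha = 1, theta = 0)
## two optima: one theta value per branch (here assigned arbitrarily to the
## first vs. second half of the edges, purely for illustration)
th <- ifelse(seq_len(Nedge(tree)) <= Nedge(tree) / 2, 0, 2)
x2 <- rTraitCont(tree, model = "OU", sigma = 0.1, alpha = 1, theta = th)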


Enrico, thanks very much for that thought. So I am simulating trees and traits 
on trees, and then calculating K using the phylosignal function. Thus, K is 
implicitly manipulated, instead of explicitly manipulated. 


Scott
On Monday, January 31, 2011 at 3:28 AM, Enrico Rezende wrote:

 Scott,
 one important caveat that should be added is that, even when one knows 
 and controls the parameters of the evolutionary model, the resulting 
 dataset will have the expected amount of signal only ON AVERAGE, given 
 the inherent stochasticity of these models. For instance, if you want K 
 = 1 and assume Brownian motion character evolution on a given tree, you 
 may still get data that have a K ranging between, say, 0.6 and 1.4 (with 
 a mean K = 1, of course). Thus, you might want to quantify K after the 
 simulations to ensure that you're obtaining the amount of signal desired.
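One way to implement that check (a sketch, not Enrico's code; it assumes picante's Kcalc() returns Blomberg's K for a named trait vector, and uses an arbitrary target window around K = 1):

library(ape)
library(picante)
set.seed(7)
tree <- rcoal(50)
target <- c(0.9, 1.1)                 # hypothetical window around K = 1
keep <- list()
while (length(keep) < 10) {           # retain 10 BM datasets whose realized K is in the window
  x <- rTraitCont(tree, model = "BM", sigma = 1)
  K <- as.numeric(Kcalc(x, tree))
  if (K >= target[1] && K <= target[2]) keep[[length(keep) + 1]] <- x
}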
 
 Ah yes, this applies primarily to quantitative traits; for qualitative
 characters things change a bit.
 
 Enrico
 
 
 
 Emmanuel Paradis wrote:
 
  On Fri, 28 Jan 2011 17:20:57 -0600 Scott Chamberlain 
  myrmecocys...@gmail.com wrote:
  
   Dear R community,
   I would like to simulate discrete characters on a randomly generated 
   tree. However, I would like to create different sets of trees and 
   associated characters at certain levels of phylogenetic signal. 
   
   
  
  If you know the relationship between phylogenetic signal (Blomberg et 
  al's K in your case) and the parameters of the model used to simulate 
  the characters, this would be easy. But I don't think this 
  relationship is obvious, at least for discrete characters. Keep in 
  mind also that, for a given model and parameters of evolution, the 
  quantity of phylogenetic signal depends on the tree.
  
  
   Can someone point me in the right direction?
   
   I am familiar with sim.char from the geiger package for simulating
   characters, and rcoal for simulating the ultrametric trees that I want,
   but I don't know how I could, if at all, simulate characters that have a
   particular, a priori specified level of phylogenetic signal.
   
   
  
  BTW, instead of rcoal() you may wish to use rbdtree(). And if you want 
  more flexibility for simulating the characters, see rTraitDisc() in ape.
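A minimal sketch of those two suggestions (an illustration, not Emmanuel's code; rbdtree() and rTraitDisc() are both in ape, and the argument names below follow their help pages):

library(ape)
set.seed(8)
tree <- rbdtree(birth = 0.1, death = 0, Tmax = 30)        # birth-death (here pure-birth) tree
x <- rTraitDisc(tree, model = "ER", k = 2, rates = 0.3)   # two-state, equal-rates character
table(x)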
  
  Best,
  
  Emmanuel
  
  
   As a simple example, the following code gets phylogenetic signal (K)
   values that are all over the map in different runs:
   
   
   q <- list(rbind(c(-.5, .5), c(.5, -.5)))   # symmetric 2-state rate matrix
   traits <- sim.char(tree, q, model = "discrete", n = 1)   # 'tree' from rcoal(), as above
   traits_ <- as.data.frame(traits[,,1])
   traitsvec <- traits_[,1]
   names(traitsvec) <- rownames(traits_)
   phylosignal(traitsvec, tree, reps = 100)
   
   
   
   
   
   
   
   Sincerely,
   
   Scott Chamberlain
   Rice University, EEB Dept.
   
   
   
   
   
   
  
  
  
 
 
 -- 
 ***
 Enrico L. Rezende
 
 Departament de Genètica i de Microbiologia
 Facultat de Biociències, Edifici Cn
 Universitat Autònoma de Barcelona
 08193 Bellaterra (Barcelona)
 SPAIN
 
 Telephone: +34 93 581 4705
 Fax: +34 93 581 2387
 E-mail: enrico.reze...@uab.cat
 
 ***
 
 
 






___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo