Hi David, List,

I think you make a good point.  After all, the goal isn't to match the
pattern but to match the process.  If we just wanted to match the data we'd
use the most complicated model we could make (or some machine learning
algorithm) and dispense with AIC.

If a model's errors are normally distributed around its predicted path, then
minimizing the residual sum of squares (equivalently, maximizing R^2) is the
same as maximizing the likelihood, so I'm afraid I don't understand what is
meant by not having a measure of goodness-of-fit.  Isn't the likelihood
itself a measure of fit?
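To make that concrete, here is a quick sketch in R (simulated data and
made-up variable names, just to illustrate the point) showing that the
least-squares fit and the maximum-likelihood fit coincide when the errors
are iid normal:

set.seed(1)
x <- rnorm(50)
y <- 2 + 3 * x + rnorm(50, sd = 0.5)

## least squares
ls_fit <- coef(lm(y ~ x))

## direct maximization of the normal log-likelihood
negloglik <- function(p) {
  mu <- p[1] + p[2] * x
  -sum(dnorm(y, mean = mu, sd = exp(p[3]), log = TRUE))
}
ml_fit <- optim(c(0, 0, 0), negloglik)$par[1:2]

rbind(ls_fit, ml_fit)  # intercept and slope agree up to numerical error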

If we consider highly stochastic models with a large range of possible
outcomes, no single outcome is very likely, and we shouldn't expect a good
fit to any one realization.  If we had replicates we might instead hope to
compare the observed data against the distribution of possible outcomes.
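Something along these lines is what I have in mind (a rough sketch,
simulating from a simple Brownian-motion model with ape; the tree, sigma,
and the choice of summary statistic are placeholders, not recommendations):

library(ape)
set.seed(2)
tree <- rtree(50)
obs  <- rTraitCont(tree, model = "BM", sigma = 1)  # stand-in for real data

## distribution of a summary statistic under the assumed model
sim_var <- replicate(1000, var(rTraitCont(tree, model = "BM", sigma = 1)))
mean(sim_var >= var(obs))  # where the observed tip variance falls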

I don't see a way around this with a simple test.  I think Brian's example
is very instructive: it depends on why we are trying to fit the model in the
first place (to go back to Levins 1968).  If we want to learn about optima
and strengths of selection, we won't learn anything by fitting a BM model to
the data, as it has no parameters that represent those things.  However,
Brian's two-rate BM fit will still test whether the rates of diversification
differ substantially between the peaks (conversely, if one peak had very
weak stabilizing selection, this would be detected as a difference in
Brownian rates between the clades).
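For what it's worth, a comparison of that flavour might look something like
the sketch below (simulated tree and trait, using geiger's fitContinuous for
the single-process fits and AICc for the comparison; the two-rate Brownian
fit itself could be done with, e.g., phytools::brownie.lite or OUwie, which
I haven't shown here):

library(ape)
library(geiger)
set.seed(3)
tree  <- rcoal(64)
trait <- rTraitCont(tree, model = "BM", sigma = 1)

fits <- list(BM = fitContinuous(tree, trait, model = "BM"),
             OU = fitContinuous(tree, trait, model = "OU"))
sapply(fits, function(f) f$opt$aicc)  # smaller AICc = better supported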

If our goal weren't to compare parameter values but to make predictions (for
instance, to estimate trait values for missing taxa), then a purely
goodness-of-fit approach might be better (and some machine learning
algorithm could probably out-perform any of the simple mechanistic models).
I think it may be difficult to really answer David's question without an
example of what hypothesis we are really after.  Perhaps I have missed
something?

-Carl




On Thu, Jan 20, 2011 at 9:27 AM, David Bapst <dwba...@uchicago.edu> wrote:

> Hello all,
> I'd like to pose a question to this group, as a bit of topical
> discussion. I apologize in advance if I should mangle a concept.
>
> In many model-based PCMs and some other analyses (such as paleoTS), we
> fit models to data by finding the ML estimates of the associated
> parameters, calculate the maximum support of each model, and then
> compare models with differing parameters using an information
> criterion (AICc probably being the most used). Akaike weights can be
> calculated if we want to consider the relative fit among our models.
> This is contrary to traditional statistics, where alternative
> hypotheses are tested against some null hypothesis. Obviously, the
> latter approach has proven thorny because rejecting some null
> hypotheses (such as a random walk) is very difficult, and some
> situations truly lack a clear null model.
>
> Recently, I have heard the opinion expressed from workers of disparate
> fields (philosophy, ecology, etc.) that model-choosing methods may
> choose the best model, but with no idea of whether any of the models
> considered "fit well" to the data or not. In other words, we may have
> fit models A-D, and the best model may have been model C, but none of
> the models compared could describe the 'true' process underlying the
> data at all.
>
> This view gives me mixed feelings. Certainly, if we are using a
> model-selection approach, we should attempt the range of models that
> make sense for our data, and should particularly include the set of
> simple models that we might accept as the most commonly observed
> processes (Brownian motion and Ornstein-Uhlenbeck with a single
> optimum, perhaps, in analyses of trait evolution). Of course, we
> cannot include models that we haven't even considered or that are
> analytically intractable. That's a fundamental limitation of science,
> however, not of model-selection-based analyses.
>
> This counter-argument did not seem to satisfy the others, who still
> wanted a measure of absolute fit, "like an R-squared". Now, perhaps
> I'm confused, but isn't R-squared technically a relative measure of
> fit, comparing a linear model against an intercept-only (mean) model?
> I suppose the maximum support for a model is a measure of absolute
> fit, but it's not useful or interpretable unless I'm comparing it to
> the support for some other model.
>
> So, it seems like the desire for a measure of absolute fit is not
> well-founded, but maybe I'm wrong. Is there something more we can do
> to show that the models we've picked aren't arbitrary? Opinions?
> -Dave Bapst, UChicago Geosci
>
> _______________________________________________
> R-sig-phylo mailing list
> R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>



-- 
Carl Boettiger
UC Davis
http://www.carlboettiger.info/


