Hi Annemarie,

The only thing I would add to Carl's comment is that the theoretical upper limit of lambda is not 1.0, but can be found (for an ultrametric tree) by computing:

> C<-vcv.phylo(tree)
> maxLambda<-max(C)/max(C[upper.tri(C)])

You can then change the boundary condition for fitContinuous():

> fit<-fitContinuous(phy=tree,x,model="lambda",bounds=list(lambda=c(0,maxLambda)))

My function phylosig() automatically finds the upper boundary condition and uses this for the optimization:

> source("http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/phylosig/v0.3/phylosig.R";)
> fit<-phylosig(tree,x,method="lambda",test=TRUE)

Best of luck.

- Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 5/23/2011 2:46 PM, Carl Boettiger wrote:
Hi Annemarie,

No problem; I've tried to give some answers below.


On Mon, May 23, 2011 at 8:05 AM, Annemarie Verkerk
<annemarie.verk...@mpi.nl> wrote:

    Dear Carl, Liam, and others,

    thanks for your explanation of what went wrong in the fitContinuous
    calculations. I set beta to a large number (10000000000) in order to
    stop it from hitting the maximum bound. Then I got exactly the same
    results for lambda with the non-multiplied and the multiplied data.

    Okay, I still have two more problems - probably they will sound
    stupid to you, but I am still very much a newbie to this, so if it
    suffices just to point me to other sources where this has already
    been discussed, please do so!

    the log likelihood: between the non-multiplied and the multiplied
    data, there is still a difference. Liam, you write 'if the variance
    of this distribution is small, the value of the function will be
    large (i.e., greater than 1.0)'. However, the variance between the
    non-multiplied and the multiplied data is exactly the same. So why
    should the log likelihood values change when the data are multiplied?
    Why do I get a normal-looking value (around -200) for the multiplied
    data (with the beta bound set to a very large value)? And why does
    the log likelihood become so large in magnitude (-2000) when the
    default maximum beta value of 20 is reached, i.e., when the bounds
    are (0, 20)?


Liam is referring to the variance of the likelihood distribution, not
the variance of the traits.  If the rate of trait evolution (beta) is
high, then any particular outcome is very improbable: the probability
distribution over outcomes has high variance and a correspondingly low
density at any particular value.  Hence the large negative
log-likelihood for large beta.  (Recall that a log-likelihood around
-2000 means a density of exp(-2000), which is very near zero,
illustrating why we need to use logs!)
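
A quick way to see this in R (my own illustration, not from the original
exchange) is with a normal density:

> dnorm(0, mean=0, sd=1, log=TRUE)    # about -0.92
> dnorm(0, mean=0, sd=1e6, log=TRUE)  # about -14.7; the flatter the density, the more negative the log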


    the lambda scores: now, my lambda scores for the non-multiplied and
    the multiplied data are the same. However, most of them (999 out of
    1000) are '1'. To my understanding, '0' and '1' are theoretical
    limits for lambda that one normally does not reach. So I'm afraid
    I'm still unsure why this happens, and what it means.


You raise an important point here.  Just because they are the boundary
values does not mean these theoretical limits are uncommon -- in fact,
one may expect to hit the limits more often than any other value.
Numerical optimization will often push estimates of a parameter all the
way to its boundary.  This just means that increasing (or decreasing)
the parameter value increases the likelihood, so the optimizer keeps
doing so until it can go no further.  This can result from, but does
not necessarily imply, an inappropriate model (it is also particularly
common for small numbers of taxa, for instance), and it tends to happen
more often on relatively flat likelihood surfaces.

So I guess my short answer is "this isn't a problem" and my longer
answer is "be suspicious whenever you get back boundary estimates, and
consider bootstrapping."
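
One rough way to do the latter (my own sketch, not from the thread; it
assumes `tree` and `x` are your phylogeny and named trait vector, and the
slot that holds the estimate -- fit.i$opt$lambda below -- depends on your
geiger version) is a parametric bootstrap: simulate data with a known
lambda and see how often the estimate gets pushed to the boundary.

library(ape)     # vcv.phylo()
library(geiger)  # fitContinuous()
library(MASS)    # mvrnorm()

C <- vcv.phylo(tree)            # expected BM covariances among the tips
lambda.true <- 0.8              # the "known" lambda used to simulate
C.lam <- lambda.true * C
diag(C.lam) <- diag(C)          # lambda rescales only the off-diagonal elements

nsim <- 100
lam.hat <- numeric(nsim)
for (i in 1:nsim) {
  xsim <- mvrnorm(1, mu=rep(0, nrow(C)), Sigma=C.lam)
  names(xsim) <- rownames(C)
  fit.i <- fitContinuous(tree, xsim, model="lambda")
  lam.hat[i] <- fit.i$opt$lambda
}
mean(lam.hat > 0.999)           # proportion of fits that sit at the upper boundary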


HTH,

Carl

    I hope you don't mind this new response to the thread.

    With kind regards,
    Annemarie


    Liam J. Revell wrote:

        Hi Annemarie,

        Positive log-likelihoods are not a problem.  The log-likelihood
        is calculated by summing the log probability densities, which
        come from a function that integrates to 1.0.  Thus, if the
        variance of this distribution is small, the value of the
        function will be large (i.e., greater than 1.0).
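
        (As a quick illustration of my own, not from the original
        message: a normal density with a small standard deviation
        already exceeds 1.0 at its mode, so its log is positive.)

         > dnorm(0, mean=0, sd=0.01)  # about 39.9, so the log is about 3.7 > 0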

        The phenomenon of decreasing mean lambda when you increase the
        scale (i.e., multiply by 10 or 100) is probably due to bounds on
        beta (in the lambda model, sigma^2) in fitContinuous().  The
        default upper bound is 20.  You can change this by executing:

         > fitContinuous(...,bounds=list(beta=c(0,1000))) # or something

        Once you fix this issue, the mean lambda for any scale of your
        data vector should be the same.
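
        (A quick way to see why the scale of the data matters -- my own
        sketch, assuming `tree` and `x` are your tree and named trait
        vector: under Brownian motion the rate beta scales with the
        square of the data, so multiplying x by 10 implies a beta 100
        times larger, which can run into the default upper bound of 20.
        With contrasts from ape's pic():)

         > mean(pic(x, tree)^2)     # rough rate estimate for the original data
         > mean(pic(10*x, tree)^2)  # 100 times larger for the rescaled data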

        You can also try my function for estimating phylogenetic signal:
        phylosig(...,method="lambda") at URL:
        http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/
        which does not have this issue.

        Good luck.

        - Liam


    --
    Annemarie Verkerk, MA
    Evolutionary Processes in Language and Culture (PhD student)
    Max Planck Institute for Psycholinguistics
    P.O. Box 310, 6500AH Nijmegen, The Netherlands
    +31 (0)24 3521 185
    http://www.mpi.nl/research/research-projects/evolutionary-processes





--
Carl Boettiger
UC Davis
http://www.carlboettiger.info/


_______________________________________________
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
