Re: [R-sig-phylo] fitContinuous in geiger

Alejandro Gonzalez Thu, 19 May 2011 01:35:52 -0700

Hi,

As a follow-up to the questions regarding estimates of phylogenetic signal, I 
was wondering whether beta values can be meaningfully compared, for example when 
estimated for different traits on the same phylogeny. Would it be correct to 
assume that, if a Brownian motion model of evolution provides an accurate fit 
to the data, the beta values would provide information on the rate of 
phenotypic evolution of the trait? Would these values need to be standardized 
somehow, given that the sigma value scales with trait value (as pointed out by 
Carl)?


Cheers

Alejandro


On 19, May 2011, at 2:29 AM, Liam J. Revell wrote:

> Hi Annemarie,
> 
> Positive log-likelihoods are not a problem.  The log-likelihood is calculated 
> by summing the log probability densities, which come from a function that 
> integrates to 1.0.  Thus, if the variance of this distribution is small, the 
> value of the function will be large (i.e., greater than 1.0).
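
A minimal base-R sketch of the point above about positive log-likelihoods. It
does not call fitContinuous(); the standard deviations (0.1 and 10) and the
three residuals are arbitrary illustrative values, not taken from the data in
this thread. It only shows that a normal density can exceed 1.0 when the
variance is small, so summed log densities can be positive.

```r
# Normal density evaluated at the mean, for a narrow and a wide distribution.
# sd = 0.1 and sd = 10 are illustrative assumptions.
d_narrow <- dnorm(0, mean = 0, sd = 0.1)  # density above 1, so its log is positive
d_wide   <- dnorm(0, mean = 0, sd = 10)   # density below 1, so its log is negative

log(d_narrow)
log(d_wide)

# A log-likelihood is a sum of log densities. With small, made-up residuals
# and a small standard deviation the sum comes out positive.
resid_small <- c(0.01, -0.02, 0.015)
ll_small <- sum(dnorm(resid_small, mean = 0, sd = 0.1, log = TRUE))
ll_small
```

Rescaling the data changes the variance of the fitted distribution, which is
why the sign and size of the log-likelihood change with the units of the trait.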
> 
> The phenomenon of decreasing mean lambda when you increase the scale (i.e., 
> multiply by 10 or 100) is probably due to bounds on beta (in the lambda 
> model, sigma^2) in fitContinuous().  The default upper bound is 20.  You can 
> change this by executing:
> 
> > fitContinuous(...,bounds=list(beta=c(0,1000))) # or something
> 
> Once you fix this issue, the mean lambda for any scale of your data vector 
> should be the same.
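
To illustrate why a fixed upper bound on sigma^2 interacts with the scale of
the data, here is a small base-R sketch. It does not use geiger: the 4-taxon
covariance matrix C and the function bm_sigma2() are hypothetical, written only
for this example, and the trait values are simply the first four values from
the data set in this thread. Under Brownian motion, multiplying the trait by a
constant k multiplies the maximum-likelihood rate estimate by k^2.

```r
# Hypothetical Brownian-motion covariance matrix for 4 taxa (shared branch
# lengths between two pairs of sister taxa). This is an assumption, not the
# tree from the thread.
C <- matrix(c(1.0, 0.5, 0.0, 0.0,
              0.5, 1.0, 0.0, 0.0,
              0.0, 0.0, 1.0, 0.5,
              0.0, 0.0, 0.5, 1.0), nrow = 4, byrow = TRUE)
x <- c(0.32, 0.52, 0.95, 0.75)

# Maximum-likelihood estimate of the Brownian-motion rate (sigma^2).
bm_sigma2 <- function(x, C) {
  Cinv <- solve(C)
  a <- sum(Cinv %*% x) / sum(Cinv)   # GLS estimate of the root state
  r <- x - a                         # deviations from the root state
  as.numeric(t(r) %*% Cinv %*% r) / length(x)
}

s1   <- bm_sigma2(x, C)          # original scale
s100 <- bm_sigma2(100 * x, C)    # data multiplied by 100

s100 / s1            # the ratio is 100^2, up to floating-point error
c(s1, s100) > 20     # compare each estimate with an upper bound of 20
```

In this toy example the estimate on the original scale is far below 20, while
the estimate for the rescaled data is far above it, so an optimizer constrained
to sigma^2 <= 20 could not reach the unconstrained optimum for the rescaled data.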
> 
> You can also try my function for estimating phylogenetic signal: 
> phylosig(...,method="lambda") at URL: 
> http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/ which does not have this 
> issue.
> 
> Good luck.
> 
> - Liam
> 
> -- 
> Liam J. Revell
> University of Massachusetts Boston
> web: http://faculty.umb.edu/liam.revell/
> email: liam.rev...@umb.edu
> blog: http://phytools.blogspot.com
> 
> On 5/18/2011 3:50 AM, Annemarie Verkerk wrote:
>> Hi all,
>> 
>> I’m having some trouble with the function fitContinuous in the geiger
>> library. I'm using fitContinuous to estimate lambda as an
>> indication of the presence of phylogenetic signal. As a side note, I'm
>> doing this with language data: the language trees are based on shared
>> vocabulary, and the trait for which I'm estimating lambda is a
>> linguistic typological trait. Another side note is that I have similar
>> problems in BayesTraits but no problems using phylosignal in picante for
>> estimating lambda.
>> 
>> At the moment, there are 14 taxa in my sample. I have a tree set of 1000
>> trees. The first data set + trees are attached. My data values are all
>> values between 0 and 1, basically things like '0.326547'. (This is
>> because they come from a principal components analysis; they are scores
>> on the first principal component that explains about 80% of the
>> variation.) I've been using values rounded to two decimal places
>> for ease of use, so '0.33'. However, the results that I get
>> are strange.
>> 
>> My first dataset looks like this:
>> 
>> language value
>> t1 0.32
>> t4 0.52
>> t6 0.95
>> t9 0.75
>> t10 0.77
>> t12 0.46
>> t14 0.61
>> t2 0.35
>> t3 0.29
>> t5 0.25
>> t7 0.89
>> t8 0.88
>> t11 0.79
>> t13 0.35
>> 
>> Then I do the fitContinuous analysis over my sample of trees (1000
>> trees) and these are my scores:
>> 
>> median of lambda:
>> [1] 1
>> mean of lambda:
>> [1] 0.9999985
>> sd of lambda
>> [1] 4.60849e-05
>> 
>> So: almost all values of lambda are 1.
>> 
>> median of log-likelihood
>> [1] 5.206887
>> mean of log-likelihood
>> [1] 5.210839
>> sd of log-likelihood
>> [1] 0.4215943
>> 
>> The log-likelihood is positive? That is very strange. These results
>> basically make it seem as if the algorithm has crashed.
>> 
>> Then, I multiply my values by 100:
>> 
>> language value
>> t1 32
>> t4 52
>> t6 95
>> t9 75
>> t10 77
>> t12 46
>> t14 61
>> t2 35
>> t3 29
>> t5 25
>> t7 89
>> t8 88
>> t11 79
>> t13 35
>> 
>> results:
>> 
>> median lambda:
>> [1] 0.9874361
>> mean lambda:
>> [1] 0.9839095
>> sd lambda:
>> [1] 0.01622255
>> 
>> median log-likelihood:
>> [1] -65.66331
>> mean log-likelihood:
>> [1] -65.73675
>> sd log-likelihood:
>> [1] 1.716778
>> 
>> Now the number of lambda scores of '1' is lower, although they are not
>> entirely gone: there are still around 200-300 instances of '1'. The
>> log-likelihood is now about -65, so at least it's negative.
>> 
>> When I multiply my original data points by 1000, this is my data set:
>> 
>> language value
>> t1 320
>> t4 520
>> t6 950
>> t9 750
>> t10 770
>> t12 460
>> t14 610
>> t2 350
>> t3 290
>> t5 250
>> t7 890
>> t8 880
>> t11 790
>> t13 350
>> 
>> results:
>> 
>> median lambda:
>> [1] 0.8640076
>> mean lambda:
>> [1] 0.8561964
>> sd lambda:
>> [1] 0.05001523
>> 
>> median log-likelihood:
>> [1] -2055.763
>> mean log-likelihood:
>> [1] -2067.052
>> sd log-likelihood:
>> [1] 213.44
>> 
>> There are no more lambda scores of ‘1’ in the data, but the
>> log-likelihood is a really big negative number, and I'm not sure what
>> that would mean in this context.
>> 
>> So, even though the range of variation stays exactly the same with these
>> multiplications, there are quite important differences between the
>> results these three sets of data give me. It was suggested to me that
>> the algorithm might be doing something to my data values, for instance
>> capping them, rounding them off, or ignoring certain decimals,
>> and that this might be the reason for the different results. Would anyone
>> have any idea why this happens and how I can deal with it in an
>> informative way?
>> 
>> Thanks so much for any help that you might be able to offer,
>> Annemarie Verkerk
>> 
>> 
>> 
>> 
>> _______________________________________________
>> R-sig-phylo mailing list
>> R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

__________________________________

Alejandro Gonzalez Voyer

Post-doc

Estación Biológica de Doñana
Consejo Superior de Investigaciones Científicas (CSIC)
Av Américo Vespucio s/n
41092 Sevilla
Spain

Tel: + 34 - 954 466700, ext 1749

E-mail: alejandro.gonza...@ebd.csic.es

Web page: https://docs.google.com/View?id=dfs328dh_14gwwqsxcg
