Re: [R-sig-phylo] issue with reading matrix in R package DispRity

2021-12-06 Thread Marguerite Butler
Hi Oliver,
You have semicolons ; in your cells.  csv is "comma separated format". It
is using "," as the delimiter. Probably what is happening is the ";" is
interpreted as another delimiter in R, so you have a 2 column vector in
each cell. I would guess that htis will be read in as a list.

Why do you have two values in each cell? If they are important I would
suggest splitting them into two columns so that you only have one value per
cell.

Otherwise you can read in the list, and flatten it with code. But it would
probably be easier to just produce the .csv in the correct way.

An easy way to do this is to read in your .csv file with a word processor
and just do a search on ";" and replace with ",".   Then you should get 6
columns instead of three columns with two values each.

Marguerite

Marguerite



On Mon, Dec 6, 2021 at 2:08 PM Oliver Betz 
wrote:

>
> Dear all:
>
> I have an issue in correctly transferring the attached data frame to a
> matrix to be processed in the R package DispRity.
>
> I am running the following script that (1) reads in the attached
> dataframe from Excel, (2) tansfers it to a matrix X and (3) uses the
> column "subgroup" to define the two groups I want to work on.
>
> Her is the script I am using:
>
> #___
>
> library (dispRity)
>
> #(1) Reading dataframe
> Data<-read.csv2("test.csv", row.names=1, header = T,
> fileEncoding="UTF-8-BOM")
>
> #(2) converts dataframe into a matrix
> X<-data.matrix(Data)
>
> #(3)define subsets
> custom.subsets(X[, c(1,2)], group = as.factor(X[, 3]))
>
> #___
>
> However, once I have run the third line, I am getting an error message
> such as:
>
> Error: group argument must be either a "list", a "matrix", a
> "data.frame" or a "phylo" object.
>
> 
>
> If I change the third line into: custom.subsets(dat[, c(2,3)], group =
> as.factor(dat[, 4])), the error message is as follows:
>
> Error in X[, 4] : subscript out of bounds
> ___
>
>
> I will be very happy if anybody can give a hint how to resolve this
> problem.
>
>
> My best wishes,
>
> Oliver Betz
>
>
>
>
>
>
>
>
>
>
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>


-- 

Marguerite A. Butler
Professor

Department of Biology
2538 McCarthy Mall, Edmondson Hall 216
Honolulu, HI 96822

Office: 808-956-4713
Dept: 808-956-8617
Lab:  808-956-5867
FAX:   808-956-4745
http://butlerlab.org
http://manoa.hawaii.edu/biology/people/marguerite-butler
http://www2.hawaii.edu/~mbutler

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] If my trait X cannot be regressed by body size, how can I rescue residuals corrected by the phylogeny and SE?

2021-05-28 Thread Marguerite Butler
iling list - R-sig-phylo@r-project.org
> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> > Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>


-- 

Marguerite A. Butler
Professor

Department of Biology
2538 McCarthy Mall, Edmondson Hall 216
Honolulu, HI 96822

Office: 808-956-4713
Dept: 808-956-8617
Lab:  808-956-5867
FAX:   808-956-4745
http://butlerlab.org
http://manoa.hawaii.edu/biology/people/marguerite-butler
http://www2.hawaii.edu/~mbutler

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] How to sort trait data according to tree

2021-05-17 Thread Marguerite Butler
Emmanuel,
Yes, that was exactly what I was confused about (even str(cbm) wasnʻt very
illuminating). Thank you so much for your explanation. CorBrownian is only
storing the formula and the tree, and the data are gathered by gls. I agree
this is a very good way to keep everything in its proper order.

Thanks again for your kind replies,
Marguerite

On Sun, May 16, 2021 at 8:58 PM Emmanuel Paradis 
wrote:

> - Le 17 Mai 21, à 13:41, Marguerite Butler  a
> écrit :
>
> Thank you very much for the reply Emmanuel!
>
> OK, yes I just tried and Iʻm surprised that this works:
>
> >  dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles",
> "Galago"), X = X, Y = Y)
> > cbm <- corBrownian(1, tree, form = ~Species)
> > gls(Y ~ X, dat, correlation = cbm)
>
> How does corBrownian know about Species, when it is only defined as an
> element of dat, and not known in the global environment?
>
> corBrownian() doesn't know where all these variables are located. Indeed,
> if you print 'cbm':
>
> R> cbm
> Uninitialized correlation structure of class corBrownian
>
> It's gls() that looks for them. When you think about it, using "data =
> ..." is really a good way to be sure that all variables are ordered
> correctly whether they are involved as predictors (in 'model = '), in the
> correlation structure, or in the variance function ('weights = ').
>
> Best,
>
> Emmanuem
>
> Anyway, thank you!
> Marguerite
>
>
> On Sun, May 16, 2021 at 7:03 PM Emmanuel Paradis 
> wrote:
>
>> Hi Marguerite,
>>
>> The issue is about nlme::gls. The help page ?gls says:
>>
>> data: an optional data frame containing the variables named in
>>   ‘model’, ‘correlation’, ‘weights’, and ‘subset’. By default
>>   the variables are taken from the environment from which ‘gls’
>>   is called.
>>
>> So you make it work by removing "dat$" in the call to corBrownian():
>>
>> cbm <- corBrownian(1, tree, form = ~Species)
>>
>> Best,
>>
>> Emmanuel
>>
>> - Le 17 Mai 21, à 6:22, Marguerite Butler mbutler...@gmail.com a
>> écrit :
>>
>> > Aloha all,
>> >
>> > Does anyone know the answer to this question -  how does ape pass the
>> > species name information into gls? Seems to be a global environment
>> issue
>> > when using apeʻs corBrownian and  nlmeʻs gls? and a very specific form
>> is
>> > required, but why?
>> >
>> > Modified from ape help page for CorClasses:
>> >
>> > ---
>> >
>> > library(nlme)
>> > library(ape)
>> >
>> > tree <- read.tree(text =
>> >
>> "Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);")
>> > X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
>> > Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
>> > dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles",
>> > "Galago"), X = X, Y = Y)
>> >
>> > gls(Y ~ X, dat, correlation=corBrownian(1, tree.primates, form =
>> ~Species))
>> > ---
>> >
>> > The above works just fine. However, if we try to calculate corBrownian
>> > separately, gls cannot understand the corBrownian object:
>> > ---
>> >> cbm <- corBrownian(1, tree, form = ~dat$Species)# calculate
>> correlation
>> >> separately
>> >> gls(Y ~ X, dat, correlation = cbm)   ## does not work, invalid
>> type
>> >> (list) for dat
>> > Error in model.frame.default(formula = ~dat + Species + Y + X, data =
>> list( :
>> >  invalid type (list) for variable 'dat'
>> > ---
>> >
>> > If we put the species information into a vector in the global
>> > environment it works. Why canʻt we use dat$Species?
>> > ---
>> > spp <- dat$Species   # save species as vector in global environment
>> > cbm <- corBrownian(1, tree, form = ~spp)# calculate correlation
>> separately
>> > gls(Y ~ X, dat, correlation = cbm)   ## works!
>> >
>> > Thanks,
>> >
>> > Marguerite
>> >
>> >
>> >
>> > On Fri, May 14, 2021 at 10:21 PM Marguerite Butler <
>> mbutler...@gmail.com>
>> > wrote:
>> >
>> >> Aloha Oliver,
>> >>
>> >> From the cor.Brownian help page, the explanation for the form argument
>&

Re: [R-sig-phylo] How to sort trait data according to tree

2021-05-17 Thread Marguerite Butler
Thank you very much for the reply Emmanuel!

OK, yes I just tried and Iʻm surprised that this works:

>  dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles",
"Galago"), X = X, Y = Y)
> cbm <- corBrownian(1, tree, form = ~Species)
> gls(Y ~ X, dat, correlation = cbm)

How does corBrownian know about Species, when it is only defined as an
element of dat, and not known in the global environment?

Anyway, thank you!
Marguerite


On Sun, May 16, 2021 at 7:03 PM Emmanuel Paradis 
wrote:

> Hi Marguerite,
>
> The issue is about nlme::gls. The help page ?gls says:
>
> data: an optional data frame containing the variables named in
>   ‘model’, ‘correlation’, ‘weights’, and ‘subset’. By default
>   the variables are taken from the environment from which ‘gls’
>   is called.
>
> So you make it work by removing "dat$" in the call to corBrownian():
>
> cbm <- corBrownian(1, tree, form = ~Species)
>
> Best,
>
> Emmanuel
>
> - Le 17 Mai 21, à 6:22, Marguerite Butler mbutler...@gmail.com a
> écrit :
>
> > Aloha all,
> >
> > Does anyone know the answer to this question -  how does ape pass the
> > species name information into gls? Seems to be a global environment issue
> > when using apeʻs corBrownian and  nlmeʻs gls? and a very specific form is
> > required, but why?
> >
> > Modified from ape help page for CorClasses:
> >
> > ---
> >
> > library(nlme)
> > library(ape)
> >
> > tree <- read.tree(text =
> >
> "Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);")
> > X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
> > Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
> > dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles",
> > "Galago"), X = X, Y = Y)
> >
> > gls(Y ~ X, dat, correlation=corBrownian(1, tree.primates, form =
> ~Species))
> > ---
> >
> > The above works just fine. However, if we try to calculate corBrownian
> > separately, gls cannot understand the corBrownian object:
> > ---
> >> cbm <- corBrownian(1, tree, form = ~dat$Species)# calculate
> correlation
> >> separately
> >> gls(Y ~ X, dat, correlation = cbm)   ## does not work, invalid
> type
> >> (list) for dat
> > Error in model.frame.default(formula = ~dat + Species + Y + X, data =
> list( :
> >  invalid type (list) for variable 'dat'
> > ---
> >
> > If we put the species information into a vector in the global
> > environment it works. Why canʻt we use dat$Species?
> > ---
> > spp <- dat$Species   # save species as vector in global environment
> > cbm <- corBrownian(1, tree, form = ~spp)# calculate correlation
> separately
> > gls(Y ~ X, dat, correlation = cbm)   ## works!
> >
> > Thanks,
> >
> > Marguerite
> >
> >
> >
> > On Fri, May 14, 2021 at 10:21 PM Marguerite Butler  >
> > wrote:
> >
> >> Aloha Oliver,
> >>
> >> From the cor.Brownian help page, the explanation for the form argument
> is
> >> this:
> >>
> >> a one sided formula of the form ~ t, or ~ t | g, specifying the taxa
> >> covariate t and, optionally, a grouping factor g. A covariate for this
> >> correlation structure must be character valued, with entries matching
> the
> >> tip labels in the phylogenetic tree. When a grouping factor is present
> in
> >> form, the correlation structure is assumed to apply only to observations
> >> within the same grouping level; observations with different grouping
> levels
> >> are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to
> using
> >> the order of the observations in the data as a covariate, and no groups.
> >>
> >> This means that "t" should contain the species names or tip labels,
> >> spelled exactly as they are in your phylogeny.  So for example if your
> data
> >> SteninaeData
> >> has a column for species names called "species" then you would specify:
> >>
> >> cor.BM <- corBrownian(phy=SteninaeTree, form = ~dat$species)
> >>
> >> and then proceed with your regression model as above.
> >>
> >>
> >> You can also follow the example on the corClasses page:
> >> http://127.0.0.1:27386/library/ape/html/corClasses.html
> >>
> >> Slightly modified here:
> >>
&

Re: [R-sig-phylo] How to sort trait data according to tree

2021-05-16 Thread Marguerite Butler
Aloha all,

Does anyone know the answer to this question -  how does ape pass the
species name information into gls? Seems to be a global environment issue
when using apeʻs corBrownian and  nlmeʻs gls? and a very specific form is
required, but why?

Modified from ape help page for CorClasses:

---

library(nlme)
library(ape)

tree <- read.tree(text =
"Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);")
X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles",
"Galago"), X = X, Y = Y)

gls(Y ~ X, dat, correlation=corBrownian(1, tree.primates, form = ~Species))
---

The above works just fine. However, if we try to calculate corBrownian
separately, gls cannot understand the corBrownian object:
---
> cbm <- corBrownian(1, tree, form = ~dat$Species)# calculate correlation 
> separately
> gls(Y ~ X, dat, correlation = cbm)   ## does not work, invalid type 
> (list) for dat
Error in model.frame.default(formula = ~dat + Species + Y + X, data = list( :
  invalid type (list) for variable 'dat'
---

If we put the species information into a vector in the global
environment it works. Why canʻt we use dat$Species?
---
spp <- dat$Species   # save species as vector in global environment
cbm <- corBrownian(1, tree, form = ~spp)# calculate correlation separately
gls(Y ~ X, dat, correlation = cbm)   ## works!

Thanks,

Marguerite



On Fri, May 14, 2021 at 10:21 PM Marguerite Butler 
wrote:

> Aloha Oliver,
>
> From the cor.Brownian help page, the explanation for the form argument is
> this:
>
> a one sided formula of the form ~ t, or ~ t | g, specifying the taxa
> covariate t and, optionally, a grouping factor g. A covariate for this
> correlation structure must be character valued, with entries matching the
> tip labels in the phylogenetic tree. When a grouping factor is present in
> form, the correlation structure is assumed to apply only to observations
> within the same grouping level; observations with different grouping levels
> are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using
> the order of the observations in the data as a covariate, and no groups.
>
> This means that "t" should contain the species names or tip labels,
> spelled exactly as they are in your phylogeny.  So for example if your data 
> SteninaeData
> has a column for species names called "species" then you would specify:
>
> cor.BM <- corBrownian(phy=SteninaeTree, form = ~dat$species)
>
> and then proceed with your regression model as above.
>
>
> You can also follow the example on the corClasses page:
> http://127.0.0.1:27386/library/ape/html/corClasses.html
>
> Slightly modified here:
>
> ---
>
> library(nlme)
>
> library(ape)
>
> tree <-
> read.tree(text="Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);"
>
> X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)  # a vector
>
> Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)   # a vector
>
> Species <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago")  # a vector
>
> dat <- data.frame(Species = Species, X = X, Y = Y)# make a dataframe
> with Species, X, Y, can also read in .csv
> cbm <- corBrownian(1, tree, form = ~dat$Species)
>
> m1 <- gls(Y ~ X, dat, correlation=cbm)
>
> m1
>
> ---
>
> hth,
>
> Marguerite
>
> On Fri, May 14, 2021 at 9:46 PM Oliver Betz 
> wrote:
>
>> Dear all:
>>
>> I am currently facing the following problem:
>>
>> Unsing ape, nlme por phylolm for PGLS I was wondering how to sort my
>> trait data (the ones in my csv file) according to the tree, so that
>> both are in the same order. I learned how to check whether the data &
>> the tree coincide in both number of species and names, but I do not
>> nderstand how to perform the mentioned sorting task.
>>
>> Once I am running the PGLS commands such as
>> cor.BM <- corBrownian(phy=SteninaeTree)
>> pgls <- gls(LOG_rel_Attachment_force_smooth ~
>> LOG_rel_Anzahl_Hafthaare_4.Vordertarsus, correlation = cor.BM, data =
>> SteninaeData, method = "ML")"
>>
>> I regularly receive the follwing warning note:
>> "In Initialize.corPhyl(X[[i]], ...) :
>> No covariate specified, species will be taken as ordered in the data
>> frame. To avoid this message, specify a covariate containing the
>> species names with the 'form' argument."
>>
>> Since this is just a 

Re: [R-sig-phylo] How to sort trait data according to tree

2021-05-15 Thread Marguerite Butler
Aloha Oliver,

>From the cor.Brownian help page, the explanation for the form argument is
this:

a one sided formula of the form ~ t, or ~ t | g, specifying the taxa
covariate t and, optionally, a grouping factor g. A covariate for this
correlation structure must be character valued, with entries matching the
tip labels in the phylogenetic tree. When a grouping factor is present in
form, the correlation structure is assumed to apply only to observations
within the same grouping level; observations with different grouping levels
are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using
the order of the observations in the data as a covariate, and no groups.

This means that "t" should contain the species names or tip labels, spelled
exactly as they are in your phylogeny.  So for example if your data
SteninaeData
has a column for species names called "species" then you would specify:

cor.BM <- corBrownian(phy=SteninaeTree, form = ~dat$species)

and then proceed with your regression model as above.


You can also follow the example on the corClasses page:
http://127.0.0.1:27386/library/ape/html/corClasses.html

Slightly modified here:

---

library(nlme)

library(ape)

tree <-
read.tree(text="Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);"

X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)  # a vector

Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)   # a vector

Species <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago")  # a vector

dat <- data.frame(Species = Species, X = X, Y = Y)# make a dataframe
with Species, X, Y, can also read in .csv
cbm <- corBrownian(1, tree, form = ~dat$Species)

m1 <- gls(Y ~ X, dat, correlation=cbm)

m1

---

hth,

Marguerite

On Fri, May 14, 2021 at 9:46 PM Oliver Betz 
wrote:

> Dear all:
>
> I am currently facing the following problem:
>
> Unsing ape, nlme por phylolm for PGLS I was wondering how to sort my
> trait data (the ones in my csv file) according to the tree, so that
> both are in the same order. I learned how to check whether the data &
> the tree coincide in both number of species and names, but I do not
> nderstand how to perform the mentioned sorting task.
>
> Once I am running the PGLS commands such as
> cor.BM <- corBrownian(phy=SteninaeTree)
> pgls <- gls(LOG_rel_Attachment_force_smooth ~
> LOG_rel_Anzahl_Hafthaare_4.Vordertarsus, correlation = cor.BM, data =
> SteninaeData, method = "ML")"
>
> I regularly receive the follwing warning note:
> "In Initialize.corPhyl(X[[i]], ...) :
> No covariate specified, species will be taken as ordered in the data
> frame. To avoid this message, specify a covariate containing the
> species names with the 'form' argument."
>
> Since this is just a warning message, I usually was going on with the
> "summary(pgls)" command, where I was getting my results.
>
>
> Anyhow, I was wondering whether it is OK to just ignore this warning
> or whether it means that I will have to sort the sequence of the
> sepcies trait data in my data file (csv) exactly according to the
> sequence of the species as they occur at the tips of my tree file.
> Since the spelling of the species names is exactly the same in both
> the tree file and the data file, I thought that such sorting is
> automatically done by the algorithm?
>
> I will be glad if you have any hint how ro resove this.
>
> My best wishes,
>
> Oliver Betz
> (University of Tübingen, Gernmany)
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>


-- 
________
Marguerite A. Butler
Professor

Department of Biology
2538 McCarthy Mall, Edmondson Hall 216
Honolulu, HI 96822

Office: 808-956-4713
Dept: 808-956-8617
Lab:  808-956-5867
FAX:   808-956-4745
http://butlerlab.org
http://manoa.hawaii.edu/biology/people/marguerite-butler
http://www2.hawaii.edu/~mbutler

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] units of sigsq

2021-03-21 Thread Marguerite Butler
Aloha all,

Well if all youʻre doing is plotting data, then sure, you can have a change
of axes and something in g will still be in g with log-transformed axes.

But does this apply to fitting models of BM or OU? The model is assuming
that the "errors" or the small jumps in phenotype comes from a normal
distribution.  So by fitting a model on log-transformed data will assume
that those changes come from a different distribution, on log(g). This has
potentially deep implications, suggesting that the magnitude of the changes
in the original scale would be larger for large values of g that for small
values of g when on a log-scale. That would imply basically that itʻs
easier to get a lot larger or a lot smaller in a single step if youʻre
already big. This might make sense especially for genome size evolution,
for example, where big changes in size arise by chromosomal duplication or
transposable element activity, etc. Anyway, log-transformation is commonly
applied to biological data.

As for the units of time, yes I agree with Joe - it is the units that the
depth of the tree is in. If it is time calibrated, then it is time.
Otherwise it is generally mutations per branch length.

Marguerite



On Fri, Mar 19, 2021 at 2:20 PM Joe Felsenstein 
wrote:

> Marguerite asked:
>
> > First - Joe - what do you mean by log(grams) has no units? The units of
> grams is a unit, so log(mass) will have units of log-gm.  As log is not the
> same as 1/gm, log(gm) cannot be unit-free.
>
> I looked it up on Wikipedia, and was assured by it that Marguerite is
> right, log(weight) has the same units as weight, except you're
> supposed to call them log-gm.
>
>
> I do think that unless there is a calibration with time, the tree
> branch lengths are not in units of time but in units of base
> substitutions per site.
>
> And I remain confused on what the units are, if you compute a linear
> combination such as  2 log(wt) - 3 log(height). Which princip[al, le]
> components machinery does.
>
> Joe
> --
> Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
>  Department of Genome Sciences and Department of Biology,
>  University of Washington, Box 355065, Seattle, WA 98195-5065 USA
>


-- 

Marguerite A. Butler
Professor

Department of Biology
2538 McCarthy Mall, Edmondson Hall 216
Honolulu, HI 96822

Office: 808-956-4713
Dept: 808-956-8617
Lab:  808-956-5867
FAX:   808-956-4745
http://butlerlab.org
http://manoa.hawaii.edu/biology/people/marguerite-butler
http://www2.hawaii.edu/~mbutler

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] units of sigsq

2021-03-19 Thread Marguerite Butler
Aloha all,

Iʻm still reeling from the Atlanta murders and the rise of hate in general,
so I may not be thinking straight, but if weʻre talking about Brownian
motion, Iʻm not sure this is quite right.

If the trait is  log(grams)  then the trait is unit-free.
>

First - Joe - what do you mean by log(grams) has no units? The units of
grams is a unit, so log(mass) will have units of log-gm.  As log is not the
same as 1/gm, log(gm) cannot be unit-free.


> The "time"
> is probably branch length from a phylogeny.  That in turn (from DNA
> data) is usually   DNA substitutions per site.
>
> So the units of the standard deviation aresites per substitution.
> But this is not the standard deviation, it is its square.
>
> So (wait for it ...)square sites per square substitution
>

We worked out the units of BM and OU parameters in Cressler et al (2015).
We parameterized it this way. If we have a trait X evolving under a
Brownian motion, we can write the change in X per unit time as:

dX(t) = sigma*dB(t)

Where dX(t) (the change in X) will have the same units as X (in this case
log-gm).
 and dB or a draw from the white noise distribution has units of time to
the 1/2 power.

Therefore sigma must have units of trait*time^(-1/2), and sigma^2 would
have units of trait^2/time, or we will not obtain the correct units for X.

Ted - yes, I agree with Joe - the matrix version of sigma and sigma^2 on a
element-by-element basis will have the same units as the univariate case,
except that you will have to substitute the units of each trait, as
appropriate, if they are measured in different units.

Marguerite


> So (wait for it ...)square sites per square substitution
>
> Now that is pretty weird.  But years ago people pointed out to me that
> quantitative geneticists were accustomed to inferring variance
> components of crop yield.  The yield might be in  bushels per acre.
> So the units of its variance was:  square bushels per square acre.
> Don't even try to think about how you square a bushel, or how many
> dimensions you have to go into to square an acre.   Actually, you can
> think about them: a bushel is three-dimensional volume, and an acre is
> two dimensional area.  So crop yield has units of  meters, and
> variance of crop yield should have units of  square meters.
>
> That way lies madness ...
>
> Joe
> -
> Joe Felsenstein felse...@gmail.com,  j...@gs.washington.edu
>  Department of Genome Sciences and Department of Biology,
>  University of Washington, Box 355065, Seattle, WA 98195-5065 USA
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>


-- 

Marguerite A. Butler
Professor

Department of Biology
2538 McCarthy Mall, Edmondson Hall 216
Honolulu, HI 96822

Office: 808-956-4713
Dept: 808-956-8617
Lab:  808-956-5867
FAX:   808-956-4745
http://butlerlab.org
http://manoa.hawaii.edu/biology/people/marguerite-butler
http://www2.hawaii.edu/~mbutler

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] model averaging for discrete character evolution

2019-08-09 Thread Marguerite Butler
).
> >>
> >> Best,
> >> Jake
> >>
> >>  [[alternative HTML version deleted]]
> >>
> >> ___
> >> R-sig-phylo mailing list - R-sig-phylo@r-project.org  R-sig-phylo@r-project.org>
> >>
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylodata=02%7C01%7Cliam.revell%40umb.edu%7Ce85015cb8c314c684bdc08d71b4c8fee%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637007887976647805sdata=F8qdVoT17J%2BuC6wgvFktdI1id%2Bf37hDLe1W17z6OhTw%3Dreserved=0
> <
> https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylodata=02%7C01%7Cliam.revell%40umb.edu%7Ce85015cb8c314c684bdc08d71b4c8fee%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637007887976647805sdata=F8qdVoT17J%2BuC6wgvFktdI1id%2Bf37hDLe1W17z6OhTw%3Dreserved=0
> >
> >> Searchable archive at
> https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2Fdata=02%7C01%7Cliam.revell%40umb.edu%7Ce85015cb8c314c684bdc08d71b4c8fee%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637007887976657803sdata=ogCaep2DN92bk6%2FZwOOcpmHaCPQwX7BzNgKo2UMcWtw%3Dreserved=0
> <
> https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2Fdata=02%7C01%7Cliam.revell%40umb.edu%7Ce85015cb8c314c684bdc08d71b4c8fee%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637007887976657803sdata=ogCaep2DN92bk6%2FZwOOcpmHaCPQwX7BzNgKo2UMcWtw%3Dreserved=0
> >
> >>
>
>
> [[alternative HTML version deleted]]
>
> _______
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>


-- 

Marguerite A. Butler
Professor

Department of Biology
2538 McCarthy Mall, Edmondson Hall 216
Honolulu, HI 96822

Office: 808-956-4713
Dept: 808-956-8617
Lab:  808-956-5867
FAX:   808-956-4745
http://butlerlab.org
http://manoa.hawaii.edu/biology/people/marguerite-butler
http://www2.hawaii.edu/~mbutler

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] OU for non-ultrametric trees

2018-12-16 Thread Marguerite Butler
Aloha Danielle,

There is no reason at all (mathematically) why you canʻt run
non-ultrametric trees. ouch has ever been limited this way in its
functions, and it also does not require bifurcating trees.

Our original comment was that in most cases people will want to interpret
their results in units of time, which is implied by an ultra metric tree.
However, if there is some other metric that is more useful for the problem
at hand, such as units of mutational change, that is fine too. This may be
more relevant for virus evolution. The important thing is that the branch
lengths have some meaning, and users should put some thought into the
interpretation of the model output.

Marguerite

On Sun, Dec 16, 2018 at 3:09 AM Danielle Miller 
wrote:

> First, I would like to thank all for your replies.
>
> Second, I have a tree constructed of many viral genome sequences. I use
> midpoint rooting in order to run the model on a rooted tree (not using
> outgroup) and do not have three ways splits.
> I’ve probably read this in a paper mentioning that in order to fit OU one
> needs an ultrametric tree (pre 2014).
>
> Looking at the models equations I did not fully understand way this is a
> valid assumption, and wanted to make sure I’m not missing anything
> important.
>
> Once again, many thanks,
> Danelle
>
> > On 12 Dec 2018, at 23:18, David Bapst  wrote:
> >
> > Danielle,
> >
> > Apologies for the late comment on this, but what do you mean by
> non-rooted? Do you mean the tree is rooted, but has a three-way split with
> the outgroup at the root node? You shouldn't do OU on a tree you can't
> assign a root to.
> >
> > I'm also just a bit curious what gave you the sense that OU was
> inappropriate for non-ultrametric trees. To my knowledge, this was only an
> issue in one case, as highlighted by Slater (2014), that one particularly
> (popular) algorithm for fitting evolutionary models doesn't work for OU on
> non-ultrametric trees. I don't think this is as much of a concern in 2018,
> as I think most/all packages that were doing such have fixed their code so
> an alternative algorithm is used when a non-ultrametric tree is given (as
> with a tree containing fossil taxa).
> >
> > Cheers,
> > -Dave
> >
> >
> > On Thu, Dec 6, 2018 at 6:35 AM Danielle Miller  <mailto:danimiller...@gmail.com>> wrote:
> > Hi all,
> >
> > I have a non-rooted non-ultrametric tree and a corresponding set of a
> single trait values for each one of the tips.
> > I’m interested whether it follows a BM model or an OU.
> >
> > Reading previous comments in the archive, I understood that running an
> OU process is inadequate in a case of non-ultrametric trees, however I did
> not fully understand why.
> > As I use ouch package, what are the consequences of running an unrooted
> tree? And a non-ultrametric one?
> >
> > What would be the best way assessing this issue? Diversitree should be
> more reliable?
> >
> > I will appreciate any explanation and help,
> > Danielle Miller
> > ___
> > R-sig-phylo mailing list - R-sig-phylo@r-project.org  R-sig-phylo@r-project.org>
> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo <
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
> > Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/ <
> http://www.mail-archive.com/r-sig-phylo@r-project.org/>
> >
> >
> > --
> > David W. Bapst, PhD
> > Asst Research Professor, Geology & Geophysics, Texas A & M University
> > Postdoc, Ecology & Evolutionary Biology, Univ of Tenn Knoxville
> > https://github.com/dwbapst/paleotree <
> https://github.com/dwbapst/paleotree>
>
>
> [[alternative HTML version deleted]]
>
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>


-- 

Marguerite A. Butler
Professor

Department of Biology
2538 McCarthy Mall, Edmondson Hall 216
Honolulu, HI 96822

Office: 808-956-4713
Dept: 808-956-8617
Lab:  808-956-5867
FAX:   808-956-4745
http://butlerlab.org
http://manoa.hawaii.edu/biology/people/marguerite-butler
http://www2.hawaii.edu/~mbutler

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] estimate ancestral state with OUwie models

2018-06-12 Thread Marguerite Butler
Aloha all,

There is no requirement for an ultra metric tree in the formulae reported in 
Butler-King 2004. Interested investigators should in particular read the 
supplementary materials where the mathematical details are worked out. 

We do generally use ultrametric trees because as comparative biologists, it is 
more straightforward to think about evolution in units of time rather than in 
terms of mutational units, etc. However this is by choice, not any 
methodological limitation. 

Once the model parameters are found, the phylogenetic variance-covariance 
matrix defined by the alpha, thetas, and sigmas can be used to compute 
ancestral states using a weighted least squares reconstruction method (instead 
of the typical BM var-cov matrix). The mapping of the alphas, thetas, and 
sigmas onto the tree are incorporated into this V-COV matrix, so that accounts 
for the OU model. 

NOTES: 
1) without knowing why you are doing this, I do feel compelled to warn you that 
it is unclear why one would want to estimate ancestral states for 
poorly-fitting models. Be careful!

2) I hope you realize that ancestral states are in general poorly estimated, 
even assuming the “correct” model. This is because there is less and less 
information to anchor the values as you get farther from the tips, similar to 
the root estimation problem described below.  This issue was clearly exposed in 
Schluter et al 1997 (and less famously so in Butler and Losos 1997). These 
depressing results were among the motivations for developing model-fit 
approaches in the first place. 

3) In 2008/2009 the algorithms in OUCH, SLOUCH, and possibly other methods have 
changed in the estimation of the value of the root state (X0) which is an 
internal calculation in fitting the model.  Ho and Ane 2013, and Hansen et al 
2008 both reported that the root state X0 is ill-defined (unless there are 
fossil data to anchor the value). This makes sense intuitively, as all of the 
information is from the tips, and the root is very far down the tree. A 
reasonable assumption is that it is distributed according to the stationary 
distribution of the OU process (X0 ~ N(theta(0), sigma^2/2*alpha) and this 
assumption is what these methods now employ. 

4) Whatever you end up doing, do check for the robustness of your results with 
parametric bootstrap on your fitted models (a la Boettinger et al 2012). As 
many investigators have reported, these parameters can have large confidence 
intervals, and can covary with one another (being on a likelihood ridge, etc.). 
But do note that even when parameters may not be uniquely identifiable, it may 
still be possible to have robust model selection (see Cressler et al 2015).  So 
perhaps you want to fit ancestral states to see if the different models give 
you the same states? IDK?

So in short, yes, you can do it, with any number of methods. But why? If you 
can answer your biological question with methods that do not involve estimation 
of a parameter that is inherently fraught with error, it might be better to go 
another way. Bottom line - use caution and be thoughtful!

I am sure if I have made any errors Aaron, Clay, or Thomas will help.

Hope this helps

Marguerite

Schluter, D., T. Price, A. O. Mooers, and D. Ludwig. 1997. Likelihood of 
ancestor states in adaptive radiation. Evolution 51:1699–1711.

Butler, M. A., and J. B. Losos. 1997. Testing for unequal amounts of evolution 
in a continuous character on dif- ferent branches of a phylogenetic tree using 
linear and squared-change parsimony: an example using Lesser Antillean Anolis 
lizards. Evolution 51:1623–1635. 

Hansen T.F., Pienaar J., Orzack S.H. 2008. A comparative method for studying 
adaptation to a randomly evolving environment. Evolution 62:1965–1977.

Ho L.S.T., Ané C.. 2014. Intrinsic inference difficulties for trait evolution 
with Ornstein-Uhlenbeck models. Methods Ecol. Evol. 2:1133–1146.

Cressler C., Butler M.A., and King A. A. (2015) Detecting adaptive evolution in 
phylogenetic comparative analysis using the Ornstein-Uhlenbeck model.  Sys. 
Bio. 64(6):953-968. DOI: 10.1093/sysbio/syv043

Boettiger C., Coop G., Ralph P. 2012. Is your phylogeny informative? Measuring 
the power of comparative methods. Evolution 66: 2240–2251. 


Marguerite A. Butler
Professor

Department of Biology 
2538 McCarthy Mall, Edmondson Hall 216 
Honolulu, HI 96822

Office: 808-956-4713
Dept: 808-956-8617
Lab:  808-956-5867
FAX:   808-956-4745
http://butlerlab.org
http://manoa.hawaii.edu/biology/people/marguerite-butler
http://www2.hawaii.edu/~mbutler


> On Jun 11, 2018, at 7:33 PM, Simone Blomberg  wrote:
> 
> This sounded wrong to me, as the OU process should be agnostic to the 
> dataset: There are no restrictions inherent in the OU process that apply 
> particularly to phylogenetic data, whether the tree is ultrametric or not. I 
> re-read Slater 2014 and it is clear that you can

Re: [R-sig-phylo] Interpretation of standard errors of parameter estimates in OUwie models

2018-04-04 Thread Marguerite Butler
h errors it limit the biological inferences I
> >> can make? I'm more interested in the relative thetas across regimes than on
> >> the exact values (testing the prediction that birds in darker habitats tend
> >> to adapt to darker plumages than birds in more illuminated habitats).
> >>
> >> I have attached a table averaging parameter estimates and errors from
> >> models fitted across a posterior distribution of 100 simmaps for four
> >> traits; and one exemplar fitted model from one trait in one of those
> >> simmaps.
> >>
> >> Thanks a lot for any help,
> >>
> >>
> >> *--*
> >> *Rafael Sobral Marcondes*
> >> PhD Candidate (Systematics, Ecology and Evolution/Ornithology)
> >>
> >> Museum of Natural Science <http://sites01.lsu.edu/wp/mns/ 
> >> <http://sites01.lsu.edu/wp/mns/>>
> >> Louisiana State University
> >> 119 Foster Hall
> >> Baton Rouge, LA 70803, USA
> >>
> >> Twitter: @brown_birds <https://twitter.com/brown_birds 
> >> <https://twitter.com/brown_birds>>
> >>
> >>
> >> ___
> >> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
> >> <mailto:R-sig-phylo@r-project.org>
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
> >> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
> >> Searchable archive at http://www.mail-archive.com/r- 
> >> <http://www.mail-archive.com/r->
> >> sig-ph...@r-project.org/ <http://sig-ph...@r-project.org/>
> >>
> >>
> >
> >
> > --
> > William Gearty
> > PhD Candidate, Paleobiology
> > Department of Geological Sciences
> > Stanford School of Earth, Energy & Environmental Sciences
> > williamgearty.com <http://williamgearty.com/>
> >
> >   [[alternative HTML version deleted]]
> >
> > ___
> > R-sig-phylo mailing list - R-sig-phylo@r-project.org 
> > <mailto:R-sig-phylo@r-project.org>
> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
> > <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
> > Searchable archive at 
> > http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
> > <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
> <mailto:R-sig-phylo@r-project.org>
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
> <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
> 
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org 
> <mailto:R-sig-phylo@r-project.org>
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo 
> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
> <http://www.mail-archive.com/r-sig-phylo@r-project.org/>

Marguerite A. Butler
Associate Professor

Department of Biology 
2538 McCarthy Mall, Edmondson Hall 216 
Honolulu, HI 96822

Office: 808-956-4713
Dept: 808-956-8617
Lab:  808-956-5867
FAX:   808-956-4745
http://butlerlab.org
http://manoa.hawaii.edu/biology/people/marguerite-butler
http://www2.hawaii.edu/~mbutler















[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] compar.ou

2013-10-29 Thread Marguerite Butler
Hi Sandra and others,

You can also assess confidence using parametric bootstrap, a procedure which we 
generally recommend for all users. ouch has built-in facilities to do so  (the 
bootstrap() and simulate() functions in addition to update() ).  I think there 
are examples in my tutorial. If not, let me know and I can send you another. 

Marguerite

On Oct 29, 2013, at 6:08 AM, Cecile Ane a...@stat.wisc.edu wrote:

 FYI, we have some theory to explain why alpha has large standard errors and 
 in which conditions. As Brian says, it comes with a flat likelihood with 
 respect with alpha.
 http://dx.doi.org/10.1214/13-AOS1105 or 
 http://www.stat.wisc.edu/~ane/publis/2013HoAne_AoS.pdf
 
 On 10/29/2013 09:46 AM, Brian O'Meara wrote:
 In at least the OUwie paper we spent a lot of time doing simulations to
 determine this empirically (this may have been examined in other papers,
 too, though none come to mind). Alpha can be estimated, but sometimes with
 scarily large standard errors (but not always). This property should hold
 for related methods (it's a property of the likelihood surface for these
 models). You can just plot the alpha values vs likelihood to get a sense of
 this surface for your dataset (ideally, estimating the other parameters for
 each fixed alpha, which is possible, though the kludge of holding the other
 parameters at their MLE and just varying alpha will give you a sense of the
 surface, though it's a bit non-conservative). We had code in OUwie to do a
 contour plot of the likelihood surface but took it out (a small licensing
 difference between akima (university license) and OUwie (GPL) led to OUwie
 being pulled from CRAN by the CRAN maintainers until we resolved the
 difference).
 
 Brian
 
 
 ___
 Brian O'Meara
 Assistant Professor
 Dept. of Ecology  Evolutionary Biology
 U. of Tennessee, Knoxville
 http://www.brianomeara.info
 
 Students wanted: Applications due Dec. 15, annually
 Postdoc collaborators wanted: Check NIMBioS' website
 Calendar: http://www.brianomeara.info/calendars/omeara
 
 
 On Tue, Oct 29, 2013 at 10:21 AM, sandra goutte gou...@mnhn.fr wrote:
 
 Thank you Marguerite. Looking at OUwie and OUCH/SLOUCH, i see that alpha is
 estimated along the other parameters, whereas in Hansen 1997 and other
 papers it is suggested that this would lead to very large standard errors.
 Is that problem resolved in these functions?
 
 Best,
 Sandra.
 
 
 2013/10/26 Marguerite Butler mbutler...@gmail.com
 
 Dear Sandra,
 
 You might also want to look at the papers that go along with slouch,
 ouch,
 and ouwie. Here are some pdfs, along with some tutorials.
 
 On my Rcompstart, the ouch stuff starts around page 165
 
 Let me know if you need help with ouch.
 
 Marguerite
 
 
 
 On Oct 24, 2013, at 7:03 AM, sandra goutte gou...@mnhn.fr wrote:
 
 Dear list,
 
 My aim is to compare the fit of models for which *theta* is allowed to
 change at different nodes (different combinations of 1 ,2 or 3 nodes). I
 don't really understand the calculation of the deviance, but if i'm not
 mistaken the difference between the deviances of 2 models follows a
 *chi*square distribution with the difference of parameters in the
 models as
 degree of freedom. Now, how to compare two models with the same number of
 optimum changes but at different nodes?
 
 I am having a hard time understanding the details of the function with
 regards to Hansen's (1997) paper. What is the relationship between the
 deviance and the RSS from Hansen (1997)? Is it possible to calculate the
 RSS from the output of compar.ou ?
 
 I would appreciate any advice or insight on the subject!
 Sandra.
 
 --
 PhD Student
 Muséum National d'Histoire Naturelle
 Département Systématique et Évolution
 USM 601 / UMR 7205 Origine, Structure et Évolution de la Biodiversité
 Reptiles  Amphibiens - Case Postale 30
 25 rue Cuvier
 F-75005 Paris
 
 Tel :  +33 (0) 1 40 79 34 90
 Mobile: +33 (0) 6 79 20 29 99
 
 [[alternative HTML version deleted]]
 
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at
 http://www.mail-archive.com/r-sig-phylo@r-project.org/
 
 
  
  Marguerite A. Butler
  Associate Professor
 
  Department of Biology
 University of Hawaii
 2538 McCarthy Mall, Edmondson Hall 216
 Honolulu HI 96822
 
 Office: 808-956-4713
 Dept: 808-956-8617
  Lab:  808-956-5867
 FAX:   808-956-9812
  http://www.hawaii.edu/zoology/faculty/butler.html
 http://www2.hawaii.edu/~mbutler
  http://www.hawaii.edu/zoology/
 
 
 -- 
 Cecile Ane
 Departments of Statistics and of Botany
 University of Wisconsin - Madison
 www.stat.wisc.edu/~ane/
 
 CALS statistical consulting lab:
 www.cals.wisc.edu/calslab/stat_consulting.php
 
 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman

Re: [R-sig-phylo] Question on ace ML reconstruction of discrete binary character

2013-07-30 Thread Marguerite Butler
Oops. Sorry the citation is Schluter, Price, Mooers, Ludwig 1997. Likelihood of 
ancestor states in adaptive radiation. Evolution 51:1699-1711.  This issue has 
been known for a long time. 


On Jul 30, 2013, at 6:51 AM, Marguerite Butler mbutler...@gmail.com wrote:

 Hi Tom,
 
 One thing to keep in mind is the information content of the data relative to 
 what you are trying to infer. Basically, you have data only at the tips, but 
 are trying to infer the state of the root deep in the tree. So therefore 
 there is actually very little information being brought to bear on this 
 problem. In this case, whatever answer you get will very strongly reflect the 
 assumptions of the model that you apply and the structure of the tree. Put 
 another way, if you were to construct error bars around this character state 
 estimate, you would see that they are huge (See Moers et al. 1997 in 
 Evolution).   
 
 It sounds like you are expecting a linear parsimony reconstruction. Why not 
 just use that? Your character does not change very much on the tree. This is 
 basically what your ML answer is telling you anyway, more than 50% chance of 
 red at the base.  
 
 Marguerite
 
 On Jul 30, 2013, at 4:38 AM, Liam J. Revell liam.rev...@umb.edu wrote:
 
 Hi Tom.
 
 There is no reason to expect that the marginal ancestral state 
 reconstructions at the root node (empirical Bayesian posterior 
 probabilities) should match your tip frequencies or prior probabilities. 
 Imagine the following scenario: you have one diverse clade comprising 50% of 
 extant taxa that all diverged recently from a common ancestor share state 
 B; whereas state A is found in all the other tips of the tree, some of 
 which are in clades originating near the root. We would not expect posterior 
 probabilities at the root node to mach the empirical frequencies of our 
 state at the tips (50:50). In fact, we might expect that our reconstructed 
 state at the root of the tree would be strongly A.
 
 In your specific case, state 0 is found in three clades that originate 
 nearer to the root, whereas more nested clades are exclusively in state 1. 
 This is why - in spite of its relative rarity across the tips of the tree - 
 there is still some reasonable (PP~0.3) posterior probability under the 
 model that the root is in state 0. This is not an error that needs to be 
 corrected - it is just what your data, model, and tree tell us about the 
 ancestral node of the phylogeny.
 
 All the best, Liam
 
 Liam J. Revell, Assistant Professor of Biology
 University of Massachusetts Boston
 web: http://faculty.umb.edu/liam.revell/
 email: liam.rev...@umb.edu
 blog: http://blog.phytools.org
 
 On 7/30/2013 6:10 AM, Tom Wenseleers wrote:
 Dear all,
 Many thanks for all your advice so far. I have now moved to using rayDISC 
 in package corHMM to reconstruct marginal maximum likelihood ancestral 
 state reconstructions, using the method of Maddison et al (2007) and 
 FitzJohn et al (2009) to fix the prior probabilities at the root (setting 
 it to the observed frequency at the tips doesn't change much).
 The code I have is
 
 library(ape)
 library(corHMM)
 tree=read.tree(http://www.kuleuven.be/bio/ento/temp/tree.tre;)
 data=read.csv(file=http://www.kuleuven.be/bio/ento/temp/data.csv;)
 rownames(data)=data[,1]
 ASR=rayDISC(tree,data,ntraits=1,charnum=1,model=ER,node.states=marginal,root.p=maddfitz)
 plot(tree, cex=0.6, show.tip.label=TRUE, ljoin=2,lend=2,label.offset=0.02)
 nodelabels(pie=ASR$states,piecol=c(white,red), cex=0.45)
 tiplabels(pch = 22, bg = ifelse(data[tree$tip.label, ][,2],red,white), 
 col=black,adj = c(0.51, 0.5), cex = 0.6)
 
 I still get unusually low marginal ML values for the trait being 1 at the 
 basal nodes though (ca. 0.7, which is very low considering that 89% of my 
 species have the trait).
 Would anyone be able to offer advice on why one could get the reconstructed 
 root ML value to be so much lower than the actual observed frequency of the 
 trait at the tips, and what could be a solution to obtaining a more 
 realistic ML reconstruction? (I also tried diversitree and phangorn, but 
 they all give similar results)
 
 Cheers,
 Tom
 
 
 
 -Original Message-
 From: Jack Viljoen [mailto:javilj...@gmail.com]
 Sent: 30 July 2013 10:23
 To: Tom Wenseleers
 Subject: Re: [R-sig-phylo] Question on ace ML reconstruction of discrete 
 binary character
 
 Hello, Tom.
 
 I was just wondering if the higher uncertainty at the basal nodes isn't 
 expected, particularly given the long branches descended from them?
 
 Since this is an ML estimate and not a Bayesian one, surely the concept of 
 priors does not apply? My understanding is that ace() actually only 
 estimates the root node and that other methods are required to properly 
 estimate the states at the other nodes. I'm basing this on these posts from 
 Liam Revell earlier this year:
 http://blog.phytools.org/2013/03/conditional-scaled-likelihoods-in-ace.html
 http://blog.phytools.org

Re: [R-sig-phylo] on evolutionary models

2013-01-24 Thread Marguerite Butler
Dear Pas,

I am wondering what the general aim of your study is? 

I agree with Ted, that you should be careful about your assumptions and the 
tools you are using. Ted's point that K is a descriptive statistic is an 
important one. K describes the expected divergence among tip traits assuming a 
model of Brownian motion, in particular whether it is more or less than 
expected via BM. 

So if you find that BM doesn't fit, and then try to use K on it, what exactly 
are you hoping to understand? This is a bit like the tail wagging the dog. 

In general, I really don't like the term phylogenetic signal because BM is 
the reference model, it puts all evolutionary comparative analysis in a 
BM-centric perspective. That is, the null expectation is that trait diversity 
is produced by a BM process, and so we have to look at everything else for 
any ecological adaptation or other evolutionary influences.  In practice, we 
can find many examples where alternative models explain the data much better, 
so BM doesn't fit, but that does not mean that there is no phylogenetic 
signal (whatever that means). Evolutionary history may have had a huge 
influence, but it may be via an adaptive or other model where adaptation and 
the history of selection has also been influential. It forces artificial 
distinctions and frames of reference. 

Anyway, what are you using to assess performance? Please be sure to use 
appropriate statistics to assess model fit, and use descriptive statistics for 
description. But please do make sure that the interpretation makes sense.

Best of luck,
Marguerite


On Jan 24, 2013, at 11:03 AM, Paolo Piras paolo.pi...@uniroma3.it wrote:

 Hi pas,
 
 Besides the other arguments, I think that one should test for phylogenetic 
 signal AFTER trasforming the tree according to the best mode of evolution IF 
 it is not brownian. Maybe someone else can add some more on this.
 best
 paolo
 
 
 
 Da: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] per 
 conto di Theodore Garland Jr [theodore.garl...@ucr.edu]
 Inviato: giovedì 24 gennaio 2013 21.30
 A: pasquale.r...@libero.it; r-sig-phylo@r-project.org Mailing-list
 Oggetto: Re: [R-sig-phylo] R: Re:  From ClustalW2 Tree to Heat Map in R
 
 Hi Pas,
 
 K is just a descriptive statistic for tip data on a tree with some specified 
 branch lengths.
 Obviously, if you change the branch lengths (or the topology or the tip 
 data), then K will be different.
 How much different depends on the details.
 
 It is likely that certain branch-length transformations (whether purely 
 statistical or designed to mimic something like an OU process) will tend to 
 make K either larger or smaller, at least when checked cross a large number 
 of examples.  Somebody would need to do a lot of simulations to sort this 
 out.  Some related relevant things are here:
 
 Revell, L. J., L. J. Harmon, and D. C. Collar. 2008. Phylogenetic signal, 
 evolutionary process, and rate. Syst. Biol. 57:591-601.
 
 In the original Blomberg et al. (2003) paper we avoided this issue entirely 
 by only reporting K using whatever tree the original paper used.  We wanted 
 to be able to compare K values among traits and studies in some sort of 
 simple and fair way.
 
 If lambda and OU transforms are leading to very different K values (how 
 different?) that does not mean one K value is better than the other.  But it 
 does mean you would want to be careful if you tried to compare your K value 
 with other studies that did not use lambda and/or OU transforms.
 
 Cheers,
 Ted
 
 Theodore Garland, Jr.
 Professor
 Department of Biology
 University of California, Riverside
 Riverside, CA 92521
 Office Phone:  (951) 827-3524
 Wet Lab Phone:  (951) 827-5724
 Dry Lab Phone:  (951) 827-4026
 Home Phone:  (951) 328-0820
 Facsimile:  (951) 827-4286 = Dept. office (not confidential)
 Email:  tgarl...@ucr.edu
 http://www.biology.ucr.edu/people/faculty/Garland.html
 http://scholar.google.com/citations?hl=enuser=iSSbrhwJ
 
 Experimental Evolution: Concepts, Methods, and Applications of Selection 
 Experiments. 2009.
 Edited by Theodore Garland, Jr. and Michael R. Rose
 http://www.ucpress.edu/book.php?isbn=9780520261808
 (PDFs of chapters are available from me or from the individual authors)
 
 
 From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] 
 on behalf of pasquale.r...@libero.it [pasquale.r...@libero.it]
 Sent: Thursday, January 24, 2013 12:14 PM
 To: r-sig-phylo@r-project.org Mailing-list
 Subject: [R-sig-phylo] R: Re:  From ClustalW2 Tree to Heat Map in R
 
 Hi Gents and Lads,
 
 I have a very rapid question with a perhaps not-so-obvious reply. I'm in the
 process of testing a number of evolutionary models and estimating phylogenetic
 signal on a certain univariate data set. So something very basic and very
 simple. The point is, I found BM  performs poorly as compared to OU (single

Re: [R-sig-phylo] Ouch v.2.8-2 on the same tree gives different results (maybe)

2012-09-14 Thread Marguerite Butler
Dear Elena,

If I understand you correctly, you are trying to fit a unique regime to each 
branch of the tree? 

This is not advisable, as there are more branches than there are more branches 
than there are terminal taxa (data points). So you are trying to fit many more 
parameters than you have data points. In a hansen model, there are parameters 
for the strength of selection (alpha) the magnitude of drift (sigma) and each 
optima (4 for a 3 taxon tree). That is 6 parameters for three data points. 

It is not at all surprising that you get non convergence or different results, 
as the problem is not identifiable. We recommend using the minimum number of 
selective regimes to describe the major features of the data. Generally there 
are only a few regimes for many species, with many branches of the tree in the 
same regime.

Also you should avoid tinkering with the inner workings of S4 objects. You 
should use them via their accessors as given in the help pages. Please try the 
examples at the bottom of the included datasets:

require(ouch)
data(bimac)
?bimac

data(anolis.ssd)
?anolis.ssd

Marguerite

On Sep 14, 2012, at 5:51 AM, Elena Grassi wrote:

 Hello,
 
 I'm a newbie in phylogenetic analysis so excuse me in advance for
 incorrect terms and so on (or if my question is too silly).
 I've recently started to work on a project for which I need to use
 Ornstein-Uhlenbeck models and I decided to use R and ouch since
 the documentation was great.
 For the first experiments I have been working on some toy trees but
 I'm getting puzzling results: the same tree, simply
 defined with different orders of the string which I use to represent
 it in the Newick format, when analyzed with the Hansen
 functions gives different results.
 I have tried different combinations and in some cases a process
 converges while the other one does not, in other
 cases what change are the chosen optimal sigma (and thus the
 loglikelihoods and so on). This happens
 when I use a regime with different levels for every node in the tree
 but not when I use a single regime.
 
 I'm pasting here an example of this phenomenon. It could be a stupid
 error on my side but I'm not able to spot it right now.
 
 Thanks,
 Elena Grassi
 
 library(ouch)
 library(ape)
 t1 - read.tree(text=(E:10,(D:5,C:5):5);)
 t2 - read.tree(text=((C:5,D:5):5,E:10);)
 out1 - ape2ouch(t1)
 out2 - ape2ouch(t2)
 regime1 - out@nodes
 regime1 - out1@nodes
 regime2 - rep(global, length(outree@nodes))
 regime2 - rep(global, length(out2@nodes))
 expr1 - c(NA, NA, 3, 5, 10)
 expr2 - c(NA, NA, 10, 5, 3)
 models_data1 - data.frame(expr1, regime1, regime2, row.names=out1@nodes)
 models_data2 - data.frame(expr2, regime1, regime2, row.names=out2@nodes)
 m1 - hansen(models_data1[expr1], out1, models_data1[regime1], 1, 1)
 m2 - hansen(models_data2[expr2], out2, models_data2[regime1], 1, 1)
 unsuccessful convergence, code 10, see documentation for `optim'
 Warning message:
 In hansen(models_data2[expr2], out2, models_data2[regime1],  :
  unsuccessful convergence
 m1
 
 call:
 hansen(data = models_data1[expr1], tree = out1, regimes =
 models_data1[regime1],
sqrt.alpha = 1, sigma = 1)
  nodes ancestors times labels regime1 expr1
 1 1  NA   0.0  1NA
 2 2 1   0.5  2NA
 3 3 2   1.0  C   3 3
 4 4 2   1.0  D   4 5
 5 5 1   1.0  E   510
 
 alpha:
 [,1]
 [1,] 3.734194
 
 sigma squared:
 [,1]
 [1,] 9.233622e-30
 
 theta:
 $expr1
 1  2  3  4  5
 0.5049171  1.3917440  3.3191037  5.6847686 10.2324134
 
   loglik  deviance   aic aic.c   sic   dof
 100.5415 -201.0831 -187.0831 -209.4831 -193.39287.
 
 m22 - hansen(models_data2[expr2], out2, models_data2[regime2], 1, 1)
 m11 - hansen(models_data1[expr1], out1, models_data1[regime2], 1, 1)
 m11@loglik
 [1] -7.47607
 m22@loglik
 [1] -7.47607
 
 sessionInfo()
 R version 2.15.1 (2012-06-22)
 Platform: i486-pc-linux-gnu (32-bit)
 
 locale:
 [1] LC_CTYPE=en_US.UTF-8   LC_NUMERIC=C
 LC_TIME=en_US.UTF-8
 [4] LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8
 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C LC_NAME=C
 LC_ADDRESS=C
 [10] LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8
 LC_IDENTIFICATION=C
 
 attached base packages:
 [1] stats graphics  grDevices utils datasets  methods   base
 
 other attached packages:
 [1] ape_3.0-5 ouch_2.8-2subplex_1.1-3
 
 loaded via a namespace (and not attached):
 [1] gee_4.13-18 grid_2.15.1 lattice_0.20-10 nlme_3.1-104
 tools_2.15.1
 
 
 -- 
 www.biocut.unito.it
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Marguerite A. Butler
Associate Professor

Department of Biology
University of Hawaii
2450 Campus Rd., Dean 

Re: [R-sig-phylo] Normality requirement for assessment of lambda with phylosig (phytools) and fitContinuous (geiger)

2012-04-25 Thread Marguerite Butler
Hi Nina and everyone,

One thing to consider is that not all zero data are the same. Zeros under a 
model of continuous trait evolution with a gaussian process as assumed under 
Brownian motion and OU processes would occasionally cross zero, maybe go 
negative, etc. For example if you were modeling something like the deviation 
from average height. You may have a lot of individuals that are at zero because 
they are all average, but their offspring will quickly move off zero as some 
will be taller, some shorter than average. 

On the other hand, many of us measure traits which disappear, for example scale 
counts on fish. Numbers of scale rows vary while there are scales, but once 
they disappear, those lineages will be at zero, perhaps for a very long time. 
In this case it is no longer behaving as a gaussian process with small changes 
expected every time period. We usually think of these more as a threshold 
trait. Maybe there is some hormone or something else (a hidden variable) 
underlying the determination of scales or no scales, and once it goes below 
threshhold the scales disappear. The hidden variable may fit model assumptions, 
but not the scale counts (what we can see and measure).  For example, the scale 
counts can never go negative. With a value like height, it also never goes 
negative, but usually we are far away from that zero boundary so we can 
casually ignore that problem:). Or we can do a log-transform, or something else 
that transforms the data onto a different scale.   Anyway, the a!
 bsorbing boundary zeros are, I think, an example of what Ted is talking about. 

So it depends on the nature of your data. On the other hand, for the other 
variable, if there is just not much variation, but it's not stuck on any 
particular value (doesn't appear to have any absorbing boundary), I think 
that's less of a problem.

HTH,
Marguerite 


On Apr 25, 2012, at 7:17 AM, Theodore Garland Jr wrote:

 Read over the Blomberg et al. (2003) paper.
 K is intended for continuous-valued traits and/or those evolving similar to 
 Brownian motion.
 You could report it if you wished, but I would add that caveat if you do.
 
 The randomization test should be robust in any case.
 
 Cheers,
 Ted
 
 
 From: Nina Hobbhahn [n.hobbh...@gmail.com]
 Sent: Wednesday, April 25, 2012 9:19 AM
 To: Theodore Garland Jr
 Cc: Alejandro Gonzalez; Hunt, Gene; Enrico Rezende; r-sig-phylo@r-project.org
 Subject: Re: [R-sig-phylo] Normality requirement for assessment of lambda 
 with  phylosig (phytools) and fitContinuous (geiger)
 
 Thanks all for your helpful contributions! I will use phylosignal.
 
 Ted, I'm not sure I understand your last comment, when the data are not 
 though of as continuous-valued and/or evolving similar to Brownian motion. 
 What do you mean by that? Also, are you suggesting that I report the 
 presence/absence of phylogenetic signal, but not the value of the K statistic?
 
 Many thanks again,
 
 Nina
 
 
 On 2012-04-25, at 5:54 PM, Theodore Garland Jr wrote:
 
 However, calculating a K statistic is strange when the data are not thought 
 of as continuous-valued and/or evolving similar to Brownian motion.  The 
 randomization test is OK, however.
 
 Cheers,
 Ted
 
 From: Alejandro Gonzalez [alejandro.gonza...@ebd.csic.es]
 
 Sent: Wednesday, April 25, 2012 8:46 AM
 
 To: Theodore Garland Jr
 
 Cc: Nina Hobbhahn; r-sig-phylo@r-project.org
 
 Subject: Re: [R-sig-phylo] Normality requirement for assessment of lambda 
 with phylosig (phytools) and fitContinuous (geiger)
 
 
 
 
 
 
 
 Hello,
 
 
 
 
 
 The library picante in R implements Blomberg et al (2003) K estimate, Liam's 
 phytools package does as well. Phytools has the added advantage, if I 
 remember correctly, of allowing users to estimate K including within species 
 variation.
 
 
 
 
 
 Cheers
 
 
 
 
 
 Alejandro
 
 
 
 
 
 
 
 
 
 
 On 25, Apr 2012, at 5:29 PM, Theodore Garland Jr wrote:
 
 
 
 I would suggest the randomization test in Blomberg et al. (2003).  This will 
 give a valid significance test of the null hypothesis of no phylogenetic 
 signal.  By itself, it does not give a measure of the strength (or amount) 
 of phylogenetic signal.  Not
 sure if it is implented in r.  If not, I can send our Matlab code.
 
 
 
 Cheers,
 
 Ted
 
 
 
 Theodore Garland, Jr.
 
 Professor
 
 Department of Biology
 
 University of California, Riverside
 
 Riverside, CA 92521
 
 Office Phone:  (951) 827-3524
 
 Home Phone:  (951) 328-0820
 
 Facsimile:  (951) 827-4286 = Dept. office (not confidential)
 
 Email:  tgarl...@ucr.edu
 
 http://www.biology.ucr.edu/people/faculty/Garland.html
 
 
 
 Experimental Evolution: Concepts, Methods, and Applications of Selection 
 Experiments. 2009.
 
 Edited by Theodore Garland, Jr. and Michael R. Rose
 
 http://www.ucpress.edu/book.php?isbn=9780520261808
 
 (PDFs of chapters are available from me or from the individual authors)
 
 
 
 
 
 From: 

Re: [R-sig-phylo] multiple traits measured within species

2012-03-10 Thread Marguerite Butler
Dear Kaspar,

Are your traits the same? Basically, are the seven color patches on the same 
individual all one trait, or are they seven traits? Do all species have the 
seven color patches? Or are the number of color patches variable? (this could 
also be a character). 

I think before you attempt a comparative analysis, it is helpful to think about 
how you would set up a standard statistical analysis, for example some sort of 
ANOVA.

I am not sure I understand your description clearly. It sort of sounds like you 
are considering the multiple color patches as repeated measures of color patch 
on the same individual, but then you want to include them all as separate 
traits in the analysis. You could consider them as repeated measurements of a 
single color patch trait, in which case you would include the individual and 
species as factors in your analysis. Or you could consider each patch a 
different trait, and then compare them across species, so that each color patch 
is a separate dependent variable (in which case color patch 1 would be the 
same trait across individuals and species, etc.). Or you could take the mean of 
all color patches within an individual and use that in an analysis across 
individuals and species. 

Do you have any clues from development as to their identity? It seems like you 
want to consider them all to be replicates of the same measure.  But then why 
measure so many? I assume there is variation among color patches within an 
individual? Ultimately you will have to decide if 3 color patches is one trait 
or if it is three traits. After you decide this, it will be much easier to 
design a comparative analysis. 

Good luck,
Marguerite

On Mar 6, 2012, at 2:42 PM, Kaspar Delhey wrote:

 Hello,
 
 I was wondering whether I could get some input into the following analysis:
 
 I am interested to test for the association between one dependent variable 
 (continuous, normal) and one factor and a covariate while accounting for 
 phylogenetic relatedness. In my analysis I have 12 species and I have 
 measured the dependent variable and the covariate (which are both colour 
 descriptors) on several patches per species (between 1 and 7 patches per 
 species). The factor (the type of visual sensitivity) does not vary within 
 species. Thus the repeated observations per species do not correspond to 
 different individuals but are different patches of colour in the same 
 individual, measured on the same scale and units (hence directly comparable).
 
 Does it make sense to make use of the whole dataset (i.e. without computing 
 species averages) by replacing each tip in the phylogeny by a polytomy 
 including all patches measured in that particular species?
 
 For example if I have this tree:
 
 (A, (B, C))
 
 and I have measured two patches of colour in sp. A and three in spp B and C  
 I could replace the tips in this tree by:
 
 ((A1,A2),((B1,B2,B3),(C1,C2,C3)))
 
 
 In this way I could use a gls approach with nlme and for example corPagel 
 such as:
 
 model1-gls(dep.var~factor+covariate, correlation=Pagel, method=REML, 
 data=data)
 
 The results that I obtain from such a model make sense and qualitatively 
 agree with the results from a mixed model including species ID as a random 
 factor but ignoring phylogenetic relatedness between species.
 
 My question then is whether there is a flaw in this analysis and whether it 
 has been used before in other publications in order to be able to include 
 references to back it up.
 
 Thanks in advance for any help.
 
 best
 
 kaspar
 
 
 
 
 -- 
 Kaspar Delhey
 e-mail: kaspar.del...@monash.edu
 https://sites.google.com/site/kaspardelhey/
 Tel:+61-(0)3-99020377
 Bldg.18 School of Biological Sciences
 Monash University
 Clayton, 3800 Victoria
 Australia
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] How to detect phylogenetic signal (lambda) in one unscaled trait?

2011-04-01 Thread Marguerite Butler
Hi Folks,

After that very long-winded answer I still managed to leave out something. I 
want to reinforce what Ted and Joe said regarding absolute vs. log scales.  
Most comparative methods machinery as currently implemented work with absolute 
scales (the random component of the evolutionary change is modeled as normal 
and identically distributed through time, and adds up to some final amount of 
change and is therefore additive). So if you don't log-transform your data, 
then you are allowing your mice to evolve the same absolute value in size as 
you are your elephants. 

Marguerite


On Apr 1, 2011, at 7:57 AM, Marguerite Butler wrote:

 Hi Alberto and Ted, and others, 
 
 Maybe this is too late to jump into the discussion, but I wanted to add a few 
 comments regarding analysis of the evolution of size and shape.  I agree with 
 Ted that there are a number of important considerations:  (1) size and its 
 meaning, (2) how you remove it, and (3) the model of size and size evolution. 
  And it is really important to think about the biology as well as the 
 statistical considerations. Too often the biology gets lost. 
 
 First is what is the meaning of size, and why are you trying to remove it? 
 What is the underlying biological relationship? For example, if you are 
 interested in relative growth, then probably size is best captured by mass. 
 Alternatively, if you are interested in limbed locomotion, then having long 
 legs relative to your body length is more important, and so your size is 
 equal to body length.  Or if you are interested in an anti-predator behavior, 
 then being bigger is important because you look scarier. In that case, size 
 might mean mass if you animal gets larger more or less similarly in all 
 dimensions, or it might be relative to a characteristic length (say jaw width 
 or a head size dimension) if your animal's threat display is in a particular 
 orientation. 
 
 What you choose as your size variable is critical for interpreting what you 
 have left over after adjusting for it (usually some aspect of shape). Also, 
 you should always check to make sure that your shape variables make sense. 
 After you removed size, just look at your shape data. Do your shape 
 variables look ridiculous? If so, then you should reconsider before you do 
 your analysis.  Make sure that your shape data actually captures (hopefully 
 accentuates) the biological relationship you're trying to look at, and not 
 obscure it. Also remember that size is actually a hugely important 
 variable(!), important for nearly all aspects of an animal's ecology and 
 physiology.
 
 Removing size -- There is a huge literature on size and shape analysis 
 (allometry), especially in the quantitative genetics and morphometrics 
 literature, which seems to get bypassed in the comparative phylogenetics 
 literature, which is a shame. The two papers that I have found most helpful 
 are:
 
 Klingenberg, C. P. 1996. Multivariate allometry. Pages 23– 49 in L. F. 
 Marcus, M. Corti, A. Loy, G. Naylor, and D. E. Slice, editors. Advances in 
 morphometrics. Plenum Press, New York, New York, USA.
 Mosimann, J. 1970. Size allometry: size and shape variables with 
 characterizations of the lognormal and generalized gamma distributions. 
 Journal of the American Statistical Association 65:930–945.
 
 Most phylogenetic comparative studies use species-regressions, as Ted 
 mentioned, to remove size. This is a statistically valid approach, but there 
 are other considerations as well. One undesirable consequence is that this 
 method is extremely sensitive to your sample of species. If you remove some 
 of your species or add new ones, then your regression line will change, which 
 means that your size variable (and consequently all of your shapes) are now 
 different.  Your shape variables will definitely not be comparable between 
 studies. So the question is -- is it OK that your idea of big and small can 
 change depending on which datapoints are included?
 
 An alternative method is to use a size index which is an inherent property of 
 the individual, and use the size index to adjust each individual's 
 measurements to create the shape variables. This is explained in the Mosimann 
 1970 paper. I typically use a geometric-mean size index, and take the 
 log-ratio of each variable to size:  shape1 = log( var1 / sizevar) = 
 log(var1) - log(sizevar).   People most often take log-transformed data when 
 working with morphometric data because most often these data follow power law 
 relationships with size (this breaks up multiplicative relationships into 
 additive (linear) ones). Whether or not the original variables scale in a 
 linear isometric relationship with size is not really an issue, the logs take 
 care of linearizing power relationships, and you can always look at the 
 relationships that result to interpret the scaling factors. 
 
 If you have data on individuals, then the advantage of this method also

Re: [R-sig-phylo] multi-state categorical predictor variables in PGLS

2011-03-07 Thread Marguerite Butler
Hi Andrew, 
 
  Does this sidestep the degrees of
 freedom problem discussed by Garland et al.?  Can anybody point me to
 references discussing the mechanics of this process and why this is an
 appropriate thing to do?

 

Others on this list will disagree with me, but it's not a degrees of freedom 
problem. What phylogeny introduces is not a reduction in the number of degrees 
of freedom, but rather covariance in the traits at the end of the evolutionary 
process. The number of species is the number of species. Period. But they are 
sometimes similar to different degrees because of shared history. Statisticians 
call this problem correlated errors, in this case, the phylogeny is the 
error:) that is obscuring the true relationship between ecology and your 
trait of interest, for example. It is simple to remove the correlated error 
by accounting for this shared covariance structure, which is what the PGLS code 
does. It is basically like normalizing variances in univariate statistics by 
dividing by the variance. In this case the variance is a multivariate 
covariance matrix, calculated from the phylogeny (expected similarity based on 
shared ancestry). 

The dummy code is just a way to do include a categorical (multistate) variable 
in the ANOVA, and is a standard parametric statistic procedure. We describe how 
to do effect coding by hand in an old paper (see appendix). Doing it by hand 
helps to understand what is going on:

Butler M.A. Schoener T.W., and Losos J.B. (2000)  The relationship between 
habitat type and sexual size dimorphism in Greater Antillean Anolis lizards.  
Evolution 54(1):259-272.

Anyway, yes, this should take care of the phylogenetic analysis. (even though I 
wouldn't call it a degrees of freedom problem).

Marguerite

On Mar 7, 2011, at 7:06 AM, Alejandro Gonzalez V wrote:

 Hi Andrew,
 
 As I understand it, gls() is doing a multiple generalized LS
 regression with as many dummy variables as there are factor levels.
 Is this a correct characterization?
 
 I think you'd get one dummy variable less than factor levels in your 
 characterization (at least in regards to the number of levels for which 
 parameters are estimated), as the gls sets one of the levels as the point 
 of comparison with all other levels. Thus you'd get n-1 dummy variables for 
 which the parameters are estimated.
 
 Having such a low value of alpha the results of the phylogenetic gls should 
 be similar (if not identical) to results not taking phylogeny into account, 
 as this suggests you don't have phylogenetic signal in the residuals of your 
 relationship.
 There is a good paper on this issue by Liam Revell in Methods in Ecology and 
 Evolution.
 There is also a function in geiger which allows you to run phylogenetic 
 ANOVAs, but if I am not mistaken, the p-value is estimated based on 
 simulations assuming traits evolve via Brownian motion (is this correct?).
 I've also seen lambda values below 0 in ape, theoretically lambda is 
 described as being bounded between 0 and 1, but it could take values outside 
 the bounds. I would be interested in hearing the thoughts of others in the 
 list regarding whether lambda values for the Phylogenetic gls should be 
 forced to be bounded between 0 and 1. This would more closely follow what has 
 been proposed in the literature wouldn't it?
 
 Cheers
 
 Alejandro
 
 __
 
 Alejandro Gonzalez Voyer
 Post-doc
 
 Estación Biológica de Doñana (CSIC)
 Avenida Américo Vespucio s/n
 41092 Sevilla 
 Spain
 
 Tel: +34- 954 466700, ext 1749
 
 E-mail: alejandro.gonza...@ebd.csic.es
 
 Web-site: https://docs.google.com/View?id=dfs328dh_14gwwqsxcg
 
 
 
 
 
 
 
 
 
 
 
 
 
 On 7, Mar 2011, at 5:52 PM, Andrew Barr wrote:
 
 Hi everyone,
 
 I am trying to piece together the current best-practices for
 phylogenetic ANOVA with multi-state predictors.
 
 In my dataset, my four-level factor is non-random with respect to
 phylogeny.  That is, if I know which higher level clade an species
 belongs to, I can predict with pretty good success which factor level
 it will be in.  My understanding is that this situation likely
 overinflates my degrees of freedom and makes traditional F-tests
 inappropriate. I came across this paper (Garland et al 1993.
 Phylogenetic Analysis of Covariance by Computer Simulation. Systematic
 Biology 42:265 -292.) where the authors empirically recalculate
 critical values for F-ratios using computer simulations, tree
 topology, and a model of character evolution.
 
 I also have found that I can use PGLS (with ape and nlme) and specify
 my model like this.
 
 gls(myVar~myFactor,corr=corPagel(val=1,phy=myTree,fixed=F),data=myDF)
 
 As I understand it, gls() is doing a multiple generalized LS
 regression with as many dummy variables as there are factor levels.
 Is this a correct characterization?  Does this sidestep the degrees of
 freedom problem discussed by Garland et al.?  Can anybody point me to
 references discussing the 

Re: [R-sig-phylo] Dealing with Bounded Trait Measures

2011-03-06 Thread Marguerite Butler
Hi David,

I am not sure what you mean. If you add a tiny value to all of your data, then 
all you've done is shifted the mean ever so slightly. You then log-transform to 
put the variables on a scale unbounded by 0, and then you measure phylogenetic 
signal. The BM process models evolutionary changes as random draws from a 
normal distribution. So this would be the log-transformed values here.  Adding 
a tiny value to all data doesn't change things very much. Log transformation 
changes things a lot more. So you're modeling a BM process on the 
log-transformed values. But there is nothing illegal about that. You do have 
to think about it, however. 

This makes a lot of sense to me when I'm modeling body size evolution, because 
the model would imply that when you're small, evolutionary changes tend to be 
smaller than when you're big (evolutionary changes on larger animals would tend 
to have bigger jumps on the original scale). Ignoring any clade-specific 
biases, evolutionary changes in the body size of mice would tend to be a lot 
smaller than evolutionary changes in the size of horses, for example. We can 
imagine that some simple mechanistic reasons such as their subunits are smaller 
(they have smaller cells, smaller vertebrae, etc., so adding on or taking away 
would have bigger size effects). This seems biologically reasonable. I am not 
sure if it would apply to your spine size data, but maybe it does. Of course if 
your size variation is not that great, then it doesn't much matter (because 
log-trasformation will then not have much effect on compressing the scale of 
the values).

Anyway, this may be a longer-winded comment than you were asking for. I can't 
think of any reason why adding tiny values would do anything to the 
evolutionary simulation.

Marguerite


On Mar 6, 2011, at 9:53 AM, David Bapst wrote:

 Marguerite and others-
 I had missed that Scales et al paper. Thank for pointing it out. I
 have also had to deal with variables that were bounded by 0 and 1
 previously, and the use of the logit is something I'll have to try
 when I go back to those analyses.
 
 Although I am interested in transformations of such traits for
 comparative analyses in general (measuring correlated trait evolution,
 model fitting of OU, measurements of rate, etc.), I am mainly
 interested in measurements of phylogenetic signal. Is there any reason
 to think the kludge (adding an arbitrarily small value to trait
 values) would bias a test for phylogenetic signal (i.e. do the trait
 show a pattern of inheritance)? I'm worried that measurements of
 signal might be prone to odd results with the kludge, because the data
 structure would still be quite different from BM expectation. Perhaps,
 the threshold model might be of a lot of use here; I'm going to have
 to look into that much more.
 
 Thanks for you advice, all!
 -Dave
 
 
 On Sun, Mar 6, 2011 at 12:50 AM, Marguerite Butler mbut...@hawaii.edu wrote:
 Hi David, Liam and everyone,
 
 Reflecting traits at boundaries or absorbing them is something that can be 
 done, but I guess I'd like to encourage everyone to think carefully about 
 the interpretation of such simulations. What are you trying to model and 
 what does it mean at the end? Doing these boundary manipulations produces 
 odd shaped-distributions of traits at the end of the simulated process. But 
 probably more importantly, does this a good model for the biological process 
 which might be occurring? If so, then great. But I would guess not with 
 traits such as spine length.
 
 The zero issue is a hard problem. Is it a combination of a discrete trait 
 (presence/absence) and a continuous trait (length)? Or perhaps a better 
 model would be a threshold trait with underlying continuous variation but 
 below some boundary it will not be expressed? Has anyone developed a model 
 that can be used for such a scenario? In the absence of such a model, I've 
 done things like what Enrico suggests, with a kludge solution like adding a 
 very small value to all values to avoid the zero. This doesn't make much 
 difference to the analysis, and avoids the singularity at log(0). If you 
 have a trait that is a frequency, you can use the logit() function instead 
 of log(), as we did in Scales et al 2009.
 
 Scales J.A., King A.A., and Butler M.A. (2009) Running for your life or 
 running for your dinner: What drives fiber-type evolution in lizard 
 locomotor muscles? Am. Nat.173:543-553.
 
 Marguerite
 
 On Mar 5, 2011, at 11:06 AM, Liam J. Revell wrote:
 
 I'm not sure that this is what Dave has in mind, but if anyone is 
 interested in simulating bounded evolution in R, I just added it to my 
 fastBM() function (code here: 
 http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/fastBM/v0.3/fastBM.R).
 
 In the process of evolving traits up the tree, I just bounce back any 
 phenotypes that exceed the lower or upper boundary conditions specified by 
 the user (by default they are -Inf and Inf).  I think I

Re: [R-sig-phylo] Dealing with Bounded Trait Measures

2011-03-05 Thread Marguerite Butler
Hi David, Liam and everyone,

Reflecting traits at boundaries or absorbing them is something that can be 
done, but I guess I'd like to encourage everyone to think carefully about the 
interpretation of such simulations. What are you trying to model and what does 
it mean at the end? Doing these boundary manipulations produces odd 
shaped-distributions of traits at the end of the simulated process. But 
probably more importantly, does this a good model for the biological process 
which might be occurring? If so, then great. But I would guess not with traits 
such as spine length.  

The zero issue is a hard problem. Is it a combination of a discrete trait 
(presence/absence) and a continuous trait (length)? Or perhaps a better model 
would be a threshold trait with underlying continuous variation but below some 
boundary it will not be expressed? Has anyone developed a model that can be 
used for such a scenario? In the absence of such a model, I've done things like 
what Enrico suggests, with a kludge solution like adding a very small value to 
all values to avoid the zero. This doesn't make much difference to the 
analysis, and avoids the singularity at log(0). If you have a trait that is a 
frequency, you can use the logit() function instead of log(), as we did in 
Scales et al 2009.

Scales J.A., King A.A., and Butler M.A. (2009) Running for your life or running 
for your dinner: What drives fiber-type evolution in lizard locomotor muscles? 
Am. Nat.173:543-553.

Marguerite

On Mar 5, 2011, at 11:06 AM, Liam J. Revell wrote:

 I'm not sure that this is what Dave has in mind, but if anyone is interested 
 in simulating bounded evolution in R, I just added it to my fastBM() 
 function (code here: 
 http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/fastBM/v0.3/fastBM.R).
 
 In the process of evolving traits up the tree, I just bounce back any 
 phenotypes that exceed the lower or upper boundary conditions specified by 
 the user (by default they are -Inf and Inf).  I think I did this properly.  
 Feedback welcome though.
 
 - Liam
 
 -- 
 Liam J. Revell
 University of Massachusetts Boston
 web: http://faculty.umb.edu/liam.revell/
 (new) email: liam.rev...@umb.edu
 (new) blog: http://phytools.blogspot.com
 
 On 3/5/2011 12:55 PM, tgarl...@ucr.edu wrote:
 Hello David, Enrico, et al.,
 
 I may have lost track of what Dave was originally trying to do, and I am not 
 familiar with all of the options presently available in r for simulating 
 continuously valued traits along a specified phylogenetic tree.  However, I 
 wanted to point out that MANY possibilities, including trends, the OU 
 process, and actual limits to trait evolution implemented in several ways, 
 are available in our original DOS program PDSIMUL.EXE that accompanies this 
 paper:
 
 Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. 
 Phylogenetic analysis of covariance by computer simulation. Systematic 
 Biology 42:265-292.
 
 It has been used many times to look at trends, limits, etc., e.g., in these 
 papers:
 
 Díaz-Uriarte, R., and T. Garland, Jr. 1996. Testing hypotheses of correlated 
 evolution using phylogenetically independent contrasts: sensitivity to 
 deviations from Brownian motion. Systematic Biology 45:27-47.
 Laurin, M. 2010. Assessment of the relative merits of a few methods to 
 detect evolutionary trends. Syst. Biol. 59:689-704.
 
 Cheers,
 Ted
 
 
 
 Theodore Garland, Jr.
 Professor
 Department of Biology
 University of California, Riverside
 Riverside, CA 92521
 Office Phone:  (951) 827-3524
 Lab Phone:  (951) 827-5724
 Home Phone:  (951) 328-0820
 Facsimile:  (951) 827-4286 = Dept. office (not confidential)
 Email:  tgarl...@ucr.edu
 
 Main Departmental page:
 http://www.biology.ucr.edu/people/faculty/Garland.html
 
 List of all Publications:
 http://www.biology.ucr.edu/people/faculty/Garland/GarlandPublications.html
 
 Garland and Rose, 2009
 http://www.ucpress.edu/books/pages/10604.php
 
 
    Original message 
 
 Date: Sat, 05 Mar 2011 15:36:13 +0100
 From: Enrico Rezendeenrico.reze...@uab.cat
 Subject: Re: [R-sig-phylo] Dealing with Bounded Trait Measures
 To: David Bapstdwba...@uchicago.edu
 Cc: R Sig Phylo Listservr-sig-phylo@r-project.org
 
 David,
 on the top of my head, if no species measurement strictly
 corresponds to
 zero, you may log-transform the data. You may then simulate
 Brownian
 motion in log-transformed values, which will correspond to a
 boundary of
 zero in a linear scale (i.e., the more negative the log number, the
 closer the trait value is to zero - but never zero - in a linear
 scale).
 This also explains why you can simulate the evolution of body mass
 employing Brownian motion in log-transformed units and no species
 will
 ever be assigned a body mass of zero. On more speculative grounds,
 this
 may simply reflect the fact that many biological processes and
 their
 

Re: [R-sig-phylo] Dealing with Bounded Trait Measures

2011-03-05 Thread Marguerite Butler
p.s. an absorbing boundary might be a good model for something like the 
disappearance of a gene or something. When it goes to such a low frequency it 
goes extinct. Then in order to reevolve, it needs to mutate (basically 
re-originate, which is a very rare event). But for reversible traits, 
absorbing boundaries don't seem like a good model to me. I have never been able 
to think up a good biological mechanism for a reflecting boundary. 

Sorry for the extra email from the incomplete thought.

M

On Mar 5, 2011, at 8:50 PM, Marguerite Butler wrote:

 Hi David, Liam and everyone,
 
 Reflecting traits at boundaries or absorbing them is something that can be 
 done, but I guess I'd like to encourage everyone to think carefully about the 
 interpretation of such simulations. What are you trying to model and what 
 does it mean at the end? Doing these boundary manipulations produces odd 
 shaped-distributions of traits at the end of the simulated process. But 
 probably more importantly, does this a good model for the biological process 
 which might be occurring? If so, then great. But I would guess not with 
 traits such as spine length.  
 
 The zero issue is a hard problem. Is it a combination of a discrete trait 
 (presence/absence) and a continuous trait (length)? Or perhaps a better model 
 would be a threshold trait with underlying continuous variation but below 
 some boundary it will not be expressed? Has anyone developed a model that can 
 be used for such a scenario? In the absence of such a model, I've done things 
 like what Enrico suggests, with a kludge solution like adding a very small 
 value to all values to avoid the zero. This doesn't make much difference to 
 the analysis, and avoids the singularity at log(0). If you have a trait that 
 is a frequency, you can use the logit() function instead of log(), as we did 
 in Scales et al 2009.
 
 Scales J.A., King A.A., and Butler M.A. (2009) Running for your life or 
 running for your dinner: What drives fiber-type evolution in lizard locomotor 
 muscles? Am. Nat.173:543-553.
 
 Marguerite
 
 On Mar 5, 2011, at 11:06 AM, Liam J. Revell wrote:
 
 I'm not sure that this is what Dave has in mind, but if anyone is interested 
 in simulating bounded evolution in R, I just added it to my fastBM() 
 function (code here: 
 http://anolis.oeb.harvard.edu/~liam/R-phylogenetics/fastBM/v0.3/fastBM.R).
 
 In the process of evolving traits up the tree, I just bounce back any 
 phenotypes that exceed the lower or upper boundary conditions specified by 
 the user (by default they are -Inf and Inf).  I think I did this properly.  
 Feedback welcome though.
 
 - Liam
 
 -- 
 Liam J. Revell
 University of Massachusetts Boston
 web: http://faculty.umb.edu/liam.revell/
 (new) email: liam.rev...@umb.edu
 (new) blog: http://phytools.blogspot.com
 
 On 3/5/2011 12:55 PM, tgarl...@ucr.edu wrote:
 Hello David, Enrico, et al.,
 
 I may have lost track of what Dave was originally trying to do, and I am 
 not familiar with all of the options presently available in r for 
 simulating continuously valued traits along a specified phylogenetic tree.  
 However, I wanted to point out that MANY possibilities, including trends, 
 the OU process, and actual limits to trait evolution implemented in several 
 ways, are available in our original DOS program PDSIMUL.EXE that 
 accompanies this paper:
 
 Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. 
 Phylogenetic analysis of covariance by computer simulation. Systematic 
 Biology 42:265-292.
 
 It has been used many times to look at trends, limits, etc., e.g., in these 
 papers:
 
 Díaz-Uriarte, R., and T. Garland, Jr. 1996. Testing hypotheses of 
 correlated evolution using phylogenetically independent contrasts: 
 sensitivity to deviations from Brownian motion. Systematic Biology 45:27-47.
 Laurin, M. 2010. Assessment of the relative merits of a few methods to 
 detect evolutionary trends. Syst. Biol. 59:689-704.
 
 Cheers,
 Ted
 
 
 
 Theodore Garland, Jr.
 Professor
 Department of Biology
 University of California, Riverside
 Riverside, CA 92521
 Office Phone:  (951) 827-3524
 Lab Phone:  (951) 827-5724
 Home Phone:  (951) 328-0820
 Facsimile:  (951) 827-4286 = Dept. office (not confidential)
 Email:  tgarl...@ucr.edu
 
 Main Departmental page:
 http://www.biology.ucr.edu/people/faculty/Garland.html
 
 List of all Publications:
 http://www.biology.ucr.edu/people/faculty/Garland/GarlandPublications.html
 
 Garland and Rose, 2009
 http://www.ucpress.edu/books/pages/10604.php
 
 
   Original message 
 
Date: Sat, 05 Mar 2011 15:36:13 +0100
From: Enrico Rezendeenrico.reze...@uab.cat
Subject: Re: [R-sig-phylo] Dealing with Bounded Trait Measures
To: David Bapstdwba...@uchicago.edu
Cc: R Sig Phylo Listservr-sig-phylo@r-project.org
 
 David,
 on the top of my head, if no species measurement strictly
corresponds to
 zero, you may log-transform the data. You may then simulate

Re: [R-sig-phylo] Identifying nodes in ouch tree

2011-01-24 Thread Marguerite Butler
Hi Alejandro,

The info you want is given in the example script of the dataset bimac, which is 
included for ouch. To see the help page:

require(ouch)
data(bimac)
?bimac

Then what you need is:

tree - with(bimac,ouchtree(node,ancestor,time/max(time),species))
plot(tree,node.names=TRUE)

The option you need is the node.names=TRUE option. 

Also, it is (sparsely) documented in the ouchtree help page, under the plot 
method:

?ouchtree

Marguerite

On Jan 24, 2011, at 7:30 AM, Alejandro Gonzalez wrote:

 Hello,
 
 I want to paint regimes on an ouch tree based on a specific dichotomous 
 trait. I've identified the nodes and branches on the phylogeny where the 
 selection regimes would change based on an ancestral state reconstruction of 
 the character of interest. I wanted to know if there is a way of showing the 
 node numbers of a plotted image of the ouchtree object as can be done in ape 
 for a phylo object using nodelabels(). I ask this because when I transform my 
 phylo object to an ouch object the node numbers no longer correspond.
 How can one identify which node numbers correspond to specific nodes on the 
 phylogeny? Or do I simply have to make it out from the information available 
 using print(ouchtree object)?
 
 Thanks,
 
 Alejandro
 
 __
 
 Alejandro Gonzalez Voyer
 
 Post-doc
 
 Estación Biológica de Doñana
 Consejo Superior de Investigaciones Científicas (CSIC)
 Av Américo Vespucio s/n
 41092 Sevilla
 Spain
 
 Tel: + 34 - 954 466700, ext 1749
 
 E-mail: alejandro.gonza...@ebd.csic.es
 
 Web page: https://docs.google.com/View?id=dfs328dh_14gwwqsxcg
 
 
 
 
   [[alternative HTML version deleted]]
 
 ___
 R-sig-phylo mailing list
 R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Marguerite A. Butler
Associate Professor
Department of Zoology
University of Hawaii
2538 McCarthy Mall, Edmondson 259
Honolulu, HI  96822

FAX:   808-956-9812
Dept: 808-956-8617
http://www2.hawaii.edu/~mbutler
http://www.hawaii.edu/zoology/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] ancestral trait reconstruction and OU model

2009-10-26 Thread Marguerite Butler
Dear Fabio,

Andrew is correct in his answer regarding the new behavior of OUCH  
(thanks Andrew). In the new version of OUCH, the hansen fit has been  
modified so that the ancestral state at the root is no longer  
estimated, but it is rather assumed to be in one of the regimes in the  
dataset. In most of our work, the ancestral state at the root was  
really a nuisance parameter, and in some cases was returning really  
odd parameter estimates. This was probably because there is very  
little data to inform the root state, and so fitting a somewhat  
extraneous parameter seemed to cause a non-identifiability problem.

This is probably because there is more than one way to fit (or to  
explain the data). For example, one could get a good fit by having the  
ancestral state close to optimum 1 with weak selection, or one could  
have the ancestral state very far away from optimum 1 with strong  
selection. Both solutions could result in reasonable fit. If the  
ancestor is free to vary, it is possible to obtain similar fits for  
different combinations of parameters.

Since most comparative biologists don't have fossil data or any  
internal data to peg the root down, and since it is reasonable in many  
situations to assume that the ancestor was in one of the current  
adaptive regimes (usually you're only analyzing the ingroup), we  
thought it made the best sense to just include the root as being in  
one of the selective regimes. If one is unsure which adaptive regime  
the ancestor should be in, you could test all possible models with the  
ancestor in each of the selective regimes, to see which provides the  
best fit.

However, if one has fossil data, then there is much better information  
to anchor the root. Also, you may have good reasons for estimating  
the root. In this case, do as Andrew says and assign a unique adaptive  
regime to the root.  I believe this will provide the same fit as in  
the original version of OUCH.

In general, we haven't found that estimating the ancestor separately  
affects model selection at all (i.e., you obtain the same best model  
no matter whether you estimate the ancestor or not). It seems to  
affect alpha and sigma only a little, but it can move the other optima  
around a bit (if you push the ancestor back, it tends to exaggerate  
the other states as well so that you end up at the proper target --  
the data which you are trying to fit).

Hope this is not too confusing.

Take care,
Marguerite Butler


On Oct 26, 2009, at 8:53 AM, Andrew Hipp wrote:

 Dear Fabio,

 In ouch v2 and up, the root state is not treated as a free parameter
 and is not returned. One way to think about your question might be to
 assume that the branches at the base of the tree have their own  
 optimum,
 and then treat that optimum as the root state. But in doing so, you
 should consider whether there are likely to have been transitions in
 selective regime among those branches that might make such an  
 assumption
 unreasonable, and whether the estimate of alpha under the models you
 look at suggest that the optimum in those basal branches was reached
 quickly. To look at the former, you might look at whether alpha and
 sigma^2 estimates increase markedly when you model the basal branches
 separately as a way of estimating the ancestral state; if so, I would
 have doubts about the validity of modeling the root state of your
 characters that way. Also, some topologies aren't amenable to this
 treatment (unless you model just basal portions of the relevant  
 branches
 as being in the root optimum, an option which is not available in ouch
 but valid nonetheless). To look at the latter, see Hansen (1997) re:  
 the
 relationship between alpha and the rate at which a character evolves
 toward its optimum.

 If you want to look at the ancestral state using the old versions of
 ouch, you could try one of the 1.x versions of ouch from the ouch
 archive on cran (http://cran.r-project.org/src/contrib/Archive/ouch/).
 Be aware, though, that ouch 2.x is written in S4 and differs
 substantially from ouch 1.x, but if you have your trees and regimes  
 set
 up already for ouch 2.x, you should be fine with the old versions.

 Andrew Hipp

 Andrew L. Hipp, PhD
 Plant Systematist and Herbarium Curator
 The Morton Arboretum
 4100 Illinois Route 53
 Lisle, IL  60532-1293
 ah...@mortonarb.org
 (630) 725-2094
 http://redwood.mortonarb.org/lab_pages/hipp

 Fabio de Andrade Machado f.mach...@usp.br 10/26/2009 3:12 AM 
 Hi Graham,

 thanks for the reply.

 I don't know if this a particular problem for my kind of data (various

 characters), but I'm not receiving any Theta zero on the output.
 Just thetas for each selective regime. Is this a version problem? I'm

 with ouch v. 2.5-7

 I do realize that in complex regimes the contribution of the terminal

 taxa to the root state is negligible, but as I'm dealing with
 morphometric data I have the possibility to cross-check the result
 with fossil data