[R-sig-phylo] Distance matrix from phylogeny

2013-07-12 Thread Eugen
Hi folks,

I have phylogeny with absolute branch lengths in million years. How do I
get a distance matrix with values between 0 and 1?

Best,

Eugen

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Distance matrix from phylogeny

2013-07-12 Thread Florian Boucher
Hi Eugen,

to compute a distance matrix from a phylogeny, you can use the
cophenetic.phylo function in {ape}.
You then have to rescale all distances in you matrix (which are expressed
in million years) by the total depth of the tree:

dist=cophenetic.phylo(tree)
dist2=dist/max(dist)

Best,

Florian



2013/7/12 Eugen eugen...@web.de

 Hi folks,

 I have phylogeny with absolute branch lengths in million years. How do I
 get a distance matrix with values between 0 and 1?

 Best,

 Eugen

 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at
 http://www.mail-archive.com/r-sig-phylo@r-project.org/


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] PGLS vs lm

2013-07-12 Thread Tom Schoenemann
OK, I started going through the Ives et al. paper - thanks for that.  Note that 
my data is not brain size vs. body size, but brain size vs. social group size 
(not a measure for which brain size is a subset).

For our particular dataset, I believe we were not able to find much in the way 
of within-species variation for one of the variables - typically one report per 
species, and usually no variation given (but I'm not sure on that - I'll have 
to check). 

Regarding what exactly we want to do:

1) is there a significant association between brain size and two other 
behavioral dimensions (reported in the literature), after taking into account 
(as best we can) phylogeny.  This is why I was trying PGLS. We probably also 
want to look at the relationship within clades (is there a phylogenetically 
appropriate version of ANCOVA?).

2) are these two other behavioral measures independently associated with brain 
size (after controlling for the other) - I'm assuming this would be a 
phylogenetically appropriate version of multiple regression

But my issue is that, if I use PGLS, I get significant coefficients if I do it 
one direction, and not in the other. This makes me skeptical that there is a 
significant association in the first place.

-Tom


On Jul 11, 2013, at 4:32 PM, Theodore Garland Jr theodore.garl...@ucr.edu 
wrote:

 I think the issue is largely one of conceptualizing the problem.
 People often view body size as an independent variable when analyzing brain 
 size, but obviously this is a serious oversimplificaiton -- usually done for 
 statistical convenience -- that does not reflect the biology (yes, I have 
 also done this!).  Moreover, brain mass is part of body mass, so if you use 
 body mass per se as an independent variable then you have potential 
 part-whole correlation statistical issues.
 
 I would think carefully about what you are really wanting to do (e.g., 
 regression vs. correlation vs. ANCOVA), and check over this paper:
 
 Ives, A. R., P. E. Midford, and T. Garland, Jr. 2007. Within-species 
 variation and measurement error in phylogenetic comparative methods. 
 Systematic Biology 56:252-270.
 
 
 And maybe this one:
 
 Garland, T., Jr., A. W. Dickerman, C. M. Janis, and J. A. Jones. 1993. 
 Phylogenetic analysis of covariance by computer simulation. Systematic 
 Biology 42:265-292.
 
 
 Cheers,
 Ted
 
 Theodore Garland, Jr., Professor
 Department of Biology
 University of California, Riverside
 Riverside, CA 92521
 Office Phone:  (951) 827-3524
 Wet Lab Phone:  (951) 827-5724
 Dry Lab Phone:  (951) 827-4026
 Home Phone:  (951) 328-0820
 Skype:  theodoregarland
 Facsimile:  (951) 827-4286 = Dept. office (not confidential)
 Email:  tgarl...@ucr.edu
 http://www.biology.ucr.edu/people/faculty/Garland.html
 http://scholar.google.com/citations?hl=enuser=iSSbrhwJ
 
 Inquiry-based Middle School Lesson Plan:
 Born to Run: Artificial Selection Lab
 http://www.indiana.edu/~ensiweb/lessons/BornToRun.html
 
 From: r-sig-phylo-boun...@r-project.org [r-sig-phylo-boun...@r-project.org] 
 on behalf of Tom Schoenemann [t...@indiana.edu]
 Sent: Thursday, July 11, 2013 11:19 AM
 To: Emmanuel Paradis
 Cc: r-sig-phylo@r-project.org
 Subject: Re: [R-sig-phylo] PGLS vs lm
 
 Thanks Emmanuel,
 
 OK, so this makes sense in terms of the math involved. However, from a 
 practical, interpretive perspective, shouldn't I assume this to mean that we 
 actually cannot say (from this data) whether VarA and VarB ARE actually 
 associated with each other? In the real world, if VarA is causally related to 
 VarB, then by definition they will be associated. Doesn't this type of 
 situation - where the associations are judged to be statistically significant 
 in one direction but not in the other - suggest that we actually DON'T have 
 confidence that - independent of phylogeny - VarA is associated with VarB?  
 Putting this in the context of the actual variables involved, doesn't this 
 mean that we actually can't be sure brain size is associated with social 
 group size (in this dataset) independent of phylogeny?
 
 I notice that the maximum likelihood lambda estimates are different (though 
 I'm not sure they are significantly so). I understand this could 
 mathematically be so, but I'm concerned with how to interpret this. In the 
 real world, how could phylogenetic relatedness affect group size predicting 
 brain size, more than brain size predicting group size? Isn't this a logical 
 problem (for interpretation - not for the math)? In other words, in 
 evolutionary history, shouldn't phylogeny affect the relationship between two 
 variables in only one way, which would show up whichever way we approached 
 the association? Again, I understand the math may allow it, I just don't 
 understand how it could actually be true over evolutionary time.
 
 Thanks in advance for helping me understand this better,
 
 -Tom
 
 
 On Jul 11, 2013, at 5:12 AM, Emmanuel Paradis emmanuel.para...@ird.fr wrote:
 
  Hi Tom,
  

Re: [R-sig-phylo] PGLS vs lm

2013-07-12 Thread Tom Schoenemann
Thanks Liam,

OK, I'm starting to understand this better. But I'm not sure what now to do. 
Given that the mathematics are such that a PGLS gives significance in one 
direction, but not in another, what is the most convincing way to show that the 
two variables really ARE associated (at some level of probability) independent 
of phylogeny?

Ultimately I want to investigate the following: Given 2 (or more) behavioral 
measures, what is the probability that they are independently associated with 
brain size in my sample, controlling for phylogeny.

I'd also like to create a prediction model that allows me to estimate what the 
behavioral values would be for a given brain size (of course with confidence 
intervals, so I could assess whether the model is really actually useful at all 
for prediction).

Thanks for any suggestions,

-Tom
 
On Jul 11, 2013, at 5:23 PM, Liam J. Revell liam.rev...@umb.edu wrote:

 Hi Tom.
 
 This is actually not a property of GLS - but of using different correlation 
 structures when fitting y~x vs. x~y. When you set 
 correlation=corPagel(...,fixed=FALSE) (the default for corPagel), gls will 
 fit Pagel's lambda model to the residual error in y|x. The fitted value of 
 lambda will almost always be different between y|x and x|y. Since the fitted 
 correlation structure of the residual error is used to calculate our standard 
 error for beta, this will affect any hypothesis test about beta.
 
 By contrast, if we assume a fixed error structure (OLS, as in lm; or 
 correlation=corBrownian(...) - the latter being the same as contrasts 
 regression), we will find that the P values are the same for y~x vs. x~y.
 
 library(phytools)
 library(nlme)
 tree-pbtree(n=100)
 x-fastBM(tree)
 # note I have intentionally simulated y without phylogenetic signal
 y-setNames(rnorm(n=100),names(x))
 fit.a-gls(y~x,data.frame(x,y),correlation=corBrownian(1,tree))
 summary(fit.a)
 fit.b-gls(x~y,data.frame(x,y),correlation=corBrownian(1,tree))
 summary(fit.b)
 # fit.a  fit.b should have the same P-values
 fit.c-gls(y~x,data.frame(x,y),correlation=corPagel(1,tree))
 summary(fit.c)
 fit.d-gls(x~y,data.frame(x,y),correlation=corPagel(1,tree))
 summary(fit.d)
 # fit.c  fit.d will most likely have different P-values
 
 All the best, Liam
 
 Liam J. Revell, Assistant Professor of Biology
 University of Massachusetts Boston
 web: http://faculty.umb.edu/liam.revell/
 email: liam.rev...@umb.edu
 blog: http://blog.phytools.org
 
 On 7/11/2013 12:03 AM, Tom Schoenemann wrote:
 Hi all,
 
 I ran a PGLS with two variables, call them VarA and VarB, using a 
 phylogenetic tree and corPagel. When I try to predict VarA from VarB, I get 
 a significant coefficient for VarB.  However, if I invert this and try to 
 predict VarB from VarA, I do NOT get a significant coefficient for VarA. 
 Shouldn't these be both significant, or both insignificant (the actual 
 outputs and calls are pasted below)?
 
 If I do a simple lm for these, I get the same significance level for the 
 coefficients either way (i.e., lm(VarA ~ VarB) vs. lm(VarB ~ VarA), though 
 the values of the coefficients of course differ.
 
 Can someone help me understand why the PGLS would not necessarily be 
 symmetric in this same way?
 
 Thanks,
 
 -Tom
 
 outTree_group_by_brain_LambdaEst_redo1 - gls(log_group_size_data ~ 
 log_brain_weight_data, correlation = bm.t.100species_lamEst_redo1,data = 
 DF.brain.repertoire.group, method= ML)
 summary(outTree_group_by_brain_LambdaEst_redo1)
 Generalized least squares fit by maximum likelihood
   Model: log_group_size_data ~ log_brain_weight_data
   Data: DF.brain.repertoire.group
AIC BIClogLik
   89.45152 99.8722 -40.72576
 Correlation Structure: corPagel
  Formula: ~1
  Parameter estimate(s):
lambda
 0.7522738
 Coefficients:
Value Std.Error   t-value p-value
 (Intercept)   -0.0077276 0.2628264 -0.029402  0.9766
 log_brain_weight_data  0.4636859 0.1355499  3.420778  0.0009
 
  Correlation:
   (Intr)
 log_brain_weight_data -0.637
 Standardized residuals:
Min Q1Med Q3Max
 -1.7225003 -0.1696079  0.5753531  1.0705308  3.0685637
 Residual standard error: 0.5250319
 Degrees of freedom: 100 total; 98 residual
 
 
 Here is the inverse:
 
 outTree_brain_by_group_LambdaEst_redo1 - gls(log_brain_weight_data ~ 
 log_group_size_data, correlation = bm.t.100species_lamEst_redo1,data = 
 DF.brain.repertoire.group, method= ML)
 summary(outTree_brain_by_group_LambdaEst_redo1)
 Generalized least squares fit by maximum likelihood
   Model: log_brain_weight_data ~ log_group_size_data
   Data: DF.brain.repertoire.group
 AIC   BIC   logLik
   -39.45804 -29.03736 23.72902
 Correlation Structure: corPagel
  Formula: ~1
  Parameter estimate(s):
   lambda
 1.010277
 Coefficients:
  Value  Std.Error   t-value p-value
 (Intercept)  1.2244133 0.20948634  5.844836  0.
 log_group_size_data 

Re: [R-sig-phylo] Getting started making R packages

2013-07-12 Thread Eliot Miller
Tristan,

Check out the package devtools, and before submitting to CRAN make sure you
read: cran.us.r-project.org/doc/manuals/R-exts.pdf

There are obviously people on here who know a million times more than me,
and not that I've gotten very far, but I'm trying to figure this out myself
at the moment. Email me directly and I'll let you know what I find as I go.

Cheers,
Eliot


On Fri, Jul 12, 2013 at 1:19 PM, Tristan Stayton tstay...@bucknell.eduwrote:

 Hello,

 I'm relatively new to the world of R and need some general advice.  I have
 a number of R functions that I'd like to put together in a package and
 distribute via CRAN, and I've found a few more or less helpful tutorials on
 the subject.  However, I seem to be getting stuck with rtools (running R
 CMD check, etc...) and even if I get past that I'm sure other issues will
 come up.  So I'm wondering if there are any resources or tutorials
 available which you all would recommend.  What was helpful to you when you
 were first learning to build and distribute packages?

 Thanks in advance.  And feel free to let me know if this is not the proper
 venue for such questions.

 Tristan

 --
 C. Tristan Stayton
 Associate Professor of Biology
 Bucknell University
 Lewisburg, PA  17837

 Office:  570-577-3272

 [[alternative HTML version deleted]]

 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at
 http://www.mail-archive.com/r-sig-phylo@r-project.org/



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] PGLS vs lm

2013-07-12 Thread Joe Felsenstein

Tom Schoenemann asked me:

 With respect to your crankiness, is this the paper by Hansen that you are 
 referring to?:
 
 Bartoszek, K., Pienaar, J., Mostad, P., Andersson, S.,  Hansen, T. F. 
 (2012). A phylogenetic comparative method for studying multivariate 
 adaptation. Journal of Theoretical Biology, 314(0), 204-215.
 
 I wrote Bartoszek to see if I could get his R code to try the method 
 mentioned in there. If I can figure out how to apply it to my data, that will 
 be great. I agree that it is clearly a mistake to assume one variable is 
 responding evolutionarily only to the current value of the other (predictor 
 variables). 

I'm glad to hear that *somebody* here thinks it is a mistake (because it really 
is).  I keep mentioning it here, and Hansen has published extensively on it, 
but everyone keeps saying Well, my friend used it, and he got tenure, so it 
must be OK. 

The paper I saw was this one:

Hansen, Thomas F  Bartoszek, Krzysztof (2012). Interpreting the evolutionary 
regression: The interplay between observational and biological errors in 
phylogenetic comparative studies. Systematic Biology  61 (3): 413-425.  ISSN 
1063-5157.

J.F.

Joe Felsenstein j...@gs.washington.edu
 Department of Genome Sciences and Department of Biology,
 University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Getting started making R packages

2013-07-12 Thread David Bapst
Tristan-

I found this to be the most important how-to reference, but I am a windows user.

http://robjhyndman.com/hyndsight/building-r-packages-for-windows/

For distributing on CRAN, making help files and examples is the most
critical step. package.skeleton() is  helpful in setting up the
necessary directories and initial help documents for you, and then you
can go fill them out.

I found that looking at other people's package files on github or
elsewhere can be quite useful, such as when trying to figure out
formatting for stuff in the NAMESPACE file. For example:
https://github.com/richfitz/diversitree
http://www.phytools.org/
https://github.com/cboettig

Yes, R CMD check is where much of your time will be spent, but once
you are done with that, you will build the tarball for CRAN. You
should definitely run check with the --as-cran option, since it is
more stringent, and also preferably with R-devel.

A word of warning: CRAN administrators have fairly strict regulations
regarding how much time single help files or the entire package is
allowed to take, in terms of error checking with the help examples.
They may complain if your examples take too long, which may require
putting those in 'don't run' sections. When I submit, I am often
guilty of having examples that run too long.

There are other policies, and some suggestions, beyond the extensions
pdf which Elliot linked to. You should also look at these:
http://cran.r-project.org/web/packages/policies.html
http://developer.r-project.org/Rds.html

Good luck and let me know if you have any more specific questions,

-Dave

On Fri, Jul 12, 2013 at 1:19 PM, Tristan Stayton tstay...@bucknell.edu wrote:
 Hello,

 I'm relatively new to the world of R and need some general advice.  I have
 a number of R functions that I'd like to put together in a package and
 distribute via CRAN, and I've found a few more or less helpful tutorials on
 the subject.  However, I seem to be getting stuck with rtools (running R
 CMD check, etc...) and even if I get past that I'm sure other issues will
 come up.  So I'm wondering if there are any resources or tutorials
 available which you all would recommend.  What was helpful to you when you
 were first learning to build and distribute packages?

 Thanks in advance.  And feel free to let me know if this is not the proper
 venue for such questions.

 Tristan

 --
 C. Tristan Stayton
 Associate Professor of Biology
 Bucknell University
 Lewisburg, PA  17837

 Office:  570-577-3272

 [[alternative HTML version deleted]]

 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



-- 
David W. Bapst, PhD
Adjunct Asst. Professor, Geology and Geol. Eng.
South Dakota School of Mines and Technology
501 E. St. Joseph
Rapid City, SD 57701

http://home.uchicago.edu/~dwbapst/
http://cran.r-project.org/web/packages/paleotree/index.html

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/