Re: [R-sig-phylo] read.nexus error message

2012-12-07 Thread Emmanuel Paradis

Hi Simon,

I have tried with the first 2500 species and it worked fine. Are you 
sure you have the latest version of ape?


The file output by this server is a NEXUS file, so use read.nexus, not 
read.tree.
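
For reference, a minimal sketch of reading a multi-tree NEXUS file with ape. The trees and the temporary file are made up for illustration and only stand in for the birdtree.org download; read.nexus, write.nexus and rmtree are ape functions.

```r
library(ape)

# Stand-in for the downloaded file: write three random 5-tip trees
# to a temporary NEXUS file (hypothetical data, not the birdtree.org trees).
set.seed(1)
nexus_file <- tempfile(fileext = ".nex")
write.nexus(rmtree(3, 5), file = nexus_file)

# A NEXUS file is read with read.nexus(); read.tree() expects plain Newick.
trees <- read.nexus(nexus_file)
class(trees)    # a "multiPhylo" object when the file holds several trees
length(trees)   # number of trees read

# Check which version of ape is installed.
packageVersion("ape")
```

If the installed version is old, update.packages() or install.packages("ape") would fetch the current release from CRAN.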


Best,

Emmanuel

Simon Ducatez wrote on 07/12/2012 01:06:

Dear all,

I am having trouble importing a NEXUS file into R. The file contains 100 
different trees (each containing over 2000 species) from a pseudo-posterior 
distribution (downloaded from the website http://birdtree.org/), and I get the 
same error message whether I use read.nexus or read.tree:
Error in if (tp[3] != "") obj$node.label <- tp[3] :   missing value where 
TRUE/FALSE needed
The same error was mentioned before on the web, but I couldn’t find any 
solution, and I do not understand the message. The different trees seem OK on 
Mesquite. Note that when using the same website to extract 100 trees, but this 
time for a sample of 50 species, the importation works without problem.
Any help would be greatly appreciated! Thank you very much in advance

Best
Simon





___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


--
Emmanuel Paradis
IRD, Jakarta
Visiting Professor, Agricultural University of Bogor
http://ape.mpl.ird.fr/



[R-sig-phylo] Null models and weighted abundance in picante MPD

2012-12-07 Thread John Larsson

Hi,

I'm using picante to calculate the MPD and MNTD of samples based on a 
bacterial 16S OTU phylogeny and a community data matrix. I have OTUs 
clustered at 97 and 99% identity and the community matrix contains total 
number of sequences for each OTU in each sample. I want to see how the 
diversity changes within the samples.


I was wondering if the abundance weighted option in picante is at all 
applicable when calculating MPD/MNTD for this type of data. I see from 
the picante manual that using abundances changes the interpretation to 
the mean phylogenetic distances among individuals, but I don't 
understand exactly how the abundances are used for that. The underlying 
difference between sequences within an OTU cluster is hidden from 
picante so it feels to me that weighting by abundance will not give 
meaningful results in this case. If someone could clarify this I would 
really appreciate it.


Another thing I was wondering about is the choice of method for 
creating the null community (randomization methods). The taxa.labels 
method randomizes the distance matrix, whereas all other methods 
randomize the community matrix in some way. I've been trying out the 
different methods and I get almost identical results with the 
taxa.labels, sample.pool, phylogeny.pool and richness methods but 
overall higher values with the frequency, independentswap and trialswap 
methods (this holds for both the mpd.obs.z and mpd.obs.p values). However, the 
general pattern across samples is the same for all methods.
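
The idea of shuffling taxon labels on the distance matrix can be sketched in base R as follows. This is only an illustration under assumptions: the distance matrix and the list of OTUs present are made up, the helper mpd_of() is hypothetical, and the z and p definitions are common conventions rather than a copy of picante's code.

```r
set.seed(42)

# Made-up symmetric distance matrix among six OTUs.
otus <- paste0("otu", 1:6)
d <- as.matrix(dist(matrix(rnorm(12), ncol = 2)))
dimnames(d) <- list(otus, otus)

# OTUs present in one hypothetical sample.
present <- c("otu1", "otu2", "otu3")

# Mean pairwise distance among a set of taxa (hypothetical helper).
mpd_of <- function(dmat, taxa) {
  sub <- dmat[taxa, taxa]
  mean(sub[lower.tri(sub)])
}

obs <- mpd_of(d, present)

# Null distribution: shuffle the labels of the distance matrix and
# recompute MPD for the same sample each time.
n_perm <- 999
null <- replicate(n_perm, {
  shuffled <- d
  new_labels <- sample(otus)
  dimnames(shuffled) <- list(new_labels, new_labels)
  mpd_of(shuffled, present)
})

# Standardized effect size and a one-sided rank-based p value.
z <- (obs - mean(null)) / sd(null)
p_low <- (sum(null <= obs) + 1) / (n_perm + 1)
```

Methods that shuffle the community matrix instead differ in what they hold fixed (for example sample richness or OTU occurrence frequency), which is one reason the resulting z and p values can differ between methods.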


I find it difficult to distinguish between the different methods that 
randomize the community matrix and to know which one I should choose. 
It's somewhat comforting that the pattern across the samples doesn't 
change with the methods, but the values themselves change substantially. 
Any help at all on this, or pointers to further information would be great.


Thanks!

Cheers,
John



Re: [R-sig-phylo] Null models and weighted abundance in picante MPD

2012-12-07 Thread Steven Kembel
Hi John,

> I was wondering if the abundance weighted option in picante is at all 
> applicable when calculating MPD/MNTD for this type of data. I see from the 
> picante manual that using abundances changes the interpretation to the mean 
> phylogenetic distances among individuals, but I don't understand exactly how 
> the abundances are used for that. The underlying difference between sequences 
> within an OTU cluster is hidden from picante so it feels to me that weighting 
> by abundance will not give meaningful results in this case. If someone could 
> clarify this I would really appreciate it.

The interpretation of the abundance-weighted measures is that they are the 
expected phylogenetic distance between two randomly sampled individuals from a 
community, versus two randomly sampled species or OTUs from the community for 
the non-abundance-weighted metrics. For your data set it sounds like the 
interpretation of the abundance-weighted measures would be: if you drew two 
random sequences from a sample, what is the expected phylogenetic distance 
between those sequences? If OTUs differ in abundance (they almost certainly do), 
that's a very different question from asking about drawing two randomly selected 
OTUs from the community and comparing their relatedness.

Picante assumes there is no variation within OTUs/species when calculating 
abundance-weighted MPD/MNTD (within-species phylogenetic distances are counted 
as zero distance). If you are using an OTU definition that encompasses a lot of 
intra-OTU variation this might not be a good assumption.
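
The difference between the two interpretations can be illustrated with a small base-R sketch. The distance matrix and sequence counts below are made up, and the calculation shows the interpretation described above (two sequences drawn at random with replacement, within-OTU distance taken as zero); it is not copied from picante's source.

```r
# Made-up patristic distances among three OTUs.
otus <- c("otuA", "otuB", "otuC")
dis <- matrix(c(0, 2, 6,
                2, 0, 6,
                6, 6, 0),
              nrow = 3, byrow = TRUE, dimnames = list(otus, otus))

# Made-up sequence counts per OTU in one sample.
counts <- c(otuA = 90, otuB = 9, otuC = 1)

# Unweighted: mean distance between two distinct OTUs.
mpd_otus <- mean(dis[lower.tri(dis)])

# Abundance-weighted: expected distance between two sequences drawn at
# random (with replacement); sequences from the same OTU contribute zero
# because the diagonal of dis is zero.
p <- counts / sum(counts)
mpd_individuals <- sum(outer(p, p) * dis)

mpd_otus
mpd_individuals
```

Because one OTU dominates the sample, most random pairs of sequences come from the same OTU, so the weighted value is much smaller than the unweighted one.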

> I find it difficult to distinguish between the different methods that 
> randomize the community matrix and to know which one I should choose. It's 
> somewhat comforting that the pattern across the samples doesn't change with 
> the methods, but the values themselves change substantially. Any help at all 
> on this, or pointers to further information would be great.

There is a huge literature on using and choosing null models in ecology. Here 
are two papers that discuss null model choice in the context of studies of 
phylogenetic relatedness, with citations to more general papers on the subject:

Hardy, O. J. (2008). Testing the spatial phylogenetic structure of local 
communities: statistical performances of different null models and test 
statistics on a locally neutral community. Journal of Ecology, 96, 914–926.
Kembel, S. W. (2009). Disentangling niche and neutral influences on community 
assembly: assessing the performance of community phylogenetic structure tests. 
Ecology Letters, 12(9), 949–960.

-Steve
-- 
Steven W. Kembel
Professeur régulier
Département des sciences biologiques
Université du Québec à Montréal
kembel.steve...@uqam.ca
+1 (514) 987-3000 poste 5855 
http://www.phylodiversity.net/skembel/



Re: [R-sig-phylo] Null models and weighted abundance in picante MPD

2012-12-07 Thread John Larsson
Hi Steve,

Thank you for the reply.
The level of OTU clustering is of course rather arbitrary when it comes 
to bacteria and it sounds like the abundance weighted MPD/MNTD is not a 
good idea in my case. I guess that one option would be to use 
phylogenies with all sequences, not just type-sequences of OTU clusters. 
Since I have two cutoffs I'm thinking it would be good to look further 
at OTU counts and diversity at these different levels to get a quick 
sense of the variation within OTUs in different samples.

I'll read up on those references as well.

Again, thanks for your answer.

/john



[R-sig-phylo] issue with cophenetic.phylo()

2012-12-07 Thread Kelly Burkett
Hi Everyone,

I have been using the cophenetic.phylo() function and I have been
having some issues with pairs of tips having distances that are
different than what I would expect.

The following code illustrates the issue (I have set the seed in
order to reproduce these results). I generate a tree with
10 tips. There should be 9 unique non-zero distances in the matrix
returned by cophenetic.phylo(). Instead there are 10 because
1.22313194 is duplicated:

 library(ape)
 set.seed(1723)
 tree <- rcoal(10)
 x <- cophenetic.phylo(tree)
 y <- sort(unique(as.vector(x)))

y:
[1] 0.00000000 0.04371057 0.16334724 0.22292128 0.31957449 0.33656861
[7] 0.35227827 0.84410552 1.22313194 1.22313194 2.14497976

As an example, t9 should have the same distance to t8 and t2.
x["t9",]:
       t1        t3        t6        t9        t7        t2        t8        t5
0.3522783 0.3365686 0.3365686 0.0000000 1.2231319 1.2231319 1.2231319 2.1449798
      t10        t4
2.1449798 2.1449798

But they are different:
 x["t9","t8"]==x["t9","t2"]
[1] FALSE

They are off by a very small amount:
 x["t9","t8"]-x["t9","t2"]
[1] -2.220446e-16

I am posting this in case it is of interest to others. As a quick fix,
I am now rounding x.

Thanks,
Kelly



Re: [R-sig-phylo] issue with cophenetic.phylo()

2012-12-07 Thread Klaus Schliep
Hi Kelly,

Where is the problem? This is what happens when you work with floating-point numbers.
There is a nice link on the developer.r-project.org site on the topic:
www.validlab.com/goldberg/paper.pdf

On my machine the difference is actually the same as
.Machine$double.eps
[1] 2.220446e-16
which is the smallest positive floating-point number ‘x’ such that ‘1 + x != 1’.
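
A minimal base-R sketch of comparing floating-point values with a tolerance instead of ==. The numbers are illustrative (two of them are taken from the output quoted in this thread, offset by .Machine$double.eps), and the choice of 8 significant digits is an assumption that should be adapted to the scale of the branch lengths.

```r
# Exact comparison fails for values that differ only by rounding error.
a <- 0.1 + 0.2
b <- 0.3
exact_equal  <- (a == b)
approx_equal <- isTRUE(all.equal(a, b))   # compares with a small tolerance

# Two values one machine epsilon apart count as distinct for unique().
vals <- c(1.22313194, 1.22313194 + .Machine$double.eps, 2.14497976)
n_exact   <- length(unique(vals))
# Rounding to a fixed number of significant digits merges them.
n_rounded <- length(unique(signif(vals, 8)))

exact_equal
approx_equal
n_exact
n_rounded
```

Rounding before calling unique(), as Kelly does, is a reasonable way to count distinct distances as long as the rounding precision is coarser than the accumulated floating-point error and finer than the real differences between node heights.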

Regards,
Klaus





-- 
Klaus Schliep
Phylogenomics Lab at the University of Vigo, Spain



[R-sig-phylo] Analysis with Multiple cores on Mac Workstation

2012-12-07 Thread José Hidasi
Dear all,

I want to create a consensus tree with branch lengths (using Brian O'Meara's
function from the post "[R-sig-phylo] Why no branch lengths on consensus
trees?") on a Mac workstation. However, if I simply run the function, R does
not use all cores for the analysis. I would like to know if there is a
function or any other way to divide the analysis among the cores, or to use
all cores for running the program.

I know this is not the best forum to ask something like that (using
multiple cores), but I imagined that someone might have the solution for
that as some of you work with large databases.
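
One general option is the parallel package that ships with R, whose mclapply() forks worker processes on Mac and Linux. The sketch below is only an illustration under assumptions: the trees are random, and tree_length() is a hypothetical per-tree task standing in for the real computation. It helps only when the work can be split into independent pieces (for example one tree at a time); a single call to a function that was not written to run in parallel will still use one core.

```r
library(ape)
library(parallel)

set.seed(1)
# Made-up input: a plain list of 20 random coalescent trees with 15 tips.
trees <- replicate(20, rcoal(15), simplify = FALSE)

# Hypothetical per-tree task; replace with the real computation.
tree_length <- function(tr) sum(tr$edge.length)

# detectCores() can return NA on some platforms, so fall back to one core.
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L

# Serial version, then the same work spread over the available cores.
serial_res   <- lapply(trees, tree_length)
parallel_res <- mclapply(trees, tree_length, mc.cores = n_cores)

identical(unlist(serial_res), unlist(parallel_res))
```

Whether consensusBrlen() itself can be split this way depends on how it loops over the input trees, which is not shown here.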

Best,
José Hidasi



This is Brian O'Meara's function, which I am using to create the consensus
tree, from the post "[R-sig-phylo] Why no branch lengths on consensus
trees?":

I have a function to create a consensus tree with branch lengths. You
feed it a given topology (often a consensus topology, made with ape), then
a list of trees, and tell it what you want the branch lengths to
represent. It could be the proportion of input trees with that edge (good
for summarizing bootstrap or Bayes proportions) or the mean, median, or sd
of branch lengths for those trees that have that edge. Consensus branch
lengths in units of proportion of matching trees has obvious utility.
As Daniel says, the average branch lengths across a set of trees is
more difficult to see a use case for, but you could imagine doing something
like taking the ratogram output from r8s on a set of trees and summarizing
the rate average and rate sd on a given, best, tree as two sets of branch
lengths on that tree.

I've put the function source at
https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/R/consensusBrlen.R?revision=110&root=omearalab
You can source the file for the function (consensusBrlen()) and the other
functions it needs. It also uses phylobase. Note that this is alpha-quality
code -- it's been checked a bit, but verify it's doing what you want.

Here's an example of how to use it:

 library(ape)
 library(phylobase)
 # consensusBrlen() must first be sourced from the file linked above

 phy.a <- rcoal(15)
 phy.b <- phy.a
 # same topology as phy.a, with slightly longer branches
 phy.b$edge.length <- phy.b$edge.length + runif(length(phy.b$edge.length), 0, 0.1)
 phy.c <- rcoal(15)
 phy.list <- list(phy.a, phy.b, phy.c)
 phy.consensus <- consensusBrlen(phy.a, phy.list, type="mean_brlen")

-- 
José Hidasi Neto
Graduated in Biological Sciences - Universidade Federal de Goiás (UFG)
Master's candidate in Ecology and Evolution - Community Ecology and
Functioning Lab - UFG
Lattes:
http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K4293841A0


[R-sig-phylo] interpreting phylogenetic signal

2012-12-07 Thread E Pearson
I calculated Blomberg's K using 'Kcalc' in 'picante' for a continuous trait
and a phylogeny of 65 species.


The K value is 1.079 and when I do the randomization test I get the
following results. I'm a bit confused by such a large PIC.variance.rnd.mean
= 8.031.


Should I instead base my conclusions on P = 0.000999, and reject the null
hypothesis of no phylogenetic signal?


#        K PIC.variance.obs PIC.variance.rnd.mean PIC.variance.P PIC.variance.Z
# 1.079533         1.274221              8.031844    0.000999001      -3.351957
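
The logic of a tip-shuffling test on the variance of phylogenetically independent contrasts (PICs) can be sketched with ape as follows. This is an illustration under assumptions, not picante's code: the tree and trait are simulated, var() of the contrasts is used as the test statistic, and the p value is a simple rank-based one. When closely related species have similar trait values, the observed contrasts are small; shuffling trait values across tips destroys that pattern, so the randomized variances tend to be larger than the observed one and the Z score is negative.

```r
library(ape)

set.seed(1)
# Simulated data: a random 30-tip tree and a trait evolved by Brownian motion.
phy <- rcoal(30)
trait <- rTraitCont(phy, model = "BM")   # named by tip label

# Observed variance of the independent contrasts.
obs <- var(pic(trait, phy))

# Null distribution: shuffle trait values among tips, keeping the names,
# and recompute the variance of the contrasts each time.
n_rand <- 999
rnd <- replicate(n_rand, {
  shuffled <- setNames(sample(unname(trait)), names(trait))
  var(pic(shuffled, phy))
})

# One-sided p value (observed variance unusually low) and Z score.
p_value <- (sum(rnd <= obs) + 1) / (n_rand + 1)
z_score <- (obs - mean(rnd)) / sd(rnd)

c(obs = obs, rnd.mean = mean(rnd), P = p_value, Z = z_score)
```

Under this reading, a randomized mean well above the observed value, together with a small P and a negative Z, is the pattern expected when phylogenetic signal is present.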
