from:"Emmanuel Paradis"

Re: [R-sig-phylo] Extracting only FASTA sequences from a FASTQ file

2024-03-11 Thread Emmanuel Paradis

Hi Jarrett,

First, there are a lot of tools in the bioinformatics community that can 
convert FATSQ files to FASTA far more efficiently than ape does (I think 
cutadapt can do that, but surely many others too). But with a small file like 
this one, ape can do the job :)

See below for specific answers.

- Le 12 Mar 24, à 10:34, Jarrett Phillips phillipsjarre...@gmail.com a 
écrit :
> Hello,
> 
> I have a FASTQ file from which I would like to extract only the FASTA
> sequences (and not the associated PHRED quality scores).
> 
> For instance, using the file mentioned in the R documentation for
> read.fastq():
> 
> library(ape)
> 
> a <- "
> https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/sequence_read/
> "
> 
> file <- "SRR062641.filt.fastq.gz"
> URL <- paste0(a, file)
> download.file(URL, file)
> 
> fastq <- read.fastq(file) # import FASTQ file
> 
> One can easily extract the quality scores via
> 
> qual <- attr(fastq, "QUAL")
> 
> but there doesn't appear to be an attribute associated with the DNA
> sequences themselves (without the scores appended); only a "class"
> attribute, which evaluates to "DNAbin", is available.

The sequences are in the object; try the command:

base.freq(fastq, TRUE, TRUE)

> I've examined the underlying code for read.fastq() and there is a variable
> called DNA that is created, but it's not what I need:
> 
> function (file, offset = -33)
> {
>Z <- scan(file, "", sep = "\n", quiet = TRUE)
>tmp <- Z[c(TRUE, TRUE, FALSE, FALSE)]
>sel <- c(TRUE, FALSE)
>tmp[sel] <- gsub("^@", ">", tmp[sel])
>fl <- tempfile()
>cat(tmp, file = fl, sep = "\n")
>DNA <- read.FASTA(fl)
>tmp <- Z[c(FALSE, FALSE, FALSE, TRUE)]
>QUAL <- lapply(tmp, function(x) as.integer(charToRaw(x)))
>if (offset)
>QUAL <- lapply(QUAL, "+", offset)
>names(QUAL) <- names(DNA)
>attr(DNA, "QUAL") <- QUAL
>DNA
> }
> 
> I then tried the following:
> 
> x <- as.character.DNAbin(fastq) # extract DNA sequences
> y <- as.alignment(x)
> 
> seqs <- y$seq

If you want to do these extra steps, you can write 'y' into a FASTA file with 
(of course you can adapt the file name):

fl <- "titi.fas"
## avoid many copies of the data if you repeat the next command:
if (file.exists(fl)) unlink(fl)
for (i in 1:y$nb)
cat(">", y$nam[i], "\n", y$seq[i], "\n", sep = "",
file = fl, append = TRUE)

There is also the function seqinr::write.fasta() to de the same (a bit less 
efficiently I think).

> but 'seqs' can't be written to a file using write.FASTA().

No, instead write directly the 'fastq' object:

write.FASTA(fastq, "toto.fas")

And you can check that both files have the same sequences:

identical(read.FASTA("toto.fas"), read.FASTA("titi.fas"))

For the record, the first file has the sequences in uppercase, the 2nd one in 
lowercase.

Cheers,

Emmanuel

> Any assistance is greatly appreciated.
> 
> Thanks!
> 
> Cheers,
> 
> Jarrett
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] chronos ape substitution rate strict clock

2024-03-07 Thread Emmanuel Paradis

>> >>
>> >> Dear Emmanuel,
>> >>
>> >> Thank you very much for your response. I cannot see how to provide the
>> >> substitution rate to the phangorn::pml_bb() function, but I was looking
>> at
>> >> the ape::node.dating() function and it appears that I could provide the
>> >> substitution rate to the "mu" argument and then set the "node.dates"
>> >> argument to NA or zero for all tips (I'm not sure if NA or zero would be
>> >> preferable to force the tips to all be from a single time point). Do you
>> >> think that would work? I'm not sure how to make ape::node.dating()
>> accept a
>> >> substitution rate rather than try to estimate one. Maybe an option
>> could be
>> >> added to allow mu to equal a user-specified number rather than the
>> output
>> >> of ape::estimate.mu()?
>> >>
>> >> Another option might be to calculate an estimated age for every node
>> >> connecting sister taxa in the tree by converting the genetic distances
>> >> between sister pairs to divergence times using the substitution rate and
>> >> then use those as priors in ape::chronos(). I suppose I could also apply
>> >> that logic to date all of the nodes by using the mean pairwise distances
>> >> between taxa on either side of a node and converting that to divergence
>> >> times (although the R code for such a calculation would likely take me a
>> >> while to figure out). Would that be another option?
>> >>
>> >> Thanks again,
>> >>
>> >> Vincenzo
>> >>
>> >> On Sat, Mar 2, 2024 at 5:44 AM Emmanuel Paradis <
>> emmanuel.para...@ird.fr> 
>> >> wrote:
>> >>
>> >>
>> >> Hi Vincenzo,
>> >>
>> >> There's no direct way to do this with ape::chronos(). You may have a
>> look
>> >> at the function phangorn::pml_bb() but I'm not sure it can estimate the
>> >> dates if the rate is provided in a model object given as main
>> argument(?)
>> >>
>> >> That said, I expect that estimating so many dates to be very challenging
>> >> (unless you have a lot of known dates for calibration). This implies
>> that
>> >> you are certainly right to look for an approach where you don't need to
>> >> estimate the rates.
>> >>
>> >> Best,
>> >>
>> >> Emmanuel
>> >>
>> >> - Le 27 Fév 24, à 22:40, Vincenzo Ellis vael...@udel.edu a écrit :
>> >>
>> >>
>> >> Dear R-sig-phylo members,
>> >>
>> >> I've made a maximum likelihood tree in Raxml for several thousand taxa
>> >> using a single gene that has an estimated substitution rate of 0.006
>> >> substitutions/nucleotide/My. Is there a way to use chronos in ape to
>> >>
>> >> apply
>> >>
>> >> that substitution rate as a fixed clock rate and generate an ultrametric
>> >> time-scaled version of the tree?
>> >>
>> >> Thank you,
>> >>
>> >> Vincenzo
>> >>
>> >>  [[alternative HTML version deleted]]
>> >>
>> >> ___
>> >> R-sig-phylo mailing list - R-sig-phylo@r-project.orghttps://
>> nam10.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylo=05%7C02%7Cliam.revell%40umb.edu%7C94e522b834f94ff6aa7c08dc3c659399%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C638451654193183565%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C=wV2IJEHNPpbc2a7Erg%2F14jVmDcIqLshVohWaKa6vG6U%3D=0
>> <https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
>> >> Searchable archive at
>> >>
>> >>
>> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2F=05%7C02%7Cliam.revell%40umb.edu%7C94e522b834f94ff6aa7c08dc3c659399%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C638451654193196126%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C0%7C%7C%7C=d4fLpFetiQDBjHKTmdlDijgx%2F360Z2oPddHLdCqrrUQ%3D=0
>> <http://www.mail-archive.com/r-sig-phylo@r-project.org/>
>> >>
>> >>[[alternative HTML version deleted]]
>> >>
>> >> ___
>> >> R-sig-phylo mailing list - R-s

Re: [R-sig-phylo] chronos ape substitution rate strict clock

2024-03-07 Thread Emmanuel Paradis

Hi Vincenzo, 

"node.dating" is a help topic that points to two functions which seem to do 
what you are looking for. As a simple example: 

R> n <- 3 
R> tr <- rcoal(n) 
R> estimate.dates(tr, rep(0, n), .01) 
[1] 0.0 0.0 0.0 -144.32183 -43.67435 
R> estimate.dates(tr, rep(0, n), .1) 
[1] 0.00 0.00 0.00 -14.432183 -4.367435 

By setting the dates all equal to zero this assumes an ultrametric tree. Note 
that the input tree can be non-ultrametric: 

R> tr <- rtree(n) 
R> estimate.dates(tr, rep(0, n), .1) 
[1] 0.0 0.0 0.0 -12.36168 -6.29650 
R> estimate.dates(tr, rep(0, n), .01) 
[1] 0. 0. 0. -123.6168 -62.9650 

The third argument is the given (fixed) value of mu. 

Another function that you may check is chronoMPL(): it is based on a simple 
method (the reference is in the help page) which could be useful in your case. 

Best, 

Emmanuel 

- Le 4 Mar 24, à 23:09, Vincenzo Ellis  a écrit : 

> Dear Emmanuel,

> Thank you very much for your response. I cannot see how to provide the
> substitution rate to the phangorn::pml_bb() function, but I was looking at the
> ape::node.dating() function and it appears that I could provide the
> substitution rate to the "mu" argument and then set the "node.dates" argument
> to NA or zero for all tips (I'm not sure if NA or zero would be preferable to
> force the tips to all be from a single time point). Do you think that would
> work? I'm not sure how to make ape::node.dating() accept a substitution rate
> rather than try to estimate one. Maybe an option could be added to allow mu to
> equal a user-specified number rather than the output of ape:: [
> http://estimate.mu/ | estimate.mu ] ()?

> Another option might be to calculate an estimated age for every node 
> connecting
> sister taxa in the tree by converting the genetic distances between sister
> pairs to divergence times using the substitution rate and then use those as
> priors in ape::chronos(). I suppose I could also apply that logic to date all
> of the nodes by using the mean pairwise distances between taxa on either side
> of a node and converting that to divergence times (although the R code for 
> such
> a calculation would likely take me a while to figure out). Would that be
> another option?

> Thanks again,

> Vincenzo

> On Sat, Mar 2, 2024 at 5:44 AM Emmanuel Paradis < [
> mailto:emmanuel.para...@ird.fr | emmanuel.para...@ird.fr ] > wrote:

>> Hi Vincenzo,

>> There's no direct way to do this with ape::chronos(). You may have a look at 
>> the
>> function phangorn::pml_bb() but I'm not sure it can estimate the dates if the
>> rate is provided in a model object given as main argument(?)

>> That said, I expect that estimating so many dates to be very challenging 
>> (unless
>> you have a lot of known dates for calibration). This implies that you are
>> certainly right to look for an approach where you don't need to estimate the
>> rates.

>> Best,

>> Emmanuel

>> - Le 27 Fév 24, à 22:40, Vincenzo Ellis [ mailto:vael...@udel.edu |
>> vael...@udel.edu ] a écrit :

>> > Dear R-sig-phylo members,

>> > I've made a maximum likelihood tree in Raxml for several thousand taxa
>> > using a single gene that has an estimated substitution rate of 0.006
>> > substitutions/nucleotide/My. Is there a way to use chronos in ape to apply
>> > that substitution rate as a fixed clock rate and generate an ultrametric
>> > time-scaled version of the tree?

>> > Thank you,

>> > Vincenzo

>> > [[alternative HTML version deleted]]

>> > ___
>>> R-sig-phylo mailing list - [ mailto:R-sig-phylo@r-project.org |
>> > R-sig-phylo@r-project.org ]
>>> [ https://stat.ethz.ch/mailman/listinfo/r-sig-phylo |
>> > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ]
>>> Searchable archive at [ 
>>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ |
>> > http://www.mail-archive.com/r-sig-phylo@r-project.org/ ]

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] chronos ape substitution rate strict clock

2024-03-02 Thread Emmanuel Paradis

Hi Vincenzo,

There's no direct way to do this with ape::chronos(). You may have a look at 
the function phangorn::pml_bb() but I'm not sure it can estimate the dates if 
the rate is provided in a model object given as main argument(?)

That said, I expect that estimating so many dates to be very challenging 
(unless you have a lot of known dates for calibration). This implies that you 
are certainly right to look for an approach where you don't need to estimate 
the rates.

Best,

Emmanuel

- Le 27 Fév 24, à 22:40, Vincenzo Ellis vael...@udel.edu a écrit :

> Dear R-sig-phylo members,
> 
> I've made a maximum likelihood tree in Raxml for several thousand taxa
> using a single gene that has an estimated substitution rate of 0.006
> substitutions/nucleotide/My. Is there a way to use chronos in ape to apply
> that substitution rate as a fixed clock rate and generate an ultrametric
> time-scaled version of the tree?
> 
> Thank you,
> 
> Vincenzo
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Highlighting Specific Clades in a Lineage-Through-Time Plot

2023-10-26 Thread Emmanuel Paradis

Hi Russell, 

There are several implementations of LTT plots among several packages, so the 
details certainly differ depending on which one(s) you use. 

Maybe you can use the ape function ltt.plot.coords() which returns a matrix 
with 2 columns giving the number of lineages for each node time of the tree. I 
guess you'll need to do a bit of programming to combine the different outputs 
from different trees. 

Best, 

Emmanuel 

- Le 27 Oct 23, à 8:05, Russell Engelman  a écrit 
: 

> Dear R-Sig-Phylo,
> I was wondering if there was any way to color-code a lineage-through-time 
> plot,
> to highlight the proportion of taxa at specific intervals that belong to a
> particular clade. I.e., an LTT plot of tetrapod diversity through time, and I
> want to highlight the number of lineages at any one point in time that are
> chondrichthyans, sarcopterygians, lissamphibians, etc. (see attached fig. as 
> an
> example). I figured I can do this by asking R to return all lineages that are
> descendants of a specific node, but am not sure what functions I can use to
> convert the dated tree into an object that can be read into ggplot.

> Sincerely,
> Russell

> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Correlation coefficient from a PGLS

2023-07-21 Thread Emmanuel Paradis

Hi Lior,

R-squared and Pearson's coefficient are two different things:

R-squared (aka coefficient of determination) quantifies the quantity of 
variance of a variable (the response, 'y' in your example) that is explained by 
the model where some predictors can be controlled by the experimenter. This 
concept is difficult to apply in evolutionary models since variables are 
(usually) not controlled.

The correlation coefficient quantifies the link between two uncontrolled 
variables without any assumption whether one is a response and the other is the 
predictor. In the case of phylogenetic comparative analyses, the correlation 
between PICs is certainly what you are looking for: see ?pic in ape.

HTH,

Best,

Emmanuel

- Le 18 Juil 23, à 19:29, Lior Glick liorg...@mail.tau.ac.il a écrit :

> Hello,
> 
> I ran a PGLS analysis like this:
> model = gls(y ~ x, correlation = corBrownian(phy = tree, form=~species),
> data = data_df, method = "ML")
> 
> I was able to extract the coefficients (intercept + b1 in this case), as
> well as the relevant p-values. What I couldn't figure out is how to get a
> correlation coefficient between y and x, something like R-squared or
> pearson's coefficient.
> 
> Can you please help with that?
> Thanks!
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Parallelization in ape::dist.topo

2023-03-07 Thread Emmanuel Paradis

To follow (tangently) on Klaus' message, I've released a book last year on some 
advanced topics in R programming and development:

https://hal.ird.fr/ird-03850685

Chapter 9 is on parallelization and HPC. There are a few (detailed) examples 
showing when multi-core is benefitial and when it is not. I've also written a 
list of rules on this topic that overlap with Klaus' advices.

Cheers,

Emmanuel

- Le 7 Mar 23, à 19:01, Klaus Schliep klaus.schl...@gmail.com a écrit :

> Dear Vojtěch,
> nice work. Just a few random comments:
> Parallelization is often not straightforward as it depends on the hardware
> and the operating system. My preference is using the future package for
> parallelization as it does some nice abstraction for the different R
> packages, so you can try different things and no need to write different
> versions of the code. chatGPT seems to have missed this.
> Defaulting to "cores = detectCores()" is not a good idea. The CRAN
> Repository Policy (https://cran.r-project.org/web/packages/policies.html)
> states: "If running a package uses multiple threads/cores it must never use
> more than two simultaneously: the check farm is a shared resource and will
> typically be running many checks simultaneously." In short such a default
> can cause serious problems on a cluster and I got properly ripleyed for
> this years ago, but that's another story. So the default should be
> something like min(2L, detectCores()). So with great power comes great
> responsibility and users should read the man pages.
> Kind regards,
> Klaus
> 
> 
> On Mon, Mar 6, 2023 at 2:43 PM Vojtěch Zeisek  wrote:
> 
>> Hello dear colleagues,
>> I use often ape::dist.topo (see here dist.topo.r), which is doing the
>> calculations sequentially, which is very slow for large data sets.
> 
> 
> What is large? Large number of trees or large trees?
> 
> I'm sorry,
>> I haven't found any relevant Git repository or so, so I hope Emmanuel
>> won't
>> mind if I discuss it here.
>> I discussed various options with ChatGPT and dist.topo.par1.r is the
>> simplest
>> solution, basically using mc.lapply instead of 2 for loops. Good study
>> material for how to do it in general. Little enhancements are in
>> dist.topo.par2.r, which should be slightly better in case some pair of
>> comparisons would return NA or so, but from my tests there doesn't seem to
>> be
>> any difference.
>>
> 
> I think there is room for improvement with preprocessing trees. If you have
> e.g. bootstrap trees with short edges you might want to use di2multi()
> first to avoid spurious differences. Filtering out duplicated trees could
> also speed up things, if there are many. This will depend on the trees you
> want to compare and the methods.
> 
> And finally there is dist.topo.par3.r which doesn't load parallel (and uses
>> plain lapply) for cores==1, while parallel and doParallel for multiple
>> cores.
>> It also contains some checks and error handling. From my testing it works
>> well. I'm not sure if tryCatch is really needed there.
> 
> 
> I am not sure if it is necessary, but you don't want to have it in the
> inner loop ;)
> 
> In any case,
>> improvements welcomed. :-)
>> So, what do You think? Is this usable improvement of ape::dist.topo?
>> Sincerely,
>> V.
>>
>> --
>> Vojtěch Zeisek
>> https://trapa.cz/en/
>>
>> Department of Botany, Faculty of Science
>> Charles University, Prague, Czech Republic
>> https://www.natur.cuni.cz/biology/botany/
>> https://lab-allience.natur.cuni.cz/
>>
>> Institute of Botany, Czech Academy of Sciences
>> Průhonice, Czech Republic
>> https://www.ibot.cas.cz/en/
>> Computing cluster
>> https://sorbus.ibot.cas.cz/en/start
>> ___
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at
>> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>>
> 
> 
> --
> Klaus Schliep
> 
> Senior Scientist
> Institute of Molecular Biotechnology
> TU Graz
> https://www.imbt.tugraz.at
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Parallelization in ape::dist.topo

2023-03-07 Thread Emmanuel Paradis

Hi Vojtěch,

The GH repos for ape is:

https://github.com/emmanuelparadis/ape

I had a quick look at your code and these are interesting improvements. It 
seems also possible to improve the basic code of dist.topo() (e.g., using 
bitsplits) so it is worth opening an issue.

Cheers,

Emmanuel

- Le 6 Mar 23, à 20:43, Vojtěch Zeisek vo...@trapa.cz a écrit :

> Hello dear colleagues,
> I use often ape::dist.topo (see here dist.topo.r), which is doing the
> calculations sequentially, which is very slow for large data sets. I'm sorry,
> I haven't found any relevant Git repository or so, so I hope Emmanuel won't
> mind if I discuss it here.
> I discussed various options with ChatGPT and dist.topo.par1.r is the simplest
> solution, basically using mc.lapply instead of 2 for loops. Good study
> material for how to do it in general. Little enhancements are in
> dist.topo.par2.r, which should be slightly better in case some pair of
> comparisons would return NA or so, but from my tests there doesn't seem to be
> any difference.
> And finally there is dist.topo.par3.r which doesn't load parallel (and uses
> plain lapply) for cores==1, while parallel and doParallel for multiple cores.
> It also contains some checks and error handling. From my testing it works
> well. I'm not sure if tryCatch is really needed there. In any case,
> improvements welcomed. :-)
> So, what do You think? Is this usable improvement of ape::dist.topo?
> Sincerely,
> V.
> 
> --
> Vojtěch Zeisek
> https://trapa.cz/en/
> 
> Department of Botany, Faculty of Science
> Charles University, Prague, Czech Republic
> https://www.natur.cuni.cz/biology/botany/
> https://lab-allience.natur.cuni.cz/
> 
> Institute of Botany, Czech Academy of Sciences
> Průhonice, Czech Republic
> https://www.ibot.cas.cz/en/
> Computing cluster
> https://sorbus.ibot.cas.cz/en/start
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] NAMESPACE issue

2023-02-28 Thread Emmanuel Paradis

Hi Kevin,

Do you run the checks of a new version of windex on this site:

win-builder.r-project.org/

? It runs the checks both on Debian and Windowns with several choices of R 
versions, and you can access the detailed log files.

Best,

Emmanuel

- Le 28 Fév 23, à 17:38, Arbuckle kevin.arbuc...@swansea.ac.uk a écrit :

> Hi all,
> 
> I realise this might not be strictly a phylogeny-related issue, but it might 
> be
> as it relates to geiger and in any case the CRAN Core Team have left me at an
> impasse trying to get an update to windex through to CRAN (just repeating the
> error message then ignoring messages asking for clarification or advice on
> fixing it). Hopefully it's relevant enough to this list to be OK.
> 
> When trying to submit an update to the package, all Debian checks are coming
> back fine, but there is one note in the Windows checks that is stopping the
> process. The note reads:-
> 
> "windex: no visible global function definition for 'rescale'
> Undefined global functions or variables:
>  rescale"
> 
> However, geiger (the package which contains rescale) is listed as a dependency
> in the DESCRIPTION and is imported in the NAMESPACE, so rescale should
> definitely be there. Combined with the lack of issue flagged by Debian checks
> (which presumably would also check basic things like 'do the functions exist')
> and the fact that this issue hasn't occurred in previous version submitted to
> CRAN (nothing has changed with respect to the use of the rescale function), I
> am at a loss.
> 
> In case it helps, the contents of the DESCRIPTION and NAMESPACE files are as
> follows:-
> 
> DESCRIPTION file:
> Package: windex
> Type: Package
> Title: Analysing Convergent Evolution using the Wheatsheaf Index
> Version: 2.0.4
> Date: 2023-02-23
> Author: Kevin Arbuckle and Amanda Minter
> Maintainer: Kevin Arbuckle
> kevin.arbuc...@swansea.ac.uk
> Description: Analysing convergent evolution using the Wheatsheaf index,
> described in Arbuckle et al. (2014) , and some
> other unrelated but perhaps useful functions.
> License: GPL-2
> Depends: phytools, geiger (>= 2.0), ape (>= 4.0), phangorn, scatterplot3d,
> utils, methods, R (>= 3.0.0)
> Packaged: 2023-02-23 12:36:50 UTC; karbuckle
> 
> NAMESPACE file:
> exportPattern("^[^\\.]")
> import(ape,utils,scatterplot3d)
> importFrom("graphics", "abline", "hist", "legend", "points", "arrows",
> "plot","text","polygon")
> importFrom("stats", "dist", "quantile", "sd", "var", "AIC", "BIC", "logLik",
> "na.omit", "pchisq", "shapiro.test","model.frame","density")
> importFrom("methods", "is")
> importFrom("phytools", "findMRCA", "nodeHeights")
> importFrom("geiger", "rescale", "ratematrix", "sim.char")
> importFrom("phangorn", "CI", "as.phyDat", "Children", "Descendants")
> 
> 
> Any advice on how to resolve this would be welcome. Note that originally I had
> imported geiger (and phangorn) as import rather than importFrom in the
> NAMESPACE, but the current version was an attempt to resolve the issue and
> didn't work in that the same note is coming back every time.
> 
> Best wishes,
> Kev
> 
> Dr Kevin Arbuckle
> Athro Cyswllt (Darllenydd) mewn Biowyddorau (Bioleg Esblygiadol) / Associate
> Professor (Reader) in Biosciences (Evolutionary Biology)
> Adran Biowyddorau / Department of Biosciences
> Cyfadran Gwyddoniaeth a Pheirianneg / Faculty of Science and Engineering
> Prifysgol Abertawe  / Swansea University
> Abertawe / Swansea
> SA2 8PP
> UK
> 
> Rhowch wybod i ni os hoffech dderbyn eich gohebiaeth yn Gymraeg. Rydym yn
> croesawu gohebiaeth yn Gymraeg neu yn Saesneg. Ni fydd gohebu yn Gymraeg yn
> arwain at oedi.
> 
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Extracting intraspecific and interspecific distances from dist.dna() object

2022-11-01 Thread Emmanuel Paradis

Hi Jarrett, 
(Cc to the list since I'm correcting some errors) 

- Le 28 Oct 22, à 3:01, Jarrett Phillips  a 
écrit : 

> Thanks Emmanuel,
> Sorry for the late reply.

> If I understand correctly, there are two options to get the intraspecific and
> interspecific distances from the pairwise distance matrix:

> library(ape)
> library(spider)

> data(anoteropsis) # read in DNA sequences

> anoSpp <- sapply(strsplit(dimnames(anoteropsis)[[1]], split="_"),
> function(x) paste(x[1], x[2], sep="_")) # get specimen labels

Or more simply (see ?stripLabel for the options): 

anoSpp <- stripLabel(labels(anoteropsis)) 

> anoDist <- dist.dna(anoteropsis, model = "raw", pairwise.deletion = FALSE,
> as.matrix = TRUE) # get distance matrix

> x1 <- lower.tri(outer(anoSpp, anoSpp, "=="), diag = FALSE) # I specified lower
> triangular as per your suggestion
> intra1 <- anoDist[x1] # intraspecific distances
> inter1 <- anoDist[!x1] # interspecific distances

This should be: 

x1 <- outer(anoSpp, anoSpp, "==") 
x1 <- x1 & lower.tri(x1) 
intra1 <- anoDist[x1] # intraspecific distances with no duplicate 

When trying the 2nd solution, I found out that as.dist() on a logical matrix 
converts the values as numeric (TRUE/FALSE become 1/0), so you might do 
instead: 

d <- dist.dna(anoteropsis, model = "raw") 
intra1BIS <- d[which(as.dist(outer(anoSpp, anoSpp, "==")) == 1)] 

Checking: 

R> identical(intra1, intra1BIS) 
[1] TRUE 

A simpler solution is: 

x1 <- outer(anoSpp, anoSpp, "==") 
x1 <- x1[lower.tri(x1)] 
intra1TER <- d[x1] 

R> identical(intra1, intra1TER) 
[1] TRUE 

Since 'd' has only the lower triangle of the distance matrix, d[!x1] gives the 
inter-species distances. The alternative using the full (square) matrix would 
be: 

x2 <- outer(anoSpp, anoSpp, "!=") 
x2 <- x2 & lower.tri(x2) 
anoDist[x2] 

Checking: 

R> identical(d[!x1], anoDist[x2]) 
[1] TRUE 

Note that if you have a very big data set, you may call outer() only once with 
"==" and then do x2 <- !x1; and of course, lower.tri(x1) and lower.tri(x2) are 
identical. 

> x2 <- as.dist(outer(anoSpp, anoSpp, "=="), diag = FALSE, upper = FALSE) # 
> lower
> triangular matrix excluding main diagonal
> intra2 <- anoDist[x2]
> inter2 <- anoDist[!x2]

> However, these implementations do not give the same results.

> Am I missing something? Could you clarify?

> As a side note, the 'spider' R package has the function 'sppDist' that returns
> intraspecific and interspecific distances. However, it differs as well, so
> there is some added confusion.

> x3 <- sppDist(anoDist, anoSpp)
> intra <- x3$intra
> inter <- x4$inter

This gives me: 

R> identical(x3$intra, intra1) 
[1] TRUE 

Be careful that the distances in x3$inter are duplicated: 

R> lengths(x3) 
inter intra 
1024 16 

Best, 

Emmanuel 

> Which implementation is the correct one?

> Cheers.

> Jarrett

> On Tue, Oct 18, 2022 at 11:51 PM Emmanuel Paradis < [
> mailto:emmanuel.para...@ird.fr | emmanuel.para...@ird.fr ] > wrote:

>> Hi Jarrett,

>> If you have a vector of species names (say 'taxa'), then:

>> o <- outer(taxa, taxa, "==")

>> returns a logical matrix with TRUE when the two strings in 'taxa' are the 
>> same,
>> FALSE if they are different (so the diagonal is TRUE). Since you return the
>> output of dist.dna() as a matrix, then you can do:

>> diag(o) <- FALSE
>> anoDist[o] # all the intraspecific distances
>> anoDist[!o] # all the interspecific distances

>> You may have also to remove the duplicated distances, maybe with 
>> lower.tri(). An
>> alternative is to work with the "dist" object (this will use less memory too)
>> from dist.dna:

>> o <- as.dist(outer(taxa, taxa, "=="))

>> will drop the upper triangle and the diagonal of the logical matrix. The two
>> above commands are the same.

>> After, you can elaborate on this by creating your function(s) if you want to
>> focus on a particular species (or pair of), eg:

>> f <- function(x, y) x == "Homo sapiens" & y == "Homo abilis"
>> o <- as.dist(outer(taxa, taxa, f))

>> See also the function adegeneet::pairDistPlot() which a method for "DNAbin"
>> objects.

>> HTH

>> Best,

>> Emmanuel

>> - Le 19 Oct 22, à 1:16, Jarrett Phillips [ 
>> mailto:phillipsjarre...@gmail.com
>> | phillipsjarre...@gmail.com ] a écrit :

>> > Hello,

>> > I'm wondering whether someone knows of an efficient way to extract
>>

Re: [R-sig-phylo] Extracting intraspecific and interspecific distances from dist.dna() object

2022-10-18 Thread Emmanuel Paradis

Hi Jarrett,

If you have a vector of species names (say 'taxa'), then:

o <- outer(taxa, taxa, "==")

returns a logical matrix with TRUE when the two strings in 'taxa' are the same, 
FALSE if they are different (so the diagonal is TRUE). Since you return the 
output of dist.dna() as a matrix, then you can do:

diag(o) <- FALSE
anoDist[o] # all the intraspecific distances
anoDist[!o] # all the interspecific distances

You may have also to remove the duplicated distances, maybe with lower.tri(). 
An alternative is to work with the "dist" object (this will use less memory 
too) from dist.dna:

o <- as.dist(outer(taxa, taxa, "=="))

will drop the upper triangle and the diagonal of the logical matrix. The two 
above commands are the same.

After, you can elaborate on this by creating your function(s) if you want to 
focus on a particular species (or pair of), eg:

f <- function(x, y) x == "Homo sapiens" & y == "Homo abilis"
o <- as.dist(outer(taxa, taxa, f))

See also the function adegeneet::pairDistPlot() which a method for "DNAbin" 
objects.

HTH

Best,

Emmanuel

- Le 19 Oct 22, à 1:16, Jarrett Phillips phillipsjarre...@gmail.com a écrit 
:

> Hello,
> 
> I'm wondering whether someone knows of an efficient way to extract
> intraspecific (within-species) and interspecific (between-species) genetic
> distances for each species returned by ape::dist.dna() to be able to
> calculate the DNA barcode gap.
> 
> It seems extracting the entire object as a matrix is a step in the right
> direction.
> 
> Consider the following example:
> 
>library(ape)
>library(spider)
> 
>data(anoteropsis) # sample dataset
>anoDist <- dist.dna(anoteropsis, model = "raw", pairwise.deletion =
> FALSE, as.matrix = TRUE) # compute p-distances
> 
> Any ideas?
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[R-sig-phylo] [2nd announcement] "Bioinformatics: The Next 20 Years" (Oct 27)

2022-09-28 Thread Emmanuel Paradis


Dear all,

Registration is now open:

https://meeting-nstda.webex.com/meeting-nstda/j.php?RGID=r49da897c6a68260a5bf2bdf41763bcba

Cheers,

Emmanuel

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[R-sig-phylo] wrong behaviour of is.monophyletic() with duplicated labels

2022-02-20 Thread Emmanuel Paradis


(Posted to: r-sig-phylo and GitHub)

Hi all,

is.monophyletic() does not work correctly when some tip labels are 
duplicated. Here's a simple example:


R> tr <- read.tree(text = "((A,B),(A,C));")

The two tips labelled "A" are not sister-lineages, but the current 
version of ape gives:


R> is.monophyletic(tr, "A")
[1] TRUE
R> is.monophyletic(tr, c("A", "A")) # <- must interrupt
^C

The reason for this is that the function looks for the first occurrence 
of "A" in the tip labels of 'tr', even with duplicated labels. By 
contrast, drop.tip() looks for all occurrences; the following command 
drops 2 tips even if a single label is given:


R> drop.tip(tr, "A")

Phylogenetic tree with 2 tips and 1 internal nodes.

Tip labels:
  B, C

Rooted; no branch lengths.

The version of ape on GH (will be submitted to CRAN next week) now 
returns FALSE in both above commands. I assume this change will create 
no problem for the packages that use this function, but I thought it may 
be worth explaining this issue ahead of the CRAN submission. As usual, 
any feedback welcome.


Best,

Emmanuel

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Substitution model rate matrix - how to read it?

2021-11-27 Thread Emmanuel Paradis

Hi Martin,

Rates are generally relative quantities in phylogenetics. You cannot estimate 
the substitution rate(s) together with the branch lengths of the tree, but 
rates can be estimated relative to each others. That's why one rate is fixed to 
1 in the GTR model.

In the JC69 model the (single) rate cannot be estimated, unless you fix the 
branch lengths, which can be done in phangorn::optim.pml with the options 
optRate=TRUE, optEdge=FALSE (if you set both options to TRUE, you'll get a 
warning). If you do this, the output will include an element '...$rate' with 
the estimated substitution rate.

The rate matrix can have elements larger than 1, and it's its sums by row which 
are equal to 0. Generally in phylogenetics, there is no time information, so 
the branch lengths are interpreted in expected numbers of substitutions along 
each branch. You can still do the matrix exponential (e^(Q*t)) but the 
resulting probabilities cannot be interpreted easily.

Best,

Emmanuel

- Le 26 Nov 21, à 18:04, Martin Fikáček mfika...@gmail.com a écrit :

> Hi everybody,
> 
> I am now trying to explain the principles of phylogenetics to the students
> using R and went into a very simple problem that I cannot solve. Probably a
> very simple and basic thing, sorry for a stupid questions:
> 
> When checking the details of models selected for my data by modelTest() in
> phangorn, the rate matrix always includes number around 1 or even mich
> higher (for example this is the matrix for Laurasiatherian data with
> GTR+I+G model:
> 
> Rate matrix:
>  a  c  g t
> a  0.00  3.0009884 11.8735854  2.608831
> c  3.000988  0.000  0.5162325 21.771813
> g 11.873585  0.5162325  0.000  1.00
> t  2.608831 21.7718125  1.000  0.00
> 
> For some simple models it gives just 0 or 1 as for example this for JC:
> 
> Rate matrix:
>  a c g t
> a 0 1 1 1
> c 1 0 1 1
> g 1 1 0 1
> t 1 1 1 0
> 
> I would normally expect the rate matric to have values lower than 1, and to
> sum up to 0. Then it would make sense to use it also for calculating the
> probability matrix using e^(Q*t). I wanted to illustrate the meaning of the
> rate matrix estimated for real data to the students in this way, which is
> why I realized that the output by phangorn is different and I fail to find
> out why.
> 
> Thanks for any hint!
> 
> Martin
> 
> --
> *Martin Fikáček (費卡契) MSc. PhD.*
> *Department of Biological Sciences*
> *National Sun Yat-sen University*
> *No. 70, Lienhai Rd., Kaohsiung 80424, Taiwan*
> *E-mail: *mfika...@gmail.com, mfika...@mail.nsysu.edu.tw
> *Phone: *(+886) 75252000 # 3622
> *Website: *www.cercyon.eu
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Trimming protein alignment

2021-11-02 Thread Emmanuel Paradis

Ha! That's because gaps are coded with dashes in your files; I assumed it was 
X's (as returned by ape::trans). So this line 7 should be:

foo <- function(x) sum(x == 0x58 | x == 0x2d)

Or (probably easier to modify):

foo <- function(x) sum(x == charToRaw("X") | x == charToRaw("-"))

Cheers,
Emmanuel

- Le 2 Nov 21, à 19:23, Vojtěch Zeisek vo...@trapa.cz a écrit :

> Thank You, Emmanuel,
> I thought I must do something with line 7, but I had no idea what. :-)
> Unfortunately it didn't work. I still get "All sequences of the same length:
> 249" after running the edited function as 'aln.ng <- del.colgapsonly(x=aln,
> threshold=0.25, freq.only=FALSE)'... :-( I'm attaching one example (I have
> almost 1000 of them, so I'd like to avoid to edit all of them manually:-).
> Sincerely,
> V.
> 
> Dne úterý 2. listopadu 2021 12:36:53 CET jste napsal(a):
>> Hi Vojtěch,
>> In addition to removing lines 3 and 4, replace line 7:
>> foo <- function(x) sum(x == 4)
>> by:
>> foo <- function(x) sum(x == 0x58)
>> That sh(c)ould do it.
>> Best,
>> Emmanuel
>> 
>> Le 2 Nov 21, à 17:37, Vojtěch Zeisek vo...@trapa.cz a écrit :
>> > Hello,
>> > I try to trim protein alignments in R. I see del.colgapsonly() and
>> > del.rowgapsonly() from ape can trim only class DNAbin. As DNAbin as
>> > basically same matrix as AAbin (Isn't it?), I thought it should be
>> > easy. I commented out lines 3 and 4 in both del.colgapsonly() and
>> > del.rowgapsonly(), but it didn't lead to success. I haven't found
>> > any other way how to trim protein alignment in R similar to what
>> > these two function do. Is there any way?
>> > Sincerely,
>> > V.
> --
> Vojtěch Zeisek
> https://trapa.cz/en/
> 
> Department of Botany, Faculty of Science
> Charles University, Prague, Czech Republic
> https://www.natur.cuni.cz/biology/botany/
> https://lab-allience.natur.cuni.cz/
> 
> Institute of Botany, Czech Academy of Sciences
> Průhonice, Czech Republic
> https://www.ibot.cas.cz/en/
> Computing cluster
> https://sorbus.ibot.cas.cz/en/start
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Trimming protein alignment

2021-11-02 Thread Emmanuel Paradis

Hi Vojtěch,

In addition to removing lines 3 and 4, replace line 7:

foo <- function(x) sum(x == 4)

by:

foo <- function(x) sum(x == 0x58)

That sh(c)ould do it.

Best,

Emmanuel

- Le 2 Nov 21, à 17:37, Vojtěch Zeisek vo...@trapa.cz a écrit :

> Hello,
> I try to trim protein alignments in R. I see del.colgapsonly() and
> del.rowgapsonly() from ape can trim only class DNAbin. As DNAbin as basically
> same matrix as AAbin (Isn't it?), I thought it should be easy. I commented out
> lines 3 and 4 in both del.colgapsonly() and del.rowgapsonly(), but it didn't
> lead to success. I haven't found any other way how to trim protein alignment
> in R similar to what these two function do. Is there any way?
> Sincerely,
> V.
> 
> --
> Vojtěch Zeisek
> https://trapa.cz/en/
> 
> Department of Botany, Faculty of Science
> Charles University, Prague, Czech Republic
> https://www.natur.cuni.cz/biology/botany/
> https://lab-allience.natur.cuni.cz/
> 
> Institute of Botany, Czech Academy of Sciences
> Průhonice, Czech Republic
> https://www.ibot.cas.cz/en/
> Computing cluster
> https://sorbus.ibot.cas.cz/en/start
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Ancestral state reconstruction, polytomies and the absence of branch lengths

2021-09-17 Thread Emmanuel Paradis

Hi Jake & Diego,

Maybe a very small modification to ace() could help to accomodate some of these 
issues. Currently, ace(type = "discrete") checks the branch lengths of the tree 
and throws an error if any of them is zero or negative. I think zero-length 
branches could be allowed. This seems quite reasonable since matrix exponential 
works as expected in this situation, e.g., with expm:

R> expm::expm(matrix(rep(0, 4), 2, 2))
 [,1] [,2]
[1,]10
[2,]01

I'm not sure why the current check rejects trees with zero-length branches 
(probably there was a good reason some time ago). If this is changed, then 
calling ace() with a tree modified with multi2di() will be easier since the 
last function adds zero-length branch(es) in the tree.

Best,

Emmanuel

- Le 15 Sep 21, à 21:33, Jacob Berv jakeberv.r.sig.ph...@gmail.com a écrit :

> Hi Diego,
> 
> If your tree has polytomies and the branches have no length information, I am
> not sure if likelihood model-based reconstruction is right.
> 
> The ace function (and all other similar functions based on likelihood) will
> assume that your branch lengths represent some kind biological information
> (usually time), and the degree to which characters evolve will be related to
> the branch length. I therefore might prefer parsimony in this situation to get
> a general sense of what is going on.
> 
> As an experiment, you could assign all branches to be equal (e.g. = 1), but 
> then
> you are imposing this assumption on the analysis. In order to evaluate the
> effects of this assumption, and if you are specifically interested in
> reconstructing the ancestral states for a particular node, you could generate 
> a
> few sets of simulated branch lengths (which you could then pass to ace), to 
> see
> if the reconstruction is robust to a range of variation in branch lengths. You
> could also replicate using multi2di to see how robust your reconstruction is 
> to
> alternative resolutions.
> 
> Best,
> Jake
> 
> 
>> On Sep 15, 2021, at 2:16 AM, Diego Almeida-Silva 
>> 
>> wrote:
>> 
>> Dear colleagues,
>> 
>> I am trying to estimate ancestral states for two binary characters in a
>> full morphology-based phylogenetic tree. I am using ace function,
>> type="discrete" and kappa=0 as arguments. Unfortunately, I am dealing with
>> two problems about this issue: my topology have three polytomies and tree
>> branches have no lengths.
>> 
>> To perform ancestral state estimation, I firstly used the multi2di
>> function. However, this procedure creates artificial nodes in the topology.
>> This becomes a new problem, since states at these new nodes are also being
>> estimated.
>> 
>> Is there another way to deal with this situation? There are functions (or
>> packages) other than those of phytools that are able to deal with
>> polytomies and the absence of branch lengths in ancestral state
>> reconstruction?
>> 
>> Thanks in advance!
>> Diego
>> 
>>  [[alternative HTML version deleted]]
>> 
>> ___
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] dist.nodes

2021-09-06 Thread Emmanuel Paradis

Hi,

A tree has n terminal nodes (aka tips) and m internal nodes (aka nodes simply). 
In the edge matrix, the tips are numbered 1:n and the nodes are numbered 
(n+1):(n+m) (same than n+1:m). n and m can be found with:

n <- Ntip(tree) # or length(tree$tip.label)
m <- Nnode(tree) # or tree$Nnode

Details can be found in:

http://ape-package.ird.fr/misc/FormatTreeR.pdf

Best,

Emmanuel

- Le 5 Sep 21, à 21:07, Nick Youngblut nyoun...@gmail.com a écrit :
> For `ape::dist.nodes()`, how can one match the output matrix rows/columns with
> the node IDs in the tree (eg., the tip.labels)? I cannot just use
> `ape::cophenetic()` in my particular situation. The docs for
> `ape::dist.nodes()` state:
> 
> ```
> … in the case of dist.nodes, the numbers of the tips and the nodes (as given 
> by
> the element edge).
> ```
> 
> … but tree$edge doesn’t provide any direct info about which nodes are internal
> vs external and how the external tip labels map to the tree$edge matrix.
> 
> Thanks,
> Nick
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Extracting sequences from DNAbin according to a condition

2021-06-27 Thread Emmanuel Paradis

Hi Jarrett,

- Le 28 Juin 21, à 5:08, Jarrett Phillips phillipsjarre...@gmail.com a 
écrit :
> Hello All,
> 
> I have a COI FASTA alignment file with 11926 sequences spanning 668 species.
> 
> According to my code, a total of 361 species are represented by 6 or more
> sequences.
> 
> I need to extract these 361 species (along with all their associated
> sequences) from the alignment but am having issues.
> 
> Here is my code:
> 
>library(ape)
>library(pegas)
>library(spider)
>library(stringr)
> 
>seqs <- read.dna(file = file.choose(), format = "fasta") # import data
> and convert to matrix
>seqs.mat <- as.matrix(seqs)
> 
>spp <- substr(dimnames(seqs.mat)[[1]], 1, 50) # extract sequence labels
> 
>res <- str_remove(spp, "^[^|]+\\|") # remove BOLD Process ID

I suggest you store the output of table() in a distinct object so you keep 
'res':

tab.res <- tab(res)

Besides, table() can be a bit long so it's better to avoid duplicating its 
call. Then you can do:

sel <- tab.res >= 6
species.to.keep <- names(tab.res)[sel] # which() is not necessary here
length(species.to.keep) # should be 361
i <- which(res %in% species.to.keep)
write.dna(seqs[i, ], .

This way, you can change your selection criterion easily. Suppose you want to 
select the species with at least 5 sequences but no more than 10, then you'd 
only need to change the first line:

sel <- tab.res >= 5 & tab.res <= 10

HTH.

Cheers,

Emmanuel

>res <- table(res)[which(table(res) >= 6)] # species with 6 or more
> records
>names(res) # 361 species names
> 
> 
> The problem is that `names(res)' contains unique species names.rather than
> repeated species names (according to the BOLD process ID).
> 
> I also need the sequences. There are between 6-421 sequences across the 361
> species (11143 sequences total).
> 
> Does anyone have any ideas on next steps here?
> 
> Once this is done, I would then write the sequences to a file via
> `write.dna()`
> 
> If clarification is needed, please let me know.
> 
> 
> Thanks.
> 
> Cheers,
> 
> Jarrett
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Phylogenetic network / ARG representation in ape

2021-05-21 Thread Emmanuel Paradis

Hi Yan,

- Le 18 Mai 21, à 23:48, Yan Wong y...@pixie.org.uk a écrit :
> Dear R-Sig-Phylo,
> 
> I’m interested in representations of ancestral recombination graphs (ARGs). 
> From
> my reading of the extended Newick format (G. Cardona et al., 2008), and the
> ape/evonet documentation, it appears as if the R-base representations of
> phylogenetic networks don't have a standard way to specify which pieces of a
> genome pass though which edges in the network.

Correct. The "phylo" and "evonet" classes ignore whether the (optional) vector 
'edge.length' represents time, number of mutations/substitions, accumulated 
variance, ... The only constraint is that it must be numeric.

> As a corollary, there doesn’t
> appear to be any built-in function to enforce the constraint that those edges
> that are valid at any one genomic location form a marginal (or “local”) tree.

That's the logic of the S3 class system: it's up to you to add elements, 
functions, checks, etc, that are relevant to your project/questions. See for 
instance, the tree output by ape::chronos which has the class c("chronos", 
"phylo"). You can find another strategy in phangorn which can fit partioned 
models of molecular evolution: in that case each partition of the genome has 
its own "phylo" tree (the topology can differ among partitions).

> Is my reading of this correct, and are there any R packages which go any way 
> to
> representing marginal trees along a genome in this way?

Not that I aware of.

> I ask partially because we are starting to document simple details of how to 
> run
> our tree-sequence software in R, via the reticulate package
> (https://tskit.dev/tutorials/tskitr.html), and it would be useful to know if 
> we
> could convert our tree sequences (a format which can be thought of as an ARG)
> into a format with a suitable representation in R. We might then be able to
> take advantage of any nice R-based visualisation or analysis packages.

phangorn has other tools to represent/plot networks with its class "networx".

Best,

Emmanuel

> Cheers
> 
> Yan
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Error in Fitting OU Model in R

2021-05-18 Thread Emmanuel Paradis

Hi Russell , 

It can happen that PGLS models have difficulties to fit to the data when there 
is a free parameter in the correlation structure. This is the case of 
corMartins and corPagel. A way around is to set 'fixed = TRUE' when creating 
these correlation structures which has the effect of taking the value given to 
the argument of the same name as fixed. You may then try different values and 
avoid the error you observed. 

HTH 
Best, 

Emmanuel 

- Le 14 Mai 21, à 2:56, Russell Engelman  a écrit 
: 

> Dear R-Sig-Phylo
> I am trying to re-submit a paper for publication and one of the things the
> reviewers asked of me was to run the data under an OU model in addition to a
> Brownian one. The problem is when I try to run an OU model I get the following
> error:

> "Error in corFactor.corStruct(object) : NA/NaN/Inf in foreign function call 
> (arg
> 1)"

> I looked up the error message but I couldn't really find what could be driving
> it. My data doesn't have any missing values or tips without data, and I 
> checked
> the branch lengths and none of the branches have a length of zero. Even when
> running a force.ultrametric() parameter on the phylogeny the function still
> fails to run.

> The strange thing is the same code fits a Brownian model just fine but throws
> back an error message when I try fitting an OU model. Attached is some code
> that shows what I mean. I was wondering if anyone might know what was going on
> with this?

> Sincerely,
> Russell

> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] How to sort trait data according to tree

2021-05-17 Thread Emmanuel Paradis

- Le 17 Mai 21, à 13:41, Marguerite Butler  a écrit : 

> Thank you very much for the reply Emmanuel!

> OK, yes I just tried and Iʻm surprised that this works:

>> dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles", 
>> "Galago"), X
> > = X, Y = Y)
> > cbm <- corBrownian(1, tree, form = ~Species)
> > gls(Y ~ X, dat, correlation = cbm)

> How does corBrownian know about Species, when it is only defined as an element
> of dat, and not known in the global environment?

corBrownian() doesn't know where all these variables are located. Indeed, if 
you print 'cbm': 

R> cbm 
Uninitialized correlation structure of class corBrownian 

It's gls() that looks for them. When you think about it, using "data = ..." is 
really a good way to be sure that all variables are ordered correctly whether 
they are involved as predictors (in 'model = '), in the correlation structure, 
or in the variance function ('weights = '). 

Best, 

Emmanuem 

> Anyway, thank you!
> Marguerite

> On Sun, May 16, 2021 at 7:03 PM Emmanuel Paradis < [
> mailto:emmanuel.para...@ird.fr | emmanuel.para...@ird.fr ] > wrote:

>> Hi Marguerite,

>> The issue is about nlme::gls. The help page ?gls says:

>> data: an optional data frame containing the variables named in
>> ‘model’, ‘correlation’, ‘weights’, and ‘subset’. By default
>> the variables are taken from the environment from which ‘gls’
>> is called.

>> So you make it work by removing "dat$" in the call to corBrownian():

>> cbm <- corBrownian(1, tree, form = ~Species)

>> Best,

>> Emmanuel

>> - Le 17 Mai 21, à 6:22, Marguerite Butler [ mailto:mbutler...@gmail.com |
>> mbutler...@gmail.com ] a écrit :

>> > Aloha all,

>> > Does anyone know the answer to this question - how does ape pass the
>> > species name information into gls? Seems to be a global environment issue
>> > when using apeʻs corBrownian and nlmeʻs gls? and a very specific form is
>> > required, but why?

>> > Modified from ape help page for CorClasses:

>> > ---

>> > library(nlme)
>> > library(ape)

>> > tree <- read.tree(text =
>> > "Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);")
>> > X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
>> > Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
>> > dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles",
>> > "Galago"), X = X, Y = Y)

>> > gls(Y ~ X, dat, correlation=corBrownian(1, tree.primates, form = ~Species))
>> > ---

>> > The above works just fine. However, if we try to calculate corBrownian
>> > separately, gls cannot understand the corBrownian object:
>> > ---
>> >> cbm <- corBrownian(1, tree, form = ~dat$Species) # calculate correlation
>> >> separately
>> >> gls(Y ~ X, dat, correlation = cbm) ## does not work, invalid type
>> >> (list) for dat
>> > Error in model.frame.default(formula = ~dat + Species + Y + X, data = 
>> > list( :
>> > invalid type (list) for variable 'dat'
>> > ---

>> > If we put the species information into a vector in the global
>> > environment it works. Why canʻt we use dat$Species?
>> > ---
>> > spp <- dat$Species # save species as vector in global environment
>> > cbm <- corBrownian(1, tree, form = ~spp) # calculate correlation separately
>> > gls(Y ~ X, dat, correlation = cbm) ## works!

>> > Thanks,

>> > Marguerite



>>> On Fri, May 14, 2021 at 10:21 PM Marguerite Butler < [
>> > mailto:mbutler...@gmail.com | mbutler...@gmail.com ] >
>> > wrote:

>> >> Aloha Oliver,

>> >> From the cor.Brownian help page, the explanation for the form argument is
>> >> this:

>> >> a one sided formula of the form ~ t, or ~ t | g, specifying the taxa
>> >> covariate t and, optionally, a grouping factor g. A covariate for this
>> >> correlation structure must be character valued, with entries matching the
>> >> tip labels in the phylogenetic tree. When a grouping factor is present in
>> >> form, the correlation structure is assumed to apply only to observations
>> >> within the same grouping level; observations with different grouping 
>> >> levels
>> >> are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to 
>> >> using
>> >> the order of the observations in the d

Re: [R-sig-phylo] How to sort trait data according to tree

2021-05-16 Thread Emmanuel Paradis

Hi Marguerite,

The issue is about nlme::gls. The help page ?gls says:

data: an optional data frame containing the variables named in
  ‘model’, ‘correlation’, ‘weights’, and ‘subset’. By default
  the variables are taken from the environment from which ‘gls’
  is called.

So you make it work by removing "dat$" in the call to corBrownian():

cbm <- corBrownian(1, tree, form = ~Species)

Best,

Emmanuel

- Le 17 Mai 21, à 6:22, Marguerite Butler mbutler...@gmail.com a écrit :

> Aloha all,
> 
> Does anyone know the answer to this question -  how does ape pass the
> species name information into gls? Seems to be a global environment issue
> when using apeʻs corBrownian and  nlmeʻs gls? and a very specific form is
> required, but why?
> 
> Modified from ape help page for CorClasses:
> 
> ---
> 
> library(nlme)
> library(ape)
> 
> tree <- read.tree(text =
> "Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);")
> X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)
> Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)
> dat <- data.frame(Species = c("Homo", "Pongo", "Macaca", "Ateles",
> "Galago"), X = X, Y = Y)
> 
> gls(Y ~ X, dat, correlation=corBrownian(1, tree.primates, form = ~Species))
> ---
> 
> The above works just fine. However, if we try to calculate corBrownian
> separately, gls cannot understand the corBrownian object:
> ---
>> cbm <- corBrownian(1, tree, form = ~dat$Species)# calculate correlation
>> separately
>> gls(Y ~ X, dat, correlation = cbm)   ## does not work, invalid type
>> (list) for dat
> Error in model.frame.default(formula = ~dat + Species + Y + X, data = list( :
>  invalid type (list) for variable 'dat'
> ---
> 
> If we put the species information into a vector in the global
> environment it works. Why canʻt we use dat$Species?
> ---
> spp <- dat$Species   # save species as vector in global environment
> cbm <- corBrownian(1, tree, form = ~spp)# calculate correlation separately
> gls(Y ~ X, dat, correlation = cbm)   ## works!
> 
> Thanks,
> 
> Marguerite
> 
> 
> 
> On Fri, May 14, 2021 at 10:21 PM Marguerite Butler 
> wrote:
> 
>> Aloha Oliver,
>>
>> From the cor.Brownian help page, the explanation for the form argument is
>> this:
>>
>> a one sided formula of the form ~ t, or ~ t | g, specifying the taxa
>> covariate t and, optionally, a grouping factor g. A covariate for this
>> correlation structure must be character valued, with entries matching the
>> tip labels in the phylogenetic tree. When a grouping factor is present in
>> form, the correlation structure is assumed to apply only to observations
>> within the same grouping level; observations with different grouping levels
>> are assumed to be uncorrelated. Defaults to ~ 1, which corresponds to using
>> the order of the observations in the data as a covariate, and no groups.
>>
>> This means that "t" should contain the species names or tip labels,
>> spelled exactly as they are in your phylogeny.  So for example if your data
>> SteninaeData
>> has a column for species names called "species" then you would specify:
>>
>> cor.BM <- corBrownian(phy=SteninaeTree, form = ~dat$species)
>>
>> and then proceed with your regression model as above.
>>
>>
>> You can also follow the example on the corClasses page:
>> http://127.0.0.1:27386/library/ape/html/corClasses.html
>>
>> Slightly modified here:
>>
>> ---
>>
>> library(nlme)
>>
>> library(ape)
>>
>> tree <-
>> read.tree(text="Homo:0.21,Pongo:0.21):0.28,Macaca:0.49):0.13,Ateles:0.62):0.38,Galago:1.00);"
>>
>> X <- c(4.09434, 3.61092, 2.37024, 2.02815, -1.46968)  # a vector
>>
>> Y <- c(4.74493, 3.33220, 3.36730, 2.89037, 2.30259)   # a vector
>>
>> Species <- c("Homo", "Pongo", "Macaca", "Ateles", "Galago")  # a vector
>>
>> dat <- data.frame(Species = Species, X = X, Y = Y)# make a dataframe
>> with Species, X, Y, can also read in .csv
>> cbm <- corBrownian(1, tree, form = ~dat$Species)
>>
>> m1 <- gls(Y ~ X, dat, correlation=cbm)
>>
>> m1
>>
>> ---
>>
>> hth,
>>
>> Marguerite
>>
>> On Fri, May 14, 2021 at 9:46 PM Oliver Betz 
>> wrote:
>>
>>> Dear all:
>>>
>>> I am currently facing the following problem:
>>>
>>> Unsing ape, nlme por phylolm for PGLS I was wondering how to sort my
>>> trait data (the ones in my csv file) according to the tree, so that
>>> both are in the same order. I learned how to check whether the data &
>>> the tree coincide in both number of species and names, but I do not
>>> nderstand how to perform the mentioned sorting task.
>>>
>>> Once I am running the PGLS commands such as
>>> cor.BM <- corBrownian(phy=SteninaeTree)
>>> pgls <- gls(LOG_rel_Attachment_force_smooth ~
>>> LOG_rel_Anzahl_Hafthaare_4.Vordertarsus, correlation = cor.BM, data =
>>> SteninaeData, method = "ML")"
>>>
>>> I regularly receive the follwing warning note:
>>> "In Initialize.corPhyl(X[[i]], ...) :
>>> No covariate specified, species will be taken as ordered in the data
>>> frame. To

Re: [R-sig-phylo] Extract singleton haplotypes with pegas::haplotype()

2021-04-26 Thread Emmanuel Paradis

Hi Jarrett,

- Le 27 Avr 21, à 0:09, Jarrett Phillips phillipsjarre...@gmail.com a écrit 
:
> Hi All,
> 
> I'm looking for a way to extract singleton haplotypes using the haplotype
> function in Pegas.
> 
> I use the below function to accomplish this:
> 
>extract.singleton.haps <- function(freq) {
>  ind <- which(freq == 1) # index of singleton haplotypes
>  single <- seqs[ind, ] # extracted haplotypes (only singletons)
> }

The object 'seqs' is neither created inside the function nor passed as 
argument, so R will look for it using the infamous lexical scoping rule, 
meaning that you're not sure it'll do what you want it to do.

> I have tried the following:
> 
>library(pegas)
>seqs <- read.dna(file = file.choose(), format = "fasta") # select FASTA
> file
>freq <- haplotype(seqs) # haplotype distribution
>freq <- lengths(attr(freq, "index") # extract haplotypes

In the line above, you overwrite 'freq' created in the previous line, so maybe 
you can change these two for:

haps <- haplotype(seqs) # haplotype sequences and distribution
freq <- lengths(attr(haps, "index") # extract frequencies

>singleton.seqs <- extract.singleton.haps(freq)

See my comment above: 'seqs' exists in your global environment so the function 
singleton.seqs will use the original sequences (because of lexical scoping), 
not the haplotype ones. You can do:

singleton.seqs <- haps[freq == 1, ]

or:

singleton.seqs <- seqs[attr(haps, "index")[freq == 1, ], ]

I'm not sure that writing a function is really needed here: probably you want 
to examine all values in 'freq'.

HTH

Emmanuel

>write.dna(singleton.seqs, file = paste0(taxon, "_singletons.fas"),
> format = "fasta")
> 
> 
> However, upon viewing the sequences in MEGA, they appear to all be
> identical, instead of unique.
> 
> Any ideas on what is happening here?
> 
> Thanks.
> 
> Cheers,
> 
> Jarrett
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] ape: error in pcoa

2021-04-24 Thread Emmanuel Paradis

Hi Juan,

There is a bug in pcoa() when only one axis is selected by the function. You 
can fix it by editing the function with fix(pcoa) and modify line 38 (or close 
depending on how the editor arranges the lines of code):

vectors <- sweep(D.eig$vectors[, 1:k], 2, sqrt(eig[1:k]), FUN = "*")

insert ', drop = FALSE' where D.eig$vectoes is indexed so this line becomes:

vectors <- sweep(D.eig$vectors[, 1:k, drop = FALSE], 2, sqrt(eig[1:k]), FUN = 
"*")

save and close. There are two other similar lines but they are not used in your 
example. I've fixed that in ape 5.5 (which should be on CRAN soon).

As a side note, it took me some time to figure out the shape of the data you 
posted. Mail applications sometimes format lines of text in different ways, so 
it's better to run a command like this:

cat(deparse(DH), sep = "\n")

and paste the output in your message. This can then be input with: DH <- [[the 
output from above]] with the attributes correctly set.

Best,

Emmanuel

- Le 23 Avr 21, à 21:24, Juan Antonio Balbuena j.a.balbu...@uv.es a écrit :

> Hellow,
> 
> I am running a set of simulations with distance matrices used as input
> to pcoa (package ape).
> 
> In one instance the matrix (DH) is
> 
> H18 H20 H21 H18 0.000 0.3127452190 0.3127452190 H20 0.3127452
> 0.00 0.0001625185 H21 0.3127452 0.0001625185 0.00
> 
> when I run pcoa I get this
> 
> H.PCo <- pcoa(DH, correction="none")$vectors Error in array(STATS,
> dims[perm]) : 'dims' cannot be of length 0
> 
> However, if I multiply DH*10 pcoa runs just fine:
> 
> H.PCo <- pcoa(DH*10, correction="none")$vectors
> 
> A word of explanation will be very much appreciated.
> 
> Thank you for your attention
>  
> Juan A. Balbuena
> 
> --
> 
> Dr. Juan A. Balbuena
> Cavanilles Institute of Biodiversity and Evolutionary Biology
> Symbiont Ecology and Evolution Lab
> University of Valencia http://www.uv.es/~balbuena
> 
> P.O. Box 22085 http://www.uv.es/cophylpaco 
> 46071 Valencia, Spain
> e-mail: j.a.balbu...@uv.es tel. +34 963 543
> 658    fax +34 963 543 733
> 
> *NOTE!*For shipments by EXPRESS COURIER use the following street address:
> C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.
> 
> 
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] identifying phylogenetically equivalent nodes

2021-02-18 Thread Emmanuel Paradis

Hi Jacob,

ape::makeNodeLabel(phy, method = "md5sum") returns 'phy' with node labels that 
depend on the tips descendant from each node. For instance:

tr3 <- makeNodeLabel(rtree(3), m = "m")
tr4 <- makeNodeLabel(rtree(4), m = "m")
any(tr3$node.label %in% tr4$node.label)

If you repeat these 3 commands several times, you should have ~20% of TRUE. In 
your case, match() should make more sense.

Also, I suppose your trees are rooted. If they are unrooted, you should 
consider using splits (or root them).

Best,

Emmanuel

- Le 18 Fév 21, à 0:59, Jacob Berv jakeberv.r.sig.ph...@gmail.com a écrit :
> Dear R-sig-phylo,
> 
> Over the weekend, I asked Liam Revell if he had a solution to use matchNodes 
> for
> a particular problem I’m trying to solve—finding all phylogenetically
> equivalent nodes when comparing trees that have uneven taxon samples and
> different topologies. Liam was kind enough to take some time to write a blog
> post about this, and got me started with some code
> 
> http://blog.phytools.org/2021/02/on-matching-nodes-between-trees-using.html
> 
> On it’s face this seems like a simple problem, but I’m running into some 
> issues
> and thought I would reach out to the broader group. The code linked above 
> seems
> to work, but only for comparing trees that start out as topologically
> identical. For my purposes, I’m trying to match nodes from a given a 
> reference,
> to nodes in and across several hundred gene trees that differ in topology and
> taxon sample relative to the reference.
> 
> Here is a function definition based on Liam’s example
> 
> #function to match nodes from consensus
> #to individual gene trees with uneven sampling
> #derived from Liam Revell's example-- need to
> testmatch_phylo_nodes<-function(t1, t2){
>  ## step one drop tips
>  t1p<-drop.tip(t1,setdiff(t1$tip.label, t2$tip.label))
>  t2p<-drop.tip(t2,setdiff(t2 $tip.label, t1$tip.label))
>  
>  ## step two match nodes "descendants"
>  M<-matchNodes(t1p,t2p)
>  
>  ## step two match nodes "distances"
>  M1<-matchNodes(t1,t1p,"distances")
>  M2<-matchNodes(t2,t2p,"distances")
>  
>  ## final step, reconcile
>  MM<-matrix(NA,t1$Nnode,2,dimnames=list(NULL,c("left","right")))
>  
>  for(i in 1:nrow(MM)){
>MM[i,1]<-M1[i,1]
>nn<-M[which(M[,1]==M1[i,2]),2]
>if(length(nn)>0){
>MM[i,2]<-M2[which(M2[,2]==nn),1]
>}
>  }
>  return(MM)
> }
> 
> 
> When t1 and t2 are trees that have topological conflicts, this function 
> returns
> an error:
> 
> Error in MM[i, 2] <- M2[which(M2[, 2] == nn), 1] :
>  replacement has length zero
> 
> I think(?) this happens because a particular node doesn’t exist in one or the
> other trees, and it returns integer(0) at that line — but I’m not sure I 
> really
> understand what is going on here.
> 
> 
> I modified Liam’s code slightly to get it to run without error in the 
> described
> case, by making it conditional on that particular line:
> 
> 
> Modified version
> 
> #function to match nodes from consensus
> #to individual gene trees with uneven sampling
> #derived from Liam Revell's example-- need to test
> match_phylo_nodes<-function(t1, t2){
>   ## step one drop tips
>   t1p<-drop.tip(t1,setdiff(t1$tip.label, t2$tip.label))
>   t2p<-drop.tip(t2,setdiff(t2 $tip.label, t1$tip.label))
> 
>   ## step two match nodes "descendants"
>   M<-matchNodes(t1p,t2p)
> 
>   ## step two match nodes "distances"
>   M1<-matchNodes(t1,t1p,"distances")
>   M2<-matchNodes(t2,t2p,"distances")
> 
>   ## final step, reconcile
>   MM<-matrix(NA,t1$Nnode,2,dimnames=list(NULL,c("left","right")))
> 
>   for(i in 1:nrow(MM)){
>   MM[i,1]<-M1[i,1]
>   nn<-M[which(M[,1]==M1[i,2]),2]
>if(length(nn)>0){
>   if(length(which(M2[,2]==nn))>0){
>   MM[i,2]<-M2[which(M2[,2]==nn),1]
>   }
>} else {
>}
> }
> return(MM)
> }
> 
> 
> I’ve been experimenting with this and some downstream code for the last few
> days, but I’ve run into some weird inconsistent results (not easily 
> summarized)
> that make me think that this function is not working as intended.
> 
> I was wondering — have any of you dealt with a similar problem? In principle
> this seems like it should be similar to concordance analysis, but I care less
> about identifying the proportion of nodes that exist in gene trees given a
> reference, and instead I need the actual node numbers in a given gene tree 
> that
> are phylogenetically equivalent to particular nodes in a reference. Happy to
> try to hack away at something…
> 
> 
> Best,
> Jake Berv
> 
> 
> 
> 
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org

Re: [R-sig-phylo] Nonsensical likelihoods returned when using the 'ace' function and the 'all rates different' model in the R package: 'Ape'

2021-01-29 Thread Emmanuel Paradis

Hi Joe,

I'd say this issue is related to these particular data: there are four states 
so the ARD model has 12 parameters which is quite a lot considering that one 
state ('3') is overwhelming represented in the data. If you simulate data with 
two states, then the ARD model has only 2 parameters and the fit converges most 
of the times.

Best,

Emmanuel

- Le 29 Jan 21, à 5:49, Joseph Keating joe.keat...@bristol.ac.uk a écrit :

> Dear R-sig-phylo mailing group
> 
> I'm having some problems using the 'all rates different' (ARD) model of the
> 'ace' function. The function seems to return different errors / warnings
> depending on the specified method for computing the exponential of the rate
> matrix. Most notably, its returning likelihoods for states that are >1 or <0. 
> I
> don't get the same errors if using the 'SYM' or "ER" models.
> 
> Please see my post on stackoverflow for an example. Link below:
> https://stackoverflow.com/questions/65944177/nonsensical-likelihoods-returned-when-using-the-ace-function-and-the-all-rate
> 
> If anyone could provide any help or advice, it would be hugely appreciated. 
> And
> thanks to Ben Bolker for pointing me to this mailing group.
> 
> Kind Regards
> 
> Joe Keating
> [https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-i...@2.png?v=73d79a89bded]
> phylogeny - Nonsensical likelihoods returned when using the 'ace' function and
> the 'all rates different' model in the R package: 'Ape' - Stack
> Overflow
> I'm having some problems using the 'all rates different' (ARD) model of the
> 'ace' function. The function seems to return different errors / warnings
> depending on the specified method for computing the exponential of the rate
> matrix.
> stackoverflow.com
> 
> 
> 
> Dr Joseph Keating
> Palaeobiology Research Group
> University of Bristol
> Bristol Life Sciences Building
> 24 Tyndall Avenue
> Bristol
> BS8 1TQ
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Phylogenetic Signal in Internal Node Data

2021-01-26 Thread Emmanuel Paradis

Hi Matt,

You can think about a randomization procedure: rTraitDisc(), rTraitCont(), and 
rTraitMult() all have the option 'ancestor' to output the values of a trait 
simulated along a phylogeny for all terminal and internal nodes.

HTH

Best,

Emmanuel

- Le 27 Jan 21, à 0:42, Matthew Helmus mrhel...@gmail.com a écrit :

> Hello,
> 
> Is there a method that can test for phylogenetic signal in data associated
> with the internal nodes of a tree rather than data associated with tree
> tips?
> 
> Thank you!
> Matt
> 
> --
> Matthew R. Helmus, Ph.D.
> Integrative Ecology Lab 
> Center for Biodiversity 
> Department of Biology
> Temple University
> Philadelphia, PA 19122
> 
> facebook  twitter
>  instagram
> 
> Phone: 215 204-5989 <2152045989>
> Email: mrhel...@temple.edu
> Office: 538 SERC, Main Campus
> 
> 
> --
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Plotting phylogenetic tree

2021-01-19 Thread Emmanuel Paradis

Hi Saleh,

Two functions in ape can help you:

?zoom
?trex

There may be others in other packages.

Best,

Emmanuel

- Le 19 Jan 21, à 19:42, Saleh Rahimlou saleh_rahim...@hotmail.com a écrit :

> Hello,
> 
> How can I collapse some clades plotting a big phylogenetic tree to magnify my
> favorite taxa?
> 
> Best,
> Saleh Rahimlou
> Ph.D. candidate
> Department of Botany and Ecology
> University of Tartu
> 14A Ravila, 50411 Tartu
> Estonia
> saleh.rahim...@ut.ee
> saleh.rahim...@lrsv.ups-tlse.fr
> 
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Error in corFactor when running gls to obtain lambda

2021-01-03 Thread Emmanuel Paradis

Hi Chris,

It is not clear from your message where the error occurred. Can you please send 
a reproducible example? I don't think the last model fit (output named 'OLS') 
gives an error since gls() with no correlation structure is equivalent to using 
lm().

Also if you use fixed=1 this is the same thing than fixed=TRUE, not value=1 
(supposing this is what you want to do) altough this is likely to not make a 
difference here.

Best,

Emmanuel

- Le 16 Déc 20, à 0:42, chris pestana pestanach...@gmail.com a écrit :

> Hi when I execute the code below I get the following message:
> 
> Error in corFactor.corStruct(object) :
>  NA/NaN/Inf in foreign function call (arg 1)In addition: Warning
> message:In Initialize.corPhyl(X[[i]], ...) :  No covariate specified,
> species will be taken as ordered in the data frame. To avoid this
> message, specify a covariate containing the species names with the
> 'form' argument.
> 
> 
> If I used fixed=1, it resolves the problem, but wouldn't this impact my
> model selection structure as lambda is not permitted to vary?
> 
> 
> hominin_char<-read.csv("dropped-tips.csv", header=T, row.names=1)
> hominin_char
> tree1<-drop.tip(tree1, tip = c("African_H_erectus_DAN5_P1", "K_platyops"))
> #
> 
> 
> 
> plot(tree1)
> ### This is a method for selecting a scalable model for
> encephalization
> BM<-gls(log(Brain_mass)~log(Body_mass),
> data=hominin_char,correlation=corBrownian(1,phy=tree1),
> method="ML")
> 
> OU<-gls(log(Brain_mass)~log(Body_mass), data=hominin_char,
>correlation=corMartins(1, phy=tree1), method="ML")
> 
> Lambda<-gls(log(Brain_mass)~log(Body_mass),
> data=hominin_char, correlation=corPagel(1, phy=tree1),
> method="ML")
> 
> EB<-gls(log(Brain_mass)~log(Body_mass), data=hominin_char,
> correlation=corBlomberg(1, phy=tree1, fixed=TRUE),
>method="ML")
> 
> OLS<-gls(log(Brain_mass)~log(Body_mass), data=hominin_char, method="ML")
> 
> summary(Lambda)
> Cand.models = list()
> Cand.models[[1]] = BM
> Cand.models[[2]] = OU
> Cand.models[[3]] = Lambda
> Cand.models[[4]] = EB
> Cand.models[[5]] = OLS
> 
> Modnames = paste(c("BM","OU", "Lambda", "EB", "OLS"), sep = " ")
> aictab(cand.set = Cand.models, modnames = Modnames, sort = T)
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] error with bind.tree + ladderize

2020-10-25 Thread Emmanuel Paradis

Hi Joseph,

Thanks for reporting this. It seems indeed that reordering the output of 
bind.tree() fixes it. A way to do it is:

attr(t2, "order") <- NULL
t2 <- reorder(t2)

It seems to solve the different problems (balance(t2) doesn't work correctly 
too). I'll add these 2 lines as the end bind.tree().

Best,

Emmanuel

- Le 22 Oct 20, à 22:42, Joseph Brown josep...@umich.edu a écrit :

> I came across the following error: when trying to ladderize a tree that had
> gone through bind.tree, the tree becomes corrupted. Specifically, I added a
> single tip to an internal node (code to replicate below).
> 
> Note:
> 1) this error did not seem to occur when attaching the tip sister to an
> original tip (i.e., an external edge).
> 2) when the binded tree involved multiple tips attached to the same
> internal position, the same edge/edge.length issues arise (see below), but
> R crashes completely when trying to plot
> 3) if the tree is written out and read back in prior to ladderizing, no
> problems occur
> 
> tl;dr: when ladderizing the binded tree, the edge matrix and edge.length
> vectors are missing rows/elements corresponding to edges of tree "y" (the
> second tree) in bind.tree. forcing a reordering seems to get around things,
> but cladewise ordering alone fails.
> 
> Perhaps bind.tree needs to incorporate a reordering step?
> 
> HTH.
> Joseph.
> 
> #
> library(ape);
> set.seed(31459);
> t1 <- rtree(10);
> 
> # let's add a terminal with bind.tree
> term <- rtree(1, tip.label="gnu_tip");
> t2 <- bind.tree(x=t1, y=term, where=12, position=0.5);
> 
> # plot
> plot(t2); axisPhylo();
> 
> # looks correct. but let's ladderize to make things pretty
> plot(ladderize(t2)); axisPhylo();
> # Warning message:
> # In yy[TIPS] <- 1:Ntip :
> # number of items to replace is not a multiple of replacement length
> 
> # huh? looks like the edge matrix is incorrect
> dim(t2$edge);
> # [1] 20  2
> dim(ladderize(t2)$edge);
> # [1] 19  2
> # the new terminal is missing from the edge matrix
> # same with the edge.length vector:
> length(t2$edge.length)
> # [1] 20
> length(ladderize(t2)$edge.length)
> # [1] 19
> # but tip.label vector is ok:
> length(t2$tip.label) == length(ladderize(t2)$tip.label)
> # [1] TRUE
> 
> # so: we do not have enough edges for the number of terminals
> 
> # can we fix it by reordering?
> plot(ladderize(reorder(t2, "cladewise")));
> # nope: same error. note that attributes(t2)$order was already "cladewise"
> # i'm guessing the reorder command is ignored because of its current status?
> 
> # try a different ordering to force processing
> plot(ladderize(reorder(t2, "pruningwise")));
> # success
> 
> # what if we want "cladewise" ordering for downstream analyses?
> # maybe we can pass it through reorder.phylo twice?
> plot(ladderize(reorder(reorder(t2, "pruningwise"), "cladewise")));
> # yup!
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Multi2di still not generating equiprobable trees

2020-10-20 Thread Emmanuel Paradis

Hi Yan,

You're correct: multi2di() calls rtree(, equiprob = TRUE) which generates the 
random topology. Actually, the unlabelled topologies are generated with equal 
probabilities. For the labelled topologies, this is a bit more complicated. In 
the case n = 4 tips, there are two unlabelled topologies:

(((,),),);
((,),(,));

with 12 possible arrangements of the tip labels for the first topology and 3 
for the second one. A solution is to use different probabilities when splitting 
the set of tips. Let's say the n tips are split into 2 subsets of sizes a and b 
(with n > 2, a + b = n, and both a and b > 0). Then the weight of the pair 
(a,b) is calculated with:

howmanytrees(a) * howmanytrees(b) * choose(n, a)

This seems to give correct frequencies for n = 4. It's more complicated to 
check with larger values of n because of the large numbers of topologies, but 
this looks OK.

Cheers,

Emmanuel

PS: thank you for spelling mistake on ape's homepage: this must have been there 
since 2012.

- Le 20 Oct 20, à 4:23, Yan Wong y...@pixie.org.uk a écrit :

> Thanks to Emmanuel for adding the equiprob=TRUE option to multi2di. However, 
> I’m
> not convinced that it’s working correctly - I still get more balanced trees
> than unbalanced trees in ape 5.4.1
> 
>> library(phytools)
>> library(phangorn)
>> library(ape)
>> packageVersion("ape")
> [1] '5.4.1'
> 
>> reps <- do.call(c, lapply(1:10, function(x)
> +  multi2di(starTree(c('a','b','c','d')), equiprob=TRUE)))
>> balanced = read.tree(text="((a,c),(b,d));”) # test one of 15 4-tomy 
>> resolutions
>> test <- sapply(reps, function(x) RF.dist(x, balanced, rooted=TRUE) == 0)
>> mean(test) # should be ~ 1/15 ~ 0.0667
> [1] 0.16625
> 
> By the way, there is a spelling mistake on the ape homepage too ("ape souce
> code” - is missing an “r” in “source”).
> 
> Cheers
> 
> Yan
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] ape package in R

2020-09-19 Thread Emmanuel Paradis

Hi,

Actually, Ranjita can use the number of gene copies if it is reasonable to 
assume that these numbers evolve among samples, maybe a stepwise model with 
transitions: 

n -> n + 1
n -> n - 1

In that case, a simple approach is to calculate the Euclidean distances among 
samples and do an NJ or ME tree. Another approach is to use phangorn and a 
parsimony or maximum likelihood (ML) method depending on the model you want to 
assume.

The functions needed are in different packages (as usual you are advised to 
check the help pages and the options):

utils::read.csv
stats::dist
ape::nj
ape::fastme.bal
phangorn::phyDat
phangorn::parsimony

And as suggested by Salvador, if you have the sequences of the gene copies for 
each sample (which would be a very nice study), you can get the evolutionary 
distances with:

ape::dist.dna

or fit an evolutionary model by ML with phangorn::pml.

HTH.

Best,

Emmanuel

- Le 20 Sep 20, à 3:14, Salvador Espada Hinojosa salvador.esp...@gmail.com 
a écrit :

> If you want to use ape for making a phylogenetic tree, I think you need the
> actual sequences of the genes, and not only the copy numbers
> 
> El sáb., 19 sept. 2020 a las 21:40, Ranjita Thapa ()
> escribió:
> 
>> Hi,
>>
>> I want to construct a phylogenetic tree with ape package. I have a csv
>> file with gene ids and copy number for each gene across different samples.
>> I want to construct phylogenetic tree. I am wondering how could I convert
>> the csv file to input file for ape. Could you please suggest me how can I
>> convert csv file to input file that is acceptable by ape?
>>
>> Thanks
>> Ranjita___
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at
>> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>>
> 
> 
> --
> Salvador Espada Hinojosa
> Monika Bright's Lab
> Department of Functional and Evolutionary Ecology
> University of Vienna
> Althanstraße, 14
> 1090 Vienna (Austria)
> Mobile: (+43) 660 5952724
> skype: salva_e
> http://salvae.net 
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Problem - removing the extinct taxa from trees

2020-08-26 Thread Emmanuel Paradis

Hi Elvira,

If you are interested in keeping the "surviving species" from phylogenies, it 
may be better to use a birth-death model to simulate the trees. There are 3 
functions in ape to do this: they are documented in the same help page than 
drop.fossil() and they have options that make them quite flexible. If you 
really want to use rtree(), then, as rightly pointed out by Liam, the (default) 
uniform distribution may not be the right choice to generate branch lengths.

Best,

Emmanuel

- Le 26 Aoû 20, à 8:33, Elvira D`bastiani elviradbasti...@gmail.com a écrit 
:

> Hi Liam,
> 
> That, are branch lengths. I know that these NaNs correspond to the species
> that have become extinct.
> When I talk about 'delete' the NaNs, I want my newick not to have those
> branches with NaNs.
> 
> You helped me!
> Thank you for talking about it with me!
> 
> All the best,
> 
> Em ter., 25 de ago. de 2020 às 22:14, Liam J. Revell 
> escreveu:
> 
>> Hi Elvira.
>>
>> In your Newick string those NaN are branch lengths, not taxon labels.
>>
>> What do you mean by 'delete' the NaNs? You can set them to zero, as
>> follows:
>>
>> tree$edge.length[is.na(tree$edge.length)]<-0
>>
>> but then you get a tree that looks like this:
>>
>> http://www.phytools.org/blog/Elvira-tree.png
>>
>> so it's kind of hard to tell which taxa are extinct
>>
>> Sorry I can't be of more help! All the best, Liam
>>
>> Liam J. Revell
>> University of Massachusetts Boston
>> Universidad Católica de la Ssma Concepción
>> web: http://faculty.umb.edu/liam.revell/, http://www.phytools.org
>>
>> Academic Director UMass Boston Chile Abroad:
>> https://www.umb.edu/academics/caps/international/biology_chile
>>
>> On 8/25/2020 9:00 PM, Elvira D`bastiani wrote:
>> > **[EXTERNAL SENDER]
>> >
>> > Hi Liam
>> >
>> > Also thank you for your informative answer. However, I don't want to
>> > simulate trees. For example one of my trees looks like this:
>> >
>> > (2:17, (3: 258, (4: 1, (5: 1, ((7:47, (8:13, ((11: 1, (13: 46,9: 60):
>> > 35): 35): 26, (10: 3, (12: 7, ((15: 4,14: 17): NaN, 6: NaN): - 904):
>> > 53): 0): 34): 31): NaN, 1: NaN): - 621): 1): 138): 53);
>> >
>> > Here, I need to delete these NaN from the phylo file.
>> >
>> > I believe that with these functions I am unable to delete these NaN or
>> > extract only the surviving species.
>> >
>> > Thank you for the help.
>> >
>> > Elvira.
>> >
>> > Em ter., 25 de ago. de 2020 às 17:33, Liam J. Revell
>> > mailto:liam.rev...@umb.edu>> escreveu:
>> >
>> > Dear Elvira.
>> >
>> > I'm not sure what's going on with the NaN values, but one issue that
>> I
>> > see with your example is that ape::drop.fossil identifies fossil
>> > lineages by finding those that end before the present. Since rtree
>> just
>> > splits the tree randomly & samples the edge lengths from a uniform
>> > distribution, this will leave you with only one lineage!
>> >
>> > To see how the function should work you could try something like:
>> >
>> > set.seed(99)
>> > tree<-pbtree(n=15,b=1,d=0.2)
>> > plotTree(tree)
>> > plotTree(pruned<-drop.fossil(tree))
>> >
>> > Hopefully this leaves things a bit clearer.
>> >
>> > For fun, also try:
>> >
>> > par(mfrow=c(1,1),mar=c(5.1,4.1,2.1,2.1),bty="n")
>> > ltt(tree,lty="dashed",log.lineages=FALSE,log="y")
>> > ltt(pruned,lty="solid",add=TRUE,log.lineages=FALSE,
>> >  log="y")
>> > legend("topleft",c("observed","reconstructed"),
>> >  lty=c("dashed","solid"),bty="n")
>> >
>> > All the best, Liam
>> >
>> > Liam J. Revell
>> > University of Massachusetts Boston
>> > Universidad Católica de la Ssma Concepción
>> > web: http://faculty.umb.edu/liam.revell/, http://www.phytools.org
>> > <
>> https://nam10.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.phytools.org%2F=02%7C01%7Cliam.revell%40umb.edu%7Ce8848bf509ed41e5115408d8495b7db0%7Cb97188711ee94425953c1ace1373eb38%7C0%7C0%7C637340004615417539=5%2F1PtYD6DMW7gK7ef3QiRvXSeI0bozADU5lgrW9qPq0%3D=0
>> >
>> >
>> > Academic Director UMass Boston Chile Abroad:
>> > https://www.umb.edu/academics/caps/international/biology_chile
>> >
>> > On 8/25/2020 4:21 PM, Elvira D`bastiani wrote:
>> >  > [EXTERNAL SENDER]
>> >  >
>> >  > Hi,
>> >  >
>> >  > I'm a beginner with phylogeny analysis. I'm having trouble
>> > extracting only
>> >  > the surviving species from the phylogenies of my model.
>> >  >
>> >  > I used the drop.fossil (fPar) functions from ape and the
>> drop.extinct
>> >  > (fPar) from geiger. But it is not deleting the way I need it.
>> >  >
>> >  > Running example correctly:
>> >  > my.tree = rtree (15, rooted = TRUE, tip.label = NULL, br = runif)
>> > #random
>> >  > tree in phylo format
>> >  > class (my.tree)
>> >  > my.tree $ edge.length #I need this information without NaN.
>> >  >
>> >  >> my.tree$edge.length [1] 0.53879430 0.71471795

[R-sig-phylo] bug in boot.phylo

2020-07-19 Thread Emmanuel Paradis


Hi all,

The current version of ape on CRAN (5.4) has a bug in its function 
boot.phylo: all returned values are 0. The bug has been fixed; however, 
because it was in ape's internal C code, this requires a complete 
reinstallation of the package. A new version of ape with the version 
number 5.4-1 is in preparation and will be submitted to CRAN in the next 
few days.


In the mean time, it is possible to get the correct bootstrap values 
using the bootstrap trees and the function prop.clades:


out <- boot.phylo(phy, .., trees = TRUE)
prop.clades(phy, out$trees)

where 'phy' is the estimated tree.

A testing version of ape with the fix is available on ape-package.ird.fr.

Best,

Emmanuel

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Warning in ape::mst() in R 4.0

2020-07-01 Thread Emmanuel Paradis

Hi Kelly,

Thanks! This is fixed.

Best,

Emmanuel

- Le 1 Juil 20, à 23:20, Kelly Street street.ke...@gmail.com a écrit :

> Hello,
> 
> Thank you for all your work developing and maintaining this great package!
> This is likely a known issue already (and not a critical one), but since
> the update to matrices in R 4.0 (they now have class c("matrix", "array")),
> I have been getting warnings from ape::mst(). Specifically this:
> 
> Warning message:
> In if (class(X) == "dist") X <- as.matrix(X) :
> the condition has length > 1 and only the first element will be used
> 
> Which I think could be resolved by using is(X, "dist").
> 
> Thanks!
> Kelly Street
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Breaking polytomies such that all topologies are equiprobable

2020-06-29 Thread Emmanuel Paradis

- Le 27 Juin 20, à 16:53, Yan Wong y...@pixie.org.uk a écrit :
> This is extremely helpful, thanks Emmanuel. I think it might be useful to note
> this in the documentation for multi2di (or give a pointer to the description),
> as it wasn’t obvious to me how to find out this information from within R. 
> Even
> better would be an option that allowed equiprobable topologies to be generated
> if need be, although that’s obviously more work.

I'll add an option to both rtree() and multi2di().

> In my use case I could easily have polytomies with thousands or tens of
> thousands of edges, so the algorithmic approach is much more suitable than the
> enumerative one, and thanks for the tips on making it work. But on the topic 
> of
> enumeration, an extremely capable student of ours has been working out ways of
> using integer partitions to enumerate topologies, even for very large trees,
> and sampling from a list of tree ranks, then unranking the result, would
> probably be a lot more scalable than actually creating the trees using 
> allTrees

allTrees() does not accept n > 10 tips to prevent you filling your computer's 
memory.

> (our treatment is in Python, however, and we haven’t published it yet: details
> in https://tskit.readthedocs.io/en/latest/combinatorics.html). I’d be
> interested in knowing what prior art there is in this area, and if what our
> student has done is of any use to APE? The concepts scale quite well to
> mid-sized trees, and there are also separate applications to trees with
> millions of tips (which we occasionally have to deal with).

ape has howmanytrees() which computes the number of possible topologies for 
most cases: (un)labelled, (un)rooted, binary or not.

Ranking topologies seems related to previous works on matchings by Diaconis & 
Holmes (see the reference cited in ?as.matching). A related work is the 
geodesic distance implemented in the package distory (which I maintain now). 
The latter is more general since it considers branch lengths.

Maybe there is also a connection between tree ranking and topological distances 
(see ?nni in phangorn).

Best,

Emmanuel

> Cheers
> 
> Yan
> 
>> On 27 Jun 2020, at 10:33, Emmanuel Paradis  wrote:
>> 
>> Hi Yan,
>> 
>> multi2di() calls rtree() if random = TRUE. As you rightly guessed, the 
>> algorithm
>> used by this latter function is (described in APER2, p. 313):
>> 
>> 1. Draw randomly an integer a on the interval [1, n − 1]. Set b = n − a.
>> 2. If a > 1, apply (recursively) step 1 after substituting n by a.
>> 3. Repeat step 2 with b in place of a.
>> 4. Assign randomly the n tip labels to the tips.
>> 
>> This is indeed results in unequal frequencies of the possible topologies. If
>> step 1 "n − 1" is replaced by "floor(n/2)", then all topologies are generated
>> with equal probability.
>> 
>> Practically, I see two possibilities for you. If pratical, you generate a 
>> list
>> with all possible topologies, for instance with phangorn's function allTrees:
>> 
>> R> library(phangorn)
>> R> TR <- allTrees(4, rooted = TRUE)
>> R> TR
>> 15 phylogenetic trees
>> 
>> Then you just draw randomly some integers between 1 and 15 and use them as
>> indices:
>> 
>> R> s <- sample.int(length(TR), size = 1e5, replace = TRUE)
>> R> TR[s]
>> 10 phylogenetic trees
>> 
>> If you want, you can play with the option 'prob' of sample.int().
>> 
>> The other solution is to make copies of rtree() and multi2di() in your
>> workspace: only rtree() needs to be modified and only where sample.int() is
>> called in the recursive function foo. You also need to copy multi2di() even
>> unchanged because of the package's namespace.
>> 
>> HTH
>> 
> > Best,

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Breaking polytomies such that all topologies are equiprobable

2020-06-27 Thread Emmanuel Paradis

Hi Yan,

multi2di() calls rtree() if random = TRUE. As you rightly guessed, the 
algorithm used by this latter function is (described in APER2, p. 313):

1. Draw randomly an integer a on the interval [1, n − 1]. Set b = n − a.
2. If a > 1, apply (recursively) step 1 after substituting n by a.
3. Repeat step 2 with b in place of a.
4. Assign randomly the n tip labels to the tips.

This is indeed results in unequal frequencies of the possible topologies. If 
step 1 "n − 1" is replaced by "floor(n/2)", then all topologies are generated 
with equal probability.

Practically, I see two possibilities for you. If pratical, you generate a list 
with all possible topologies, for instance with phangorn's function allTrees:

R> library(phangorn)
R> TR <- allTrees(4, rooted = TRUE)
R> TR
15 phylogenetic trees

Then you just draw randomly some integers between 1 and 15 and use them as 
indices:

R> s <- sample.int(length(TR), size = 1e5, replace = TRUE)
R> TR[s]
10 phylogenetic trees

If you want, you can play with the option 'prob' of sample.int().

The other solution is to make copies of rtree() and multi2di() in your 
workspace: only rtree() needs to be modified and only where sample.int() is 
called in the recursive function foo. You also need to copy multi2di() even 
unchanged because of the package's namespace.

HTH

Best,

Emmanuel

- Le 27 Juin 20, à 4:46, Yan Wong y...@pixie.org.uk a écrit :

> I’m trying to figure out how to randomly resolve polytomies such that there is
> an equal probability of any topology being generated. I thought that the ape
> function “multi2di” did this, but when I have tried it repeatedly with a
> 4-tomy, multi2di seems to produce the 3 balanced trees [((a,b),(c,d)) ;
> ((a,c),(b,d)); ((a,d),(b,c))] twice as often as the 12 possible unbalanced
> dichotomous 4-tip rooted topologies. The R code I’ve used to produce the 
> sample
> topologies is something like this:
> 
> do.call(c, lapply(1:10, function(x) 
> multi2di(starTree(c('a','b','c','d')
> 
> Firstly, is this expected, or am I doing something wrong (if expected, it 
> would
> be useful to note this in the docs)? Secondly, is there an function somewhere
> that *will* break polytomies to produce equiprobable topologies? If not,
> thirdly, is there an algorithm that will do this? I think the standard
> “repeatedly pick 2 random edges from the polytomy node and pair them off”
> results in the non-equiprobable distribution that I find using multi2di. I
> think I’ve found a similar problem with the Dendropy algorithm, which does
> claim to result in equiprobable topologies, and have posted to their mailing
> list in case I’m misunderstanding something.
> 
> Cheers
> 
> Yan Wong
> Big Data Institute, Oxford University
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] collapsing sets of nodes based on label values

2020-05-22 Thread Emmanuel Paradis

Hi,

Brian is 100% right about interpreting bootstrap values. When the tree is input 
into R with read.tree() or read.nexus(), these values are attached to the nodes 
but they relate to their subtending branches. The function root() has the 
option 'edgelabel' to specify whether these node labels must be linked to the 
branches rather than the nodes (see the paper 
https://doi.org/10.1093/molbev/msx055 for details on this issue). The default 
is FALSE, so use root(, edgelabel = TRUE) if needed (see also 
?drawSupportOnEdges in ape). Once the tree is rooted, then it can be 
represented in the conventional way with "node support values". You can then 
identify the nodes with respect to their labels, something like (tr being the 
tree):

target <- which(as.numeric(tr$node.label) < 10)

This assumes that the node labels are like "100", "99", ... Otherwise, 
as.numeric() will give you a warning. Of course, if these are proportions 
instead of percentages then 10 should be 0.1. Then, find the branches in the 
tree where these nodes are ancestors (which makes sense now since you rooted 
the tree previously) and set their lengths to zero:

i <- match(target, tr$edge[, 1])
tr$edge.length[i] <- 0

If you plot tr with the default options, these branches will appear as 
collapsed. You may also do tr <- di2multi(tr) which will delete these 
zero-length branches from the tree and will, of course, delete some of the node 
labels.

HTH.

Best,

Emmanuel

- Le 22 Mai 20, à 21:58, Brian O'Meara omeara.br...@gmail.com a écrit :
> You might need to reindex each time, something like (pseudocode):
> 
> bad.nodes <- function_to_get_node_numbers_with_support_less_than_90(phy)
> while(length(bad.nodes>0)) {
>   phy <- collapse.to.star(phy, bad.nodes[1])
>   bad.nodes <- function_to_get_node_numbers_with_support_less_than_90(phy)
> }
> 
> But this will collapse the bad node and all its descendants. I don't see a
> function that just collapses at a node and leaves descendants intact, but
> this may be what you're looking for (the stuff that ape::collapse.singles
> does is similar, and could perhaps be modified).
> 
> One pedantic point -- we often talk about node support, collapsing nodes,
> etc., but it's really about the branch: the node doesn't have 83% bootstrap
> support but the bipartition created by the edge subtending it does. For
> example, if the tree is (A,(B,(C,(D,E, if the support for the CDE clade
> is 83%, it's actually the proportion of trees that have a split between AB
>| CDE, so the branch separating the CDE clade from everything else. The
> node itself there has one branch coming in (that comes from the rest of the
> tree containing A and B) and two branches coming out (one going to C, one
> going to DE). We could summarize trees by actual node support: how many
> times we have a node that has an edge going to AB, an edge going to C, and
> an edge going to DE, but we typically don't. [For example, the branch could
> separate AB | CDE, and the node is AB | C | DE, but one could also have a
> node that is AB | CD | E for the same bipartition on the subtending edge].
> I only bring it up here because if you end up having to write code to do
> collapsing, it can be important to remember that what is actually being
> collapsed is the edge subtending the labeled node because it is the the
> edge with low support.
> 
> Best,
> Brian
> 
> ___
> Brian O'Meara
> 
> Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville
> Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville
> He/Him/His
> 
> 
> 
> On Fri, May 22, 2020 at 9:42 AM Karla Shikev  wrote:
> 
>> Dear all,
>>
>> I'm surprised I could not find a simple function to do this in any of the
>> packages I know. I just want to take a tree with node labels (e.g.
>> bootstrap values from a RAxML analysis) and collapse all nodes with support
>> values below, say, 90%. I tried to do this recursively using
>> collapse.to.star in phytools, but node numbers change after the first
>> collapsed node and my indexing gets completely off.
>>
>> with best regards,
>>
>> Karla
>>
>> [[alternative HTML version deleted]]
>>
>> ___
>> R-sig-phylo mailing list - R-sig-phylo@r-project.org
>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
>> Searchable archive at
>> http://www.mail-archive.com/r-sig-phylo@r-project.org/
>>
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Improves haplotype() function in pegas 0.13

2020-05-14 Thread Emmanuel Paradis

Hi Jarrett,

I'm Cc'ing to r-sig-genetics since we had a recent discussion on a similar 
topic (see below).

- Le 14 Mai 20, à 9:59, Jarrett Phillips phillipsjarre...@gmail.com a écrit 
:
> Hello,
> 
> Emmanuel Paradis has recently updated the haplotype() function in pegas
> 0.13 to account for base ambiguities, gaps and Ns. Thank you Emmanuel!

Base ambiguities were already considered in previous versions of pegas, but it 
was not explicit (or flexible).

> The argument  'strict' simply considers or ignores all gaps and
> ambiguities, but does this also consider/ignore Ns?

Yes. 'strict' means "strict interpretation of the characters without 
interpreting them as base ambiguities". For instance, consider the following 9 
aligned sequences with 2 sites (without labels for simplicity):

AA
AR
AM
AW
AV
AH
AD
AN
A-

By default (and with the last version of pegas), haplotype() will return a 
single haplotype because it cannot be inferred whether any of the sequences 2-9 
is different from the first one. If strict = TRUE, nine haplotypes will be 
returned.

> The 'trailingGapsAsN' simply treats leading and trailing gaps as Ns,
> ignoring internal gaps. This argument is set to TRUE by default.
> 
> From the above, it appears that  'strict' ignores Ns. If 'strict' is set to
> TRUE, does this mean that TRUE/FALSE assignment 'trailingGapsAsN' is ignored
> as well?

Yes. I've added a line in the help page of haplotype() to say that 
'trailingGapsAsN' has no effect if 'strict = TRUE'.

> The reason I ask is because I use haplotype() in one of my R packages to
> compute optimal sample sizes for genetic diversity assessment (HACSim).
> Currently in my package, R throws a warning to users if missing data or
> base ambiguities are present within DNA alignments.
> 
> Given Emmanuel's changes, it seems the warning in my package will not be
> needed once I set 'strict = TRUE'. I am unsure however on how to properly
> set 'trailingGapsAsN' to ensure that gaps do not affect haplotype
> calculation if they are left in the alignment. Gaps, ambiguities and Ns
> will cause an overestimation of haplotypes, and therefore an inflation of
> standing genetic variation.

Maybe the discussion we had on r-sig-genetics could be relevant here. There 
doesn't seem to be an easy answer to these questions.

Also, the coming version of ape will include the new function latag2n (Leading 
and Trailing Alignment Gaps to N) which changes sequences such as "A-C-" into 
"A-CN".

Cheers,

Emmanuel

> Can someone weigh in on this?
> 
> Thanks!
> 
> Cheers,
> 
> Jarrett
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Question on min(D.eig$values) in pcoa function of ape in R

2020-04-06 Thread Emmanuel Paradis

Hi Coline, 

Matrix symmetry is not about row- and colnames but about testing, for a square 
matrix X, if X[i, j] == X[j, i] for all i and j = 1, ..., n. I've just had a 
look at the code of isSymmetrix.matrix() and this is done by testing equality 
of X and t(X), so effectively making a copy of X which seems to explain why you 
run out of memory for this test (apart from the fact that you may have 
memory-hungry applications running on your computer...) 

You may try this instead: 

matrix_dissim_Eucl <- as.matrix(mat_dissim_Eucl) 
rm(mat_dissim_Eucl) 
gc() 
isSymmetric(matrix_dissim_Eucl) 

If you still run out of memory, try to quit some applications (Web browsers of 
Office typically use a lot of memory). 

Anyway, it doesn't change the fact that daisy(), like dist(), returns only the 
lower triangle of the distance matrix, so as.matrix() should return a symmetric 
matrix with its diagonal set to 0. So the above check is just to be sure. 

Best, 

Emmanuel 

- Le 6 Avr 20, à 22:00, Coline Boonman  a écrit : 

> Dear Emmanual,

> Thank you for your response! I have checked your code, and I get the 
> following:

> matrix_dissim_Eucl<-as.matrix(mat_dissim_Eucl)

> any(is.complex(matrix_dissim_Eucl))

> #FALSE

> isSymmetric.matrix(matrix_dissim_Eucl)

> # Error: cannot allocate vector of size 811.1Mb

> So I checked by hand (I think it checks for row and column length, and for 
> equal
> row and column names

> dim(matrix_dissim_Eucl)

> #14582 14582

> summary(colnames(matrix_dissim_Eucl)==row.names(matrix_dissim_Eucl))

> #TRUE 14582

> So I guess this means that it is a symmetric matrix that enters the pcoa
> function, and that there are no complex values in the matrix.

> I hope there is one thing I did not think of yet, and you can help me figure 
> it
> out. In any case, thank you for your help. I really appreciated it.

> Kind regards,

> Coline

> From: Emmanuel Paradis [mailto:emmanuel.para...@ird.fr]
> Sent: Monday, 6 April 2020 16:47
> To: Coline Boonman 
> Cc: r-sig-phylo 
> Subject: Re: [R-sig-phylo] Question on min(D.eig$values) in pcoa function of 
> ape
> in R

> Hi Coline,

> Actually the test would be:

> any(is.complex(mat_dissim_Eucl))

> And you may also test:

> isSymmetric(as.matrix(mat_dissim_Eucl))

> But it'd very surprising this 2nd test returns FALSE.

> So if your matrix is real and symmetric, I don't see why you have complex
> eigenvalues from its decomposition.

> Best,

> Emmanuel

> - Le 6 Avr 20, à 21:16, Coline Boonman < [ mailto:c.boon...@science.ru.nl 
> |
> c.boon...@science.ru.nl ] > a écrit :

>> Dear Emmanual,

>> Thank you for your email. I have been trying to understand what is going on 
>> and
>> if your suggestion is correct. However, if I test for complex numbers in my
>> dissimilarity matrix (converted to a matrix:
>> is.complex(as.matrix(mat_dissim_Eucl)) ) R gives me the result FALSE. That
>> means that there are no complex numbers in the input for the pcoa, right?

>> I hope you or someone else has another suggestion on why I get this error.

>> Kind regards,

>> Coline

>> From: Emmanuel Paradis [ [ mailto:emmanuel.para...@ird.fr |
>> mailto:emmanuel.para...@ird.fr ] ]
>> Sent: Thursday, 2 April 2020 07:43
>> To: Coline Boonman < [ mailto:c.boon...@science.ru.nl | 
>> c.boon...@science.ru.nl
>> ] >
>> Cc: r-sig-phylo < [ mailto:r-sig-phylo@r-project.org | 
>> r-sig-phylo@r-project.org
>> ] >
>> Subject: Re: [R-sig-phylo] Question on min(D.eig$values) in pcoa function of 
>> ape
>> in R

>> Hi Coline,

>> This is strange: you calculate a distance matrix with daisy(), so the
>> as.matrix() operation should return a symmetric matrix. Symmetric (real)
>> matrices have all their eigenvalues real numbers. Maybe some complex values
>> were produced by daisy()?

>> Best,

>> Emmanuel

>> - Le 30 Mar 20, à 22:58, Coline Boonman < [ 
>> mailto:c.boon...@science.ru.nl |
>> c.boon...@science.ru.nl ] > a écrit :

>>> Dear reader,

>>> I am working with the pcoa function of ape in R and I got the error message:
>>> “min(D.eig$values) : invalid 'type' (complex) of argument” .

>>> My input data is a distance matrix, which pcoa first converts to a matrix 
>>> in the
>>> first line within the code of the function. The error occurs when it wants 
>>> to
>>> check for negative eigenvalues at the line ‘ min.eig <- min(D.eig$values)’

>>> I checked D.eig$values and indeed they are complex numbers (imaginary 
>>> numbers).
>>> What I don’t understand is why these are created. I never had any issues 
>>

Re: [R-sig-phylo] Question on min(D.eig$values) in pcoa function of ape in R

2020-04-06 Thread Emmanuel Paradis

Hi Coline, 

Actually the test would be: 

any(is.complex(mat_dissim_Eucl)) 

And you may also test: 

isSymmetric(as.matrix(mat_dissim_Eucl)) 

But it'd very surprising this 2nd test returns FALSE. 

So if your matrix is real and symmetric, I don't see why you have complex 
eigenvalues from its decomposition. 

Best, 

Emmanuel 

- Le 6 Avr 20, à 21:16, Coline Boonman  a écrit : 





Dear Emmanual, 



Thank you for your email. I have been trying to understand what is going on and 
if your suggestion is correct. However, if I test for complex numbers in my 
dissimilarity matrix (converted to a matrix: 
is.complex(as.matrix(mat_dissim_Eucl)) ) R gives me the result FALSE. That 
means that there are no complex numbers in the input for the pcoa, right? 



I hope you or someone else has another suggestion on why I get this error. 



Kind regards, 

Coline 




From: Emmanuel Paradis [mailto:emmanuel.para...@ird.fr] 
Sent: Thursday, 2 April 2020 07:43 
To: Coline Boonman  
Cc: r-sig-phylo  
Subject: Re: [R-sig-phylo] Question on min(D.eig$values) in pcoa function of 
ape in R 





Hi Coline, 





This is strange: you calculate a distance matrix with daisy(), so the 
as.matrix() operation should return a symmetric matrix. Symmetric (real) 
matrices have all their eigenvalues real numbers. Maybe some complex values 
were produced by daisy()? 





Best, 





Emmanuel 





- Le 30 Mar 20, à 22:58, Coline Boonman < [ mailto:c.boon...@science.ru.nl 
| c.boon...@science.ru.nl ] > a écrit : 


BQ_BEGIN


Dear reader, 



I am working with the pcoa function of ape in R and I got the error message: 
“min(D.eig$values) : invalid 'type' (complex) of argument” . 

My input data is a distance matrix, which pcoa first converts to a matrix in 
the first line within the code of the function. The error occurs when it wants 
to check for negative eigenvalues at the line ‘ min.eig <- min(D.eig$values)’ 

I checked D.eig$values and indeed they are complex numbers (imaginary numbers). 
What I don’t understand is why these are created. I never had any issues with 
the pcoa function before, and now I change my dataset and this happens. The 
distance matrix values range from 0 to 47 and I have species names as rownames 
and column names. In the attachment I include a small piece of the dataset and 
the code I run. I checked when the error occurs, and I don’t see why the extra 
line (2063) gives the error. 



I hope someone can help me. 



Kind regards, 

Coline Boonman 

-- 

PhD student 

Department of Environmental Science 

Faculty of Science 

Radboud University Nijmegen 



P.O. Box 9010, NL-6500 GL Nijmegen 

E: [ mailto:c.boon...@science.ru.nl | c.boon...@science.ru.nl ] 

T: + 31 (0)243653270 / + 31 (0)243653281 (secr.) 



[ http://www.ru.nl/environmentalscience | http://www.ru.nl/environmentalscience 
] 










___ 
R-sig-phylo mailing list - [ mailto:R-sig-phylo@r-project.org | 
R-sig-phylo@r-project.org ] 
[ https://stat.ethz.ch/mailman/listinfo/r-sig-phylo | 
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo ] 
Searchable archive at [ http://www.mail-archive.com/r-sig-phylo@r-project.org/ 
| http://www.mail-archive.com/r-sig-phylo@r-project.org/ ] 




BQ_END



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Question on min(D.eig$values) in pcoa function of ape in R

2020-04-01 Thread Emmanuel Paradis

Hi Coline, 

This is strange: you calculate a distance matrix with daisy(), so the 
as.matrix() operation should return a symmetric matrix. Symmetric (real) 
matrices have all their eigenvalues real numbers. Maybe some complex values 
were produced by daisy()? 

Best, 

Emmanuel 

- Le 30 Mar 20, à 22:58, Coline Boonman  a écrit : 

> Dear reader,

> I am working with the pcoa function of ape in R and I got the error message:
> “min(D.eig$values) : invalid 'type' (complex) of argument” .

> My input data is a distance matrix, which pcoa first converts to a matrix in 
> the
> first line within the code of the function. The error occurs when it wants to
> check for negative eigenvalues at the line ‘ min.eig <- min(D.eig$values)’

> I checked D.eig$values and indeed they are complex numbers (imaginary 
> numbers).
> What I don’t understand is why these are created. I never had any issues with
> the pcoa function before, and now I change my dataset and this happens. The
> distance matrix values range from 0 to 47 and I have species names as rownames
> and column names. In the attachment I include a small piece of the dataset and
> the code I run. I checked when the error occurs, and I don’t see why the extra
> line (2063) gives the error.

> I hope someone can help me.

> Kind regards,

> Coline Boonman

> --

> PhD student

> Department of Environmental Science

> Faculty of Science

> Radboud University Nijmegen

> P.O. Box 9010, NL-6500 GL Nijmegen

> E: c.boon...@science.ru.nl

> T: + 31 (0)243653270 / + 31 (0)243653281 (secr.)

> http://www.ru.nl/environmentalscience

> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] BD process in Nee et al. (1994)

2020-03-14 Thread Emmanuel Paradis

Hi Karla,

ape has several functions to simulate trees that make use of these formulas 
(see ?rphylo). However, the formulas you are looking for may not be directly 
accessible from R, so you need to check the sources of ape. You may also need 
to use the last testing version of ape since a bug was recently fixed in 
rphylo():

http://ape-package.ird.fr/ape_installation.html#versions

Best,

Emmanuel

- Le 14 Mar 20, à 22:23, Karla Shikev karlashi...@gmail.com a écrit :

> Dear all,
> 
> Does anyone know of a function in R which would provide the analytical
> solution to the expectation for the number of species over time derived by
> Nee et al. (1994, Phil. Trans. R. Soc. Lond B.)?
> 
> Thanks,
> 
> Karla
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Iterating though multiple FASTA files via rbind.DNAbin

2020-03-12 Thread Emmanuel Paradis

Hi Jarrett,

read.FASTA() always returns a list. So you may do something (quite general) 
like:

fls <- dir(pattern = "\\.fas$|\\.fasta$", ignore.case = TRUE) # add more file 
extensions if needed
X <- lapply(fls, read.FASTA)
seqlen <- lengths(X)
if (length(unique(seqlen)) == 1) X <- as.matrix(X)

If the sequences are not of the same length, you can use the vector 'seqlen' 
for further processing, for instance to remove the shortest ones (if this makes 
sense):

X[seqlen > 100]

Also I found the function fasta.index (in Biostrings on BioConductor) to be 
very useful for this kind of tasks: it scans a bunch of FASTA files (possibly 
in different directories) and returns a data frame with each row describing 
each sequence (length, label, path, ...).

HTH

Best,

Emmanuel

- Le 12 Mar 20, à 22:18, Jarrett Phillips phillipsjarre...@gmail.com a 
écrit :
> Hi All,
> 
> I have a folder with multiple FASTA files which need to be read into R.
> 
> To avoid file overwriting, I use ape::rbind.DNAbin() as follows:
> 
> file.names <- list.files(path = envr$filepath, pattern = ".fas")
>  tmp <- matrix()
>  for (i in 1:length(file.names)) {
>seqs <- read.dna(file = file.names[i], format = "fasta")
>seqs <- rbind.DNAbin(tmp, seqs)
>  }
> 
> When run however, I get an error saying that the files do not have the same
> number of columns (i.e., alignments are all not of the same length).
> 
> How can I avoid this error. I feel that it's a basic fix, but one that is
> not immediately obvious to me.
> 
> Thanks!
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] PEGAS: assignment to haplotype when missing information

2020-02-26 Thread Emmanuel Paradis

Hi Hirra,

The assignment is not random, it follows the order of the sequences in the data:

- Seqs. A and B are compared and found to be identical so they are both 
assigned to haplotype I.
- Seq. C is compared to haplotype I (effectively seq. A) and found to be 
different so it is assigned to haplotype II.
- Seq. D is compared to haplotype I and found to be similar and so assigned to 
haplotype I.

If you reorder your data and put Seq. C first, you'd obtain that C and D are 
assigned to the same haplotype. The same issue occurs with ambiguous bases.

These situations certainly deserve to have an option to haplotype() to handle 
them properly.

Best,

Emmanuel

- Le 25 Fév 20, à 19:31, Hirra Farooq 
hirra.far...@postgrad.manchester.ac.uk a écrit :

> Hello,
> 
> I am using the pegas R package to assign sequences into haplotypes.
> 
> I recently tried out a test examples with 4 sequences. 2 of the sequences (A 
> and
> B) are identical, 1 sequence (Seq C) differs from these at only one position
> (pos 648).
> The 4th sequence (Seq D) is identical to all but shorter so has no residues at
> the determinant position 648. (See image below)
> 
> So correctly pegas assigns A and B to haplotype I and C to haplotype II. 
> However
> it also assigns D to I, despite there being no information at which residue is
> at the determinant position.
> 
> I just wanted to know in such cases as D when there is missing information, 
> does
> pegas just randomly assign to a haplotype?
> 
> 
> 
> 
>aln (633..663)  names
> [A] CCCGAATATCAACATTTATTT--
> [D] CCCGA--
> [B] CCCGAATATCAACATTTATTT--
> [C] CCCGAATATCACCATTTAGATTT
> 
> 
> Thanks and best wishes,
> Hirra
> University of Manchester Student.
> 
>   [[alternative HTML version deleted]]
> 
> ___
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] A test for valid phylo object (ape package) for package developers

2019-11-07 Thread Emmanuel Paradis


Hi,

To answer Elizabeth's question: there is no function that does what 
you asked for in your first message. Liam's solution is certainly the 
best solution for the moment. I'll have a look at it later. The code 
checkValidPhylo() needs to be dusted off a bit: the diagnostic " 
MODERATE: some nodes are of degree 1 or less" is no more useful since 
nodes of degree 1 are now supported in ape.


Best,

Emmanuel

Tue, 5 Nov 2019 14:45:58 -0800 Elizabeth Purdom 
:

Thank you very much, I’ll try that!


On Nov 5, 2019, at 2:20 PM, Liam Revell  wrote:

Hi Elizabeth.

This does not do exactly what you want, but it is possible to use 
capture.output() to grab the printout of checkValidPhylo() and then 
grep() to see if it contains instances of "MODERATE" or "FATAL" 
errors.


The following simple function does that. It returns 0 if neither 
MODERATE nor FATAL errors are detected in the "phylo" object, 1 if 
at 
least one MODERATE error (but no FATAL errors) is detected, and 2 if 
any 
FATAL errors are detected:


chk.phylo<-function(x){
object<-capture.output(checkValidPhylo(x))
if(length(grep("FATAL",object))>0) return(2)
else if(length(grep("MODERATE",object))>0) return(1)
else return(0)
}

E.g.:

library(ape)

t1<-rtree(n=10)

chk.phylo(t1) ## should return 0

t2<-t1
t2$Nnode<-9.5

chk.phylo(t2) ## should return 1

t3<-t1
t3$edge<-t3$edge[-4,]

chk.phylo(t3) ## should return 2

All the best, Liam

Liam J. Revell
Associate Professor, University of Massachusetts Boston
Profesor Asistente, Universidad Católica de la Ssma Concepción
web: http://faculty.umb.edu/liam.revell/, http://www.phytools.org

Academic Director UMass Boston Chile Abroad (starting 2019):
https://www.umb.edu/academics/caps/international/biology_chile

On 11/5/2019 5:54 PM, Elizabeth Purdom wrote:

[EXTERNAL SENDER]

Hello,

I am working with the phylo object from the ape package in my own 
package in which I am manipulating the trees. I would like to check 
that I have successfully created a valid ape object, but the 
`checkValidPhylo` function appears to be solely interactive — it 
prints out a display, and always returns NULL.


Is there a function in the ape package (or can there be?) that would 
do the checks, but return the results in a way that I can then 
process in my function? (e.g. return a vector of each of the checks 
as TRUE/FALSE) And I don’t want anything printed out, since I don’t 
want that output to be printed for users of my function).


I see the package `paleotree` has ported some of those checks into a 
test, so I’m guessing such a function doesn’t exist in ape — and I 
don’t really want a dependency on another package just for these 
checks.


Thanks,
Elizabeth Purdom
___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://nam01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fstat.ethz.ch%2Fmailman%2Flistinfo%2Fr-sig-phylodata=02%7C01%7Cliam.revell%40umb.edu%7C6584f4bb63dd48f9205008d762326164%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637085840853226654sdata=48LVGO%2FHHvX0rpMy6FaVgIhoeV6UO8Ouc%2B6FUfJcFCM%3Dreserved=0
Searchable archive at 
https://nam01.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.mail-archive.com%2Fr-sig-phylo%40r-project.org%2Fdata=02%7C01%7Cliam.revell%40umb.edu%7C6584f4bb63dd48f9205008d762326164%7Cb97188711ee94425953c1ace1373eb38%7C0%7C1%7C637085840853226654sdata=qgug0JIrKlC9fpr8lqrMuBku2XpHV3hfbzPF0m8ByuI%3Dreserved=0




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at 
http://www.mail-archive.com/r-sig-phylo@r-project.org/







___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Recursive addition along a tree

2019-08-05 Thread Emmanuel Paradis

Hi Mario,

In the class "phylo", nodes and tips are numbered sequentially so it
is no problem to have data associated with both. This is done in many
functions in ape and in phangorn. I think the solution to your
headaches is in this document (§ 4.4):

http://ape-package.ird.fr/misc/FormatTreeR_24Oct2012.pdf

HTH

Best,

Emmanuel

Sun, 4 Aug 2019 03:32:16 +0200 Mario Schädel
:

Dear people on this list,

I am a German PhD student in the field of arthropod palaeontology. I
want
to develop a new approach for comparing abundance data in a
phylogenetic
context. Unlike following the standard techniques of community
ecology, I
want to include the abundance data not only for the tips of a tree
but also
for its nodes. The background for this is, that in palaeontology it
is
often not possible to identify specimens to species level and also
some
fossils are better preserved as others, hence the level of
identification

within samples can vary a lot.

So, I want to link a vector, or at best a data frame, that holds
abundance
data for tips and nodes, to a tree (even this proved to be
problematic as
all functions that I found make distinctions between tip and node
data). In
the next step, I would like to recursively sum-up the abundances
along the
tree, from the leaves to the root. When the data is properly linked,
this

should not be too complicated, once the tree is reordered into the
“postorder” format. Afterwards, the computed data should be
converted in a
format, that can be included into plots. Alternatively, I also came
up with
the idea, that the abundance data could be visualised by the branch
lengths

of the tree.

I would be more than grateful, if someone could offer me some help
with
this project. I have been wrecking my head around this since months
now.

I attached a graphical representation of my idea in case my
explanations
are a bit confusing (orange color, abundances before computation;
green,

summed up aundances).

Best wishes to all of you!

Mario

Pour nous remonter une erreur de filtrage, veuillez vous rendre ici
:
https://www.security-mail.net/ndc/reporter/mid/2641.5d4635bf.476a1.0/r/emmanuel.para...@ird.fr/c/402709068db61d2ee7dfb2c7aaff9a71ffa8b19c

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] comparePhylo

2019-06-13 Thread Emmanuel Paradis


Hi Saleh & Will,

I'm not sure whether phytools::cophylo does the same thing than
ape::comparePhylo (maybe rather ape::cophyloplot).

comparePhylo() is actually aimed for data exploration/summary rather
than for producing publication-ready graphs. That said, there is a
trick to play with the character sizes when you write the plot into a
PDF file, eg:

pdf("plot.pdf", w = 10, h = 10)
par(xpd = TRUE) # could be useful if w and h are small
comparePhylo(. plot = TRUE)
dev.off()

You can change the values of 'w' and 'h' in the call to pdf() to
change the size of the characters (the smaller the values, the larger
the characters). Of course, this will affect all characters which may
not be what you want. You can have total control by plotting the trees
separately, typically after layout(matrix(1:2, 1)), and using the
option 'edge.width' in plot.phylo (or edge.col if you want to use
colours). If you go to this solution, you may find the branch numbers
(indices) of your trees with:

plot(tree)
edgelabels()

This will help you to create the vectors given to edge.width/edge.col.

HTH

Best,

Emmanuel

Thu, 13 Jun 2019 09:30:02 -0700 William Gearty :

Hi Saleh,

Try using phytools::cophylo() instead. It has far more options for
plotting. Documentation is here:
https://www.rdocumentation.org/packages/phytools/versions/0.6-60/topics/cophylo
.

Best,
Will

On Thu, Jun 13, 2019 at 6:38 AM Saleh Rahimlouye Barabi <
saleh.rahim...@ut.ee> wrote:


Hello,

I'm using comparePhylo() function from APE package in R. I want to 
know is
there anyway to change the tip labels size when plotting it? 
Arguments like 'cex' does not work in this case.


Best
Saleh Rahimlou
Ph.D. Candidate
Department of Botany and Ecology
University of Tartu
14A Ravila, 50411 Tartu
Estonia
Email: saleh.rahim...@ut.ee


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/




--
William Gearty
PhD Candidate, Paleobiology
Department of Geological Sciences
Stanford School of Earth, Energy & Environmental Sciences
williamgearty.com

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at 
http://www.mail-archive.com/r-sig-phylo@r-project.org/







___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[R-sig-phylo] new package phylobench

2019-03-21 Thread Emmanuel Paradis


Hi all,

The package 'phylobench' has recently been released on GitHub. It aims 
to provide tools for "phylogenetic benchmarking" by performing data 
analyses, tests, or simulations using functions in other packages, and 
comparing the outputs with expected or predicted values. Currently, 12 
benchmarks are implemented, and others can easily be programmed. 
phylobench was used to validate the last submission of ape on CRAN 
(version 5.3). More on GitHub:


https://github.com/emmanuelparadis/phylobench

Comments, contributions, forks are all welcome.

Emmanuel

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Problem date consensus phylogeny

2019-01-25 Thread Emmanuel Paradis


Le 25/01/2019 à 00:14, Alina van dijk a écrit :

Hi everyone,

Thanks Emmanuel and Joseph for sharing your thoughts!
Emmanuel, today I tried multi2di and it works. But, if I understand 
correctly, this function transforms the dichotomies with branches of 
length zero. So for me, in fact, doesn't solve the politomy. Right?


To see what multi2di() does, try this code:

tr <- compute.brlen(stree(10), 1)
tr2 <-  multi2di(tr)
par(mfcol = c(2, 2))
plot(tr)
plot(tr, use.edge.length = FALSE)
plot(tr2)
plot(tr2, use.edge.length = FALSE)

The internal branches that appear on the last plot have zero-length. 
Note the option random = TRUE in multi2di().


I hope this can help you with your headache(s)...

Best,

Emmanuel

Joseph, I'm only a begginer with R and this program already gives me a 
lot of headache.

It drives me crazy think in another program... hahahha
Just kidding! :)
But, nice to know that I have a plan B!
I will take a look!

Thanks in advance,
Best Regards,

Alina


Em qua, 23 de jan de 2019 às 16:29, Emmanuel Paradis 
mailto:emmanuel.para...@ird.fr>> escreveu:


Hi Alina,

Did you try multi2di() to remove polytomies?

Best,

Emmanuel

Le 23/01/2019 à 17:41, Alina van dijk a écrit :
 > Hi everyone,
 >
 > My name is Alina, I'm attending master degree in ecology and I have a
 > little problem with my consensus phylogeny. I used 1000 phylogeny of
 > birdtree to construct the consensus tree but now I need to insert
the dates
 > in this consensus tree.
 >
 > To accomplish this idea, I tried different methods:
consensustree<-averageTree
 > (tree,method="symmetric.difference") but the result is a
consensus tree in
 > topology, binary and rooted but not ultrametric. For this problem
I used
 > consensus_tree_ultrametric
 > <-compute.brlen(consensustree,method="Grafen",power=1), but the edge
 > lengths was altered.
 >
 > So, I used another method to make a consensus tree using phytools
 > consensustree_another_method <- consensus.edges(multiphylo,method =
 > "mean.edge") as a result, a ultrametric tree, rooted with
consensus edges
 > lengths, but have polytomies :(
 >
 > I also tried using phylobase , proposed by Brian O'Meara in this
link:
 >

https://grokbase.com/t/r/r-sig-phylo/12bn9x5gv4/why-no-branch-lengths-on-consensus-trees
 > ,
 >   but as a result, my consensus tree wasn´t ultrametric and the
 > transformation also altered the edge lengths.
 >
 > I only need to insert the dates based on the trees dated by Jetz
in this
 > consensus tree and as result, an ultrametric tree, with consensus
edges
 > lengths and no polytomy.
 >
 > Does anyone know how to solve that problem?
 >
 > Thanks in advance,
 > Best Regards,
 >
 > Alina
 >
 >       [[alternative HTML version deleted]]
 >
 > ___
 > R-sig-phylo mailing list - R-sig-phylo@r-project.org
<mailto:R-sig-phylo@r-project.org>
 > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 > Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/
 >
 >
 > Pour nous remonter une erreur de filtrage, veuillez vous rendre
ici : http://f.security-mail.net/VKQxzoa2
 >
 >





___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Problem date consensus phylogeny

2019-01-23 Thread Emmanuel Paradis


Hi Alina,

Did you try multi2di() to remove polytomies?

Best,

Emmanuel

Le 23/01/2019 à 17:41, Alina van dijk a écrit :

Hi everyone,

My name is Alina, I'm attending master degree in ecology and I have a
little problem with my consensus phylogeny. I used 1000 phylogeny of
birdtree to construct the consensus tree but now I need to insert the dates
in this consensus tree.

To accomplish this idea, I tried different methods: consensustree<-averageTree
(tree,method="symmetric.difference") but the result is a consensus tree in
topology, binary and rooted but not ultrametric. For this problem I used
consensus_tree_ultrametric
<-compute.brlen(consensustree,method="Grafen",power=1), but the edge
lengths was altered.

So, I used another method to make a consensus tree using phytools
consensustree_another_method <- consensus.edges(multiphylo,method =
"mean.edge") as a result, a ultrametric tree, rooted with consensus edges
lengths, but have polytomies :(

I also tried using phylobase , proposed by Brian O'Meara in this link:
https://grokbase.com/t/r/r-sig-phylo/12bn9x5gv4/why-no-branch-lengths-on-consensus-trees
,
  but as a result, my consensus tree wasn´t ultrametric and the
transformation also altered the edge lengths.

I only need to insert the dates based on the trees dated by Jetz in this
consensus tree and as result, an ultrametric tree, with consensus edges
lengths and no polytomy.

Does anyone know how to solve that problem?

Thanks in advance,
Best Regards,

Alina

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/VKQxzoa2




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] exponential time with write.tree

2019-01-23 Thread Emmanuel Paradis


Hi Jérémie,

You're correct: the current version of write.tree() scales with n^2. 
This seems to be related to calling the generic '[[' operator. You can 
modify the code with the usual fix(write.tree), then insert these two lines:


phy <- .uncompressTipLabel(phy)
class(phy) <- NULL

right before the 'for' loop, save and close. You may need to use the 
modified code once or twice (with a small tree) so that R might 
byte-compile it. I've tried it with the 2027025 trees generared by 
allTrees(9,rooted=T) and it took 8.5 mins (instead of the predicted 28 
hrs with the current code).


Best,

Emmanuel

Le 23/01/2019 à 10:51, Jérémie Bardin a écrit :

Hi all,
I tried to export 2 millions trees to a .tre file with write.tree. This never 
ends, thus i tried to get why and realized that the time to write the .tre 
depends exponentially on the number of trees.

trees<-allTrees(9,rooted=T)to<-seq(1,200,1)durations<-NAfor(i in 1:length(to)) { 
ptm <- proc.time() write.tree(trees[1:to[i]], file = "test.tre") 
durations[i]<-(proc.time() - ptm)[3]}
I do not really get why it is not nearly linear. Any explanation?
Cheers, Jérémie

Jérémie Bardin, Dr.

CR2P - Centre de Recherche en Paléontologie - ParisSorbonne Université - MNHN - 
CNRS
Site Jussieu, Tour 46-56, 5°et.
4 place Jussieu, 75252 Paris Cedex 05
tel. +331.44.27.51.77.
jeremie.bar...@upmc.fr / jbar...@mnhn.fr / jeremiebar...@yahoo.fr

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/fPLqvOj4




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Logarithmic Scales for Plotting Dated Phyogenies (e.g. Log of Time Axis?)

2019-01-08 Thread Emmanuel Paradis


Hi David,

Have you tried ape::plotBreakLongEdges? I think that would be the 
simplest solution to your problem. I'm not clear how a non-linear 
transformation could be implemented easily (and be meaningful in most 
situations).


Le 02/01/2019 à 21:15, David Bapst a écrit :

Hi all,

I've been dealing with a tree with one very deep divergence and many
very shallow divergences recently, and I was curious if there was an R
plotting capability that allows for the depth axis of the tree to be
non-linear or logarithmic - helpful if there can be a time axis bar as
well, as with axisPhylo(). Logging the axis directly with par seems to
break plot.phylo, presumably because its trying to plot something at a
negative coordinate.

It seems like a simple thing, but oddly I haven't come upon anything
yet that can do this. Any thoughts?

Cheers,
-Dave

PS: Tangential to that, is there a ladderize function that also takes
into account the edge length on non-ultrametric trees? I just noticed
that ladderize doesn't do much for a tree with a large polytomy
consisting of branches of very different length.


None that I know, but that could be useful indeed.


PPS: Happy New Years, all! I just checked and its now been nine years
I've been following this listserv...


Happy New Years to you and to all! In one month, it will be 11 years 
that the list started.


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] ape chronos function returning pLL of -1E100

2018-12-17 Thread Emmanuel Paradis


Hi John,

It looks like that the constraints on the dates with the 4 calibrations 
are such that the penalized likelihood cannot be calculated. There are 
several occasions when a value of -1e100 is returned (non finite value, 
negative branch lenghts, ...) so it's hard to see what is the problem 
here. Maybe it's better we discuss about this off-list.


Best,

Emmanuel

Le 13/12/2018 à 04:05, John S Denton a écrit :

As an update to my question--when I try to calibrate the tree itself
using the default single root calibration and default 1/1 min/max ages,
the function appears to run fine. When I add in my calibration table,
the function exits with the -Inf LL.


The tree is rooted. There are no duplicated nodes with conflicting
ages, and R does not give errors about the formatting of the calibration
table.


~John


From: R-sig-phylo  on behalf of John S Denton 

Sent: Wednesday, December 12, 2018 1:20:20 PM
To: r-sig-phylo@r-project.org
Subject: [R-sig-phylo] ape chronos function returning pLL of -1E100

I'm trying to use chronos (ape version 5.2) to scale a tree of ~800 tips, with 
branch lengths derived from a RAxML analysis. I'm using 4 outgroup 
calibrations. When I run the analysis using


t.og <- chronos(tree, lambda = 1, model = "relaxed", quiet = FALSE,
 calibration = dd.out)


I get the following returned, in almost 5 seconds:


Setting initial dates...
Fitting in progress... get a first set of estimates
  Penalised log-lik = -1e+100
Optimising rates... dates... -1e+100

Done.


a look at attributes (attr(t.og, "message")) gives a "relative convergence 
(4)." The pLL, however, is -Inf. I have tried setting chronos.control to different values, 
with no success. It looks like it may be an issue with the optimization function inside chronos.


Are there additional constraints/code I can apply to make it optimize without 
(apparently) simply running to the boundary?


Thanks,


~John


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Extended Majority Consensus Topology (Allcompat Summary) in R? And some observations on ape's consensus() function

2018-10-27 Thread Emmanuel Paradis


Hi David,

phangorn has the function consensusNet that builds consensus networks 
(objects of class "networx") which is more appropriate than 
ape::consensus is p < 0.5. There is a nice plotting function using RGL 
for "networx" objects. Of course, you won't get a fully bifurcating tree 
if there are incompatible splits in the tree sample. phangorn has also 
the function lento() to assess how many incompatible splits there are so 
you can test whether a specific consensus tree is likely to have 
reticulations.


I can remind the other tools available in ape for analyzing splits and 
that could help you sort out the splits:


prop.part(TR) returns the list of all splits observed in the list of 
trees TR and their (absolute) frequencies.


bitsplits is an alternative to prop.part which can be faster depending 
on the situation (this one works only with unrooted trees).


prop.clades(tr, TR) returns the frequencies of the splits present in the 
tree tr observed in TR (e.g., to get support values if TR is a list of 
bootstrap trees). It has an option whether the trees should be 
considered rooted or not.


countBipartitions(tr, TR) is similar to the previous but working with 
the "bitsplits" class.


Cheers,

Emmanuel

Le 28/10/2018 à 00:37, David Bapst a écrit :

Hi all,

I was interested if anyone was familiar with R code that can estimate an
extended majority consensus tree (referred to as an 'allcompat' tree by the
sumt command in MrBayes)? This is a fully bifurcating summary of a tree
posterior, where each clade is maximally resolved by the split that is most
abundant in the considered post-burn-in posterior (i.e., that split which
has the plurality, if not the majority - the highest posterior probability
of any other competing, conflicting splits recovered within the posterior.
So, I guess one could also call these plurality consensus trees...).

The ape function `consensus` seemed promising at first, as it takes a `p`
argument which at 1 returns the strict consensus (the default), and at 0.5
returns the majority rule consensus (effectively the same as the
'halfcompat' option in MrBayes). So, I thought, I wonder what happens if I
set `p` below 0.5 - you could imagine that the extended majority consensus
is basically a similar threshold algorithm, but accepting solutions
(splits) of any frequency of occurrence in the tree set, so effectively
p~0.

Unfortunately, that is not how that works out, as `consensus` simply
assembles all splits with a frequency above the `p` value, but doesn't
discard conflicting splits. This means you can theoretically get more
resolved consensus trees below 0.5, but in practical terms your ability to
recover reasonable tree objects lasts until the frequency drops to the
point that you begin to accept conflicting splits.

Here's some code based off a phangorn example where I can do consensus to
get a more resolved tree as I delve into lower `p` values - you can see I
get a reasonable (if you think the extended majority rule ), more resolved
tree at `p = 0.4`, but at `p = 0.2` there are conflicting splits accepted,
such that the tree output no longer has a rational tree structure.

```

library(ape)
library(phangorn)
data(Laurasiatherian)
set.seed(42)
bs <- bootstrap.phyDat(Laurasiatherian, FUN =

function(x)upgma(dist.hamming(x)),
+ bs=100)


tA <- consensus(bs,p=1)
tB <- consensus(bs, p=0.5)
tC <- consensus(bs, p=0.45)
tD <- consensus(bs, p=0.2)

layout(matrix(1:4,2,2))
plot(tA);plot(tB);plot(tC);plot(tD)

Error in plot.phylo(tD) :
   tree badly conformed; cannot plot. Check the edge matrix.

str(tD)

List of 3
  $ edge : int [1:109, 1:2] 48 49 50 51 52 53 54 55 56 57 ...
  $ tip.label: chr [1:47] "GraySeal" "Vole" "Wallaroo" "Loris" ...
  $ Nnode: int 63
  - attr(*, "class")= chr "phylo"
Warning messages:
1: display list redraw incomplete
2: display list redraw incomplete
3: display list redraw incomplete
4: display list redraw incomplete
```
I'd be interested to know if anyone knows of an alternative way to do this
in R, or if perhaps I need to figure out how to modify `consensus` to
reject conflicting splits.

Cheers,
-Dave B

PS: Yes, I know there are real issues with such exhaustive consensus trees,
particularly they will likely agglomerate a combination of splits that
exists on no tree recovered within the posterior, but I have my reasons!



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] No PCOA Correction Applied but Corrected Eigenvalues Returned? (ape::pcoa)

2018-10-25 Thread Emmanuel Paradis

From the code, a correction is made if some eigenvalues are negative 
even if correction="none":


lambda = (lambda - min(lambda))/(tr(Delta) - (n - 1)*min(lambda))

lambda: vector of eigenvalues
Delta: the centered distance matrix

The help page refers to eq. 9.27 in Legendre & Legendre (1998). I'll 
have a look it later and clarify the help page if needed.


Best,

Emmanuel

Le 25/10/2018 à 04:07, David Bapst a écrit :
Emmanuel, I'm familiar with the two typical corrections. I'm just 
curious why or what `pcoa` is reporting as ` "Rel_corr_eig"` when no 
correction is defined by the user.

-Dave

On Tue, Oct 23, 2018 at 10:26 PM Emmanuel Paradis 
mailto:emmanuel.para...@ird.fr>> wrote:


Hi David,

Have you tried one of the two possible corrections?

R> res <- pcoa(dmat)
R> res.lingoes <- pcoa(dmat, correction = "lingoes")
R> res.lingoes$note
[1] "Lingoes correction applied to negative eigenvalues: D' =
-0.5*D^2 -
0.310850908498892 , except diagonal elements"

The matrix $values of the output is different and contains the
corrected
eigenvalues in the second analysis:

R> names(res$values)
[1] "Eigenvalues"    "Relative_eig"   "Rel_corr_eig"   "Broken_stick"
[5] "Cum_corr_eig"   "Cumul_br_stick"
R> names(res.lingoes$values)
[1] "Eigenvalues"  "Corr_eig"     "Rel_corr_eig" "Broken_stick"
[5] "Cum_corr_eig" "Cum_br_stick"
R> any(res.lingoes$values$Corr_eig < 0)
[1] FALSE

Best,

Emmanuel

Le 23/10/2018 à 22:11, David Bapst a écrit :
 > Hi all,
 >
 > Not exactly phylogenetic, but I've recently uncovered some odd
behavior
 > with `pcoa` in `ape` and I was curious if anyone understood what was
 > going on.
 >
 > Usually, if you don't have negative eigenvalues and aren't using a
 > correction, `pcoa` will return the raw eigenvalues. However, I've
found
 > if there are negative eigenvalues and no correction is used, even
though
 > no correction is applied, the function will still return some
corrected
 > relative eigenvalues, and cumulative corrected eigenvalues. What
 > correction was used in this case is not clear.
 >
 > Here's some output using an example dataset from paleotree to
 > demonstrate the weirdness, both without correction argument
specified
 > and with it specified as "none".
 >
 > ```
 >  > library(ape)
 >  > library(paleotree)
 >  >
 >  > data(graptDisparity)
 >  > dmat<-graptDistMat
 >  > res<-pcoa(dmat)
 >  > str(res)
 > List of 5
 >   $ correction: chr [1:2] "none" "1"
 >   $ note  : chr "No correction was applied to the negative
eigenvalues"
 >   $ values    :'data.frame': 183 obs. of  6 variables:
 >    ..$ Eigenvalues   : num [1:183] 3.22 2.72 1.9 1.74 1.41 ...
 >    ..$ Relative_eig  : num [1:183] 0.217 0.183 0.128 0.117 0.095 ...
 >    ..$ Rel_corr_eig  : num [1:183] 0.0495 0.0424 0.031 0.0287
0.0241 ...
 >    ..$ Broken_stick  : num [1:183] 0.0319 0.0264 0.0236 0.0218
0.0204 ...
 >    ..$ Cum_corr_eig  : num [1:183] 0.0495 0.0919 0.1229 0.1517
0.1758 ...
 >    ..$ Cumul_br_stick: num [1:183] 0.0319 0.0583 0.082 0.1038
0.1242 ...
 >   $ vectors   : num [1:183, 1:72] 0.0143 -0.1818 -0.1011 0.104
0.101 ...
 >    ..- attr(*, "dimnames")=List of 2
 >    .. ..$ : chr [1:183] "'Bulmanograptus' macilentus" "'Monograptus'
 > arciformis" "'Monograptus' austerus" "'Paramplexograptus'
kiliani" ...
 >    .. ..$ : chr [1:72] "Axis.1" "Axis.2" "Axis.3" "Axis.4" ...
 >   $ trace : num 14.8
 >   - attr(*, "class")= chr "pcoa"
 >  >
 >  > data(graptDisparity)
 >  > dmat<-graptDistMat
 >  > res<-pcoa(dmat,correction="none")
 >  > str(res)
 > List of 5
 >   $ correction: chr [1:2] "none" "1"
 >   $ note  : chr "No correction was applied to the negative
eigenvalues"
 >   $ values    :'data.frame': 183 obs. of  6 variables:
 >    ..$ Eigenvalues   : num [1:183] 3.22 2.72 1.9 1.74 1.41 ...
 >    ..$ Relative_eig  : num [1:183] 0.217 0.183 0.128 0.117 0.095 ...
 >    ..$ Rel_corr_eig  : num [1:183] 0.0495 0.0424 0.031 0.0287
0.0241 ...
 >    ..$ Broken_stick  : num [1:183] 0.0319 0.0264 0.0236 0.0218
0.0204

Re: [R-sig-phylo] No PCOA Correction Applied but Corrected Eigenvalues Returned? (ape::pcoa)

2018-10-23 Thread Emmanuel Paradis


Hi David,

Have you tried one of the two possible corrections?

R> res <- pcoa(dmat)
R> res.lingoes <- pcoa(dmat, correction = "lingoes")
R> res.lingoes$note
[1] "Lingoes correction applied to negative eigenvalues: D' = -0.5*D^2 - 
0.310850908498892 , except diagonal elements"


The matrix $values of the output is different and contains the corrected 
eigenvalues in the second analysis:


R> names(res$values)
[1] "Eigenvalues""Relative_eig"   "Rel_corr_eig"   "Broken_stick"
[5] "Cum_corr_eig"   "Cumul_br_stick"
R> names(res.lingoes$values)
[1] "Eigenvalues"  "Corr_eig" "Rel_corr_eig" "Broken_stick"
[5] "Cum_corr_eig" "Cum_br_stick"
R> any(res.lingoes$values$Corr_eig < 0)
[1] FALSE

Best,

Emmanuel

Le 23/10/2018 à 22:11, David Bapst a écrit :

Hi all,

Not exactly phylogenetic, but I've recently uncovered some odd behavior 
with `pcoa` in `ape` and I was curious if anyone understood what was 
going on.


Usually, if you don't have negative eigenvalues and aren't using a 
correction, `pcoa` will return the raw eigenvalues. However, I've found 
if there are negative eigenvalues and no correction is used, even though 
no correction is applied, the function will still return some corrected 
relative eigenvalues, and cumulative corrected eigenvalues. What 
correction was used in this case is not clear.


Here's some output using an example dataset from paleotree to 
demonstrate the weirdness, both without correction argument specified 
and with it specified as "none".


```
 > library(ape)
 > library(paleotree)
 >
 > data(graptDisparity)
 > dmat<-graptDistMat
 > res<-pcoa(dmat)
 > str(res)
List of 5
  $ correction: chr [1:2] "none" "1"
  $ note  : chr "No correction was applied to the negative eigenvalues"
  $ values    :'data.frame': 183 obs. of  6 variables:
   ..$ Eigenvalues   : num [1:183] 3.22 2.72 1.9 1.74 1.41 ...
   ..$ Relative_eig  : num [1:183] 0.217 0.183 0.128 0.117 0.095 ...
   ..$ Rel_corr_eig  : num [1:183] 0.0495 0.0424 0.031 0.0287 0.0241 ...
   ..$ Broken_stick  : num [1:183] 0.0319 0.0264 0.0236 0.0218 0.0204 ...
   ..$ Cum_corr_eig  : num [1:183] 0.0495 0.0919 0.1229 0.1517 0.1758 ...
   ..$ Cumul_br_stick: num [1:183] 0.0319 0.0583 0.082 0.1038 0.1242 ...
  $ vectors   : num [1:183, 1:72] 0.0143 -0.1818 -0.1011 0.104 0.101 ...
   ..- attr(*, "dimnames")=List of 2
   .. ..$ : chr [1:183] "'Bulmanograptus' macilentus" "'Monograptus' 
arciformis" "'Monograptus' austerus" "'Paramplexograptus' kiliani" ...

   .. ..$ : chr [1:72] "Axis.1" "Axis.2" "Axis.3" "Axis.4" ...
  $ trace : num 14.8
  - attr(*, "class")= chr "pcoa"
 >
 > data(graptDisparity)
 > dmat<-graptDistMat
 > res<-pcoa(dmat,correction="none")
 > str(res)
List of 5
  $ correction: chr [1:2] "none" "1"
  $ note  : chr "No correction was applied to the negative eigenvalues"
  $ values    :'data.frame': 183 obs. of  6 variables:
   ..$ Eigenvalues   : num [1:183] 3.22 2.72 1.9 1.74 1.41 ...
   ..$ Relative_eig  : num [1:183] 0.217 0.183 0.128 0.117 0.095 ...
   ..$ Rel_corr_eig  : num [1:183] 0.0495 0.0424 0.031 0.0287 0.0241 ...
   ..$ Broken_stick  : num [1:183] 0.0319 0.0264 0.0236 0.0218 0.0204 ...
   ..$ Cum_corr_eig  : num [1:183] 0.0495 0.0919 0.1229 0.1517 0.1758 ...
   ..$ Cumul_br_stick: num [1:183] 0.0319 0.0583 0.082 0.1038 0.1242 ...
  $ vectors   : num [1:183, 1:72] 0.0143 -0.1818 -0.1011 0.104 0.101 ...
   ..- attr(*, "dimnames")=List of 2
   .. ..$ : chr [1:183] "'Bulmanograptus' macilentus" "'Monograptus' 
arciformis" "'Monograptus' austerus" "'Paramplexograptus' kiliani" ...

   .. ..$ : chr [1:72] "Axis.1" "Axis.2" "Axis.3" "Axis.4" ...
  $ trace : num 14.8
  - attr(*, "class")= chr "pcoa"
```

Obviously if there are negative eigenvalues, a correction should 
probably be applied (although perhaps this is a bit of a philosophical 
matter), but still its unclear why the corrected eigenvalues are 
calculated, or how they are calculated. They are clearly different from 
the uncorrected values.


Cheers,
-Dave Bapst


--
David W. Bapst, PhD
Asst Research Professor, Geology & Geophysics, Texas A & M University
Postdoc, Ecology & Evolutionary Biology, Univ of Tenn Knoxville
https://github.com/dwbapst/paleotree

Pour nous remonter une erreur de filtrage, veuillez vous rendre ici 






___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Building phlyogenies from DNA alignments

2018-09-25 Thread Emmanuel Paradis


Hi Zhong,

A few good places where to start:

http://ape-package.ird.fr/
https://github.com/KlausVigo/phangorn

For basic introductions, see the vignettes in the package phangorn:

https://CRAN.R-project.org/package=phangorn

Best,

Emmanuel

Le 25/09/2018 à 17:34, Zhong Huang a écrit :

Hello,


I'm attempting to use R to measure the branch lengths of phylogenies 
constructed from DNA sequence alignments. I'm using Seaview to construct the 
trees, but I'm not sure how to import those trees into R (particularly because 
the trees can only be saved as PDFs or SVGs). I'm hoping that there's a method 
to construct phylogenetic trees from DNA alignments using R.


Apologies if this question has already been addressed. I was unable to find it 
in the archive.


Best,


Zhong Huang

Duke University, Class of 2019

LinkedIn

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/55vrtKmAvt




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] plot heatmap at nodes

2018-08-30 Thread Emmanuel Paradis

Another possibility (maybe more flexible) is to get the coordinates of 
the nodes and then call rect(). Something like:


tr <- rcoal(10)
plot(tr, "p", FALSE)
lastPP <- get("last_plot.phylo", envir = .PlotPhyloEnv)
x <- lastPP$xx[-(1:lastPP$Ntip)]
y <- lastPP$yy[-(1:lastPP$Ntip)]
s <- 0.3
par(xpd = TRUE)
for (h in s * c(-1.5, -.5, .5))
  for (v in s * c(-1.5, -.5, .5))
rect(x + h, y + v, x + h + s, y + v + s, col=sample(colors(), 9))

with 's' the size of each cell of the heatmap, and the vectors 'h' and 
'v' are calculated for a 3x3 matrix, so this can be generalized to 
different dimensions (even non-square). The options of rect() can be 
used (border, ...)


par(xpd = TRUE) is to make the heatmap at the root completely visible.

Best,

Emmanuel

Le 30/08/2018 à 12:45, Emmanuel Paradis a écrit :

Hi Jacob,

It is possible to call nodelabels() several times to do something 
similar to what you want by playing with the adj argument which is 
c(0.5, 0.5) by default (ie, the label is centered on the node). You can 
try this:


tr <- rcoal(5)
plot(tr, "p", FALSE)
for (h in c(0.4, 0.5, 0.6))
   for (v in c(0.4, 0.5, 0.6))
     nodelabels(pch=15, col=sample(colors(), size=1), cex=2, adj=c(h, v))

The difficulty is to find the 1st and 3rd values of h (horizontal) and v 
(vertical) and this will depend on the size of the device (and the value 
of cex of course). Maybe there's a way to automate these calculations.


Best,

Emmanuel

Le 30/08/2018 à 00:12, Jacob Berv a écrit :

Dear R-sig-phylo,

Does there exist a function to plot heatmaps at internal nodes of a 
tree? For example, to illustrate support values from different 
phylogenetic analyses — ie given a set of 3x3 matricies of support 
values for particular nodes.


It seems like there may be a partial (and definitely workable) 
solution here, using ggtree— 
https://rgriff23.github.io/2017/05/11/primate-phylogeny-ggtree.html 
<https://rgriff23.github.io/2017/05/11/primate-phylogeny-ggtree.html>, 
if such heatmaps are saved as raster images (about 1/2 way down). This 
may be the easiest solution, but it would be preferable to keep 
everything in vector format and all in R.


I guess I could do this by calling image() for the individual heatmaps 
and then just linking them up to nodes of interest in illustrator (but 
that feels like cheating).


Jake



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at 
http://www.mail-archive.com/r-sig-phylo@r-project.org/



Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/408vx1yEbQu





___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at 
http://www.mail-archive.com/r-sig-phylo@r-project.org/








___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] plot heatmap at nodes

2018-08-30 Thread Emmanuel Paradis


Hi Jacob,

It is possible to call nodelabels() several times to do something 
similar to what you want by playing with the adj argument which is 
c(0.5, 0.5) by default (ie, the label is centered on the node). You can 
try this:


tr <- rcoal(5)
plot(tr, "p", FALSE)
for (h in c(0.4, 0.5, 0.6))
  for (v in c(0.4, 0.5, 0.6))
nodelabels(pch=15, col=sample(colors(), size=1), cex=2, adj=c(h, v))

The difficulty is to find the 1st and 3rd values of h (horizontal) and v 
(vertical) and this will depend on the size of the device (and the value 
of cex of course). Maybe there's a way to automate these calculations.


Best,

Emmanuel

Le 30/08/2018 à 00:12, Jacob Berv a écrit :

Dear R-sig-phylo,

Does there exist a function to plot heatmaps at internal nodes of a tree? For 
example, to illustrate support values from different phylogenetic analyses — ie 
given a set of 3x3 matricies of support values for particular nodes.

It seems like there may be a partial (and definitely workable) solution here, using 
ggtree— https://rgriff23.github.io/2017/05/11/primate-phylogeny-ggtree.html 
, if such 
heatmaps are saved as raster images (about 1/2 way down). This may be the easiest 
solution, but it would be preferable to keep everything in vector format and all in R.

I guess I could do this by calling image() for the individual heatmaps and then 
just linking them up to nodes of interest in illustrator (but that feels like 
cheating).

Jake



[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/408vx1yEbQu




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] understanding variance-covariance matrix

2018-08-25 Thread Emmanuel Paradis


Hi Agus & Brian,

To complete Brian's response, vcv() is a generic function which works 
with trees (class "phylo") and phylogenetic correlation structures 
(class "corPhyl"). The latter can be used to define an OU model, see 
?vcv, and ?corPhyl for the correlation structures available in ape.


Best,

Emmanuel

Le 25/08/2018 à 20:33, Brian O'Meara a écrit :

Hi, Agus. The variance-covariance matrix comes from the tree and the
evolutionary model, not the data. Each entry between taxa A and B in the
VCV is how much covariance I should expect between data for taxa A and B
simulated up that tree using that model. I don't want to be *that guy*, but
O'Meara et al. (2006)
https://onlinelibrary.wiley.com/doi/10./j.0014-3820.2006.tb01171.x has
a fairly accessible explanation of this (largely b/c I was just learning
about VCVs when working on that paper). Hansen and Martins (1996)
https://onlinelibrary.wiley.com/doi/10./j.1558-5646.1996.tb03914.x have
a much more detailed description of how you get these covariance matrices
from microevolutionary processes.

Typically, ape::vcv() is how you get a variance covariance for a phylogeny,
assuming Brownian motion and no measurement error. It just basically takes
the history two taxa share to create the covariance (or variance, if the
two taxa are the same taxon). A different approach, which seems to be what
you're doing, would be to simulate up a tree many times, and then for each
pair of taxa (including the pair of a taxon with itself, the diagonal of
the VCV), calculate the covariance. These approaches should get the same
results, though the shared history on the tree approach is faster.

Best,
Brian


___
Brian O'Meara, http://www.brianomeara.info, especially Calendar
, CV
, and Feedback


Associate Professor, Dept. of Ecology & Evolutionary Biology, UT Knoxville
Associate Head, Dept. of Ecology & Evolutionary Biology, UT Knoxville



On Sat, Aug 25, 2018 at 1:16 PM Agus Camacho  wrote:


Dear list users,

I am trying to make an easy R demonstration to teach the
variance-covariance matrix to students. However, After consulting the
internet and books, I found myself facing three difficulties to understand
the math and code behind this important matrix. As this list is answered by
several authors of books of phylocomp methods, thought this might make an
useful general discussion.

Here we go,

1) I dont know how to generate a phyloVCV matrix in R (Liams kindly
described some options here
<
http://blog.phytools.org/2013/12/three-different-ways-to-calculate-among.html



but I cannot tell for sure what is X made of. It would seem a dataframe of
some variables measured across species. But then, I get errors when I
write:

  tree <- pbtree(n = 10, scale = 1)
  tree$tip.label <- sprintf("sp%s",seq(1:n))
  x <- fastBM(tree)
y <- fastBM(tree)
   X=data.frame(x,y)
  rownames(X)=tree$tip.label
  ## Revell (2009)
  A<-matrix(1,nrow(X),1)%*%apply(X,2,fastAnc,tree=tree)[1,]
  V1<-t(X-A)%*%solve(vcv(tree))%*%(X-A)/(nrow(X)-1)
## Butler et al. (2000)
Z<-solve(t(chol(vcv(tree%*%(X-A)
  V2<-t(Z)%*%Z/(nrow(X)-1)

## pics
Y<-apply(X,2,pic,phy=tree)
  V3<-t(Y)%*%Y/nrow(Y)

2) The phyloVCV matrix has n x n coordinates defined by the n species, and
it represents covariances among observations made across the n species,
right?. Still, I do no know whether these covariances are calculated over
a) X vs Y values for each pair of species coordinates in the matrix, across
the n species, or b) directly over the vector of n residuals of Y, after
correlating Y vs X, across all pairs of species coordinates. I think it may
be a) because, by definition, variance cannot be calculated for a single
value. I am not sure though, since it seems the whole point of PGLS is to
control phylosignal within the residuals of a regression procedure, prior
to actually making it.

3) If I create two perfeclty correlated variables with independent
observations and calculate a covariance or correlation matrix for them, I
do not get a diagonal matrix, with zeros at the off diagonals (ex. here
<
https://www.dropbox.com/s/y8g3tkzk509pz58/vcvexamplewithrandomvariables.xlsx?dl=0

),

why expect then a diagonal matrix for the case of independence among the
observations?

Thanks in advance and sorry if I missed anything obvious here!
Agus
Dr. Agustín Camacho Guerrero. Universidade de São Paulo.
http://www.agustincamacho.com
Laboratório de Comportamento e Fisiologia Evolutiva, Departamento de
Fisiologia,
Instituto de Biociências, USP.Rua do Matão, trav. 14, nº 321, Cidade
Universitária,
São Paulo - SP, CEP: 05508-090, Brasil.

 [[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org

Re: [R-sig-phylo] Aligning tree tip labels together in ape

2018-08-25 Thread Emmanuel Paradis


Hi Nick,

This can be fixed by editing the code of tiplabels() with the usual 
procedure:


fix(tiplabels)

then find this line of code:

tmp <- rect2polar(XX, YY)

and insert below the following one:

if (lastPP$align.tip.label) tmp$r[] <- max(tmp$r)

save and close. Your commands work for me after this fix. You should 
then be able to add series of labels with increasing values of offset:


tiplabels( offset = 6)
tiplabels( offset = 8)
tiplabels( offset = 10)

Best,

Emmanuel

Le 24/08/2018 à 17:04, Carleson, Nick a écrit :

Hello all,

I'm using the R package 'ape' to make non-ultrametric phylogenetic trees 
. I want to align tip labels together when drawing a tree with 
plot.phylo(), and also add on more labels using the tiplabels() 
function. But, I can't get additional tip labels from tiplabels aligned 
properly with those from plot.phylo. When I call tiplabels, the labels 
are being drawn at the true edge length instead of neatly around the 
plot with the tip labels drawn by plot.phylo. So, in the plot below, I 
want the tip labels (circles) to be aligned adjacent to the tip labels 
(names) drawn byplot.phylo.


Good: tips like t30, t31, and t2
Bad: tips like t1, t20, and t21




Can anyone help me out?

Below is the code to produce the tree above. I'm using ape version 5.1 
and R version 3.4.3.


library(ape)
set.seed(31)
sim_tree <- rlineage(0.1, 0.05)
plot.phylo(sim_tree, type = "fan", align.tip.label = TRUE)
tiplabels(pch = 19, col = rainbow(3), offset = 6)

Thanks all!

**Nick Carleson*
*

**

PhD Student*| Oregon State University
*Graduate Research Assistant | *Botany and Plant Pathology**
*


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici 






___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] ape-package: unknown states in ace analyses?

2018-06-22 Thread Emmanuel Paradis


Two more comments:

1/ The first insertion in the code of ace() below is not needed if the 
user replaces "?" (or any coding for unknown/uncertain state) with NA -- 
which seems reasonable to do.


2/ This approach of taking uncertainty into account in likelihood 
ancestral state reconstruction is also used for DNA sequences and 
implemented in phangorn where uncertainty is (implicitly) taken from the 
IUPAC ambiguity code. Because there are here four states, there is more 
than one possibility of ambiguity, but the same logic applies: if the 
tip "may be" in state x, then set column x to one in the matrix of 
likelihoods:


1 0 0 0 # if the base observed is A
0 0 1 0 # if the base observed is G
1 0 1 0 # if the base observed is R
...
1 1 1 1 # if the base observed is N

This can be applied to any character with more than two states.

Best,

Emmanuel

Le 22/06/2018 à 11:41, Emmanuel Paradis a écrit :

Hi Théo,

It is possible to modify the code of ace() to take uncertain character 
states into account. If you edit the code with fix(ace), after these 
four lines:


    if (method != "ML")
    stop("only ML estimation is possible for discrete characters.")
    if (any(phy$edge.length <= 0))
    stop("some branches have length zero or negative")

you insert:

     if (any(x == "?")) x[x == "?"] <- NA

Then around 30 lines below, after this line:

    liks[cbind(TIPS, x)] <- 1

you insert:

    if (anyNA(x)) liks[which(is.na(x)), ] <- 1

Save and close. I tested this with a simple example:

tr <- compute.brlen(stree(8, "b"), 1)
x <- rep(0:1, each = 4)
plot(tr)
tiplabels(x)

So there are two clades in state 0 and state 1, respectively. With no 
uncertainty, the node likelihoods are well unbalanced except at the root:


R> ace(x, tr, "d")$lik.anc
    0   1
[1,] 0.5 0.5
[2,] 0.953204905 0.046795095
[3,] 0.995642789 0.004357211
[4,] 0.995642789 0.004357211
[5,] 0.046795095 0.953204905
[6,] 0.004357211 0.995642789
[7,] 0.004357211 0.995642789

Let's say the first value is unknown:

x[1] <- "?"

Now the node likelihoods are slightly "in favour" of state 1 because of 
the possibility of this state within the "state 0" clade:


R> ace(x, tr, "d")$lik.anc
    0   1
[1,] 0.479307333 0.520692667
[2,] 0.904647485 0.095352515
[3,] 0.943101903 0.056898097
[4,] 0.990270355 0.009729645
[5,] 0.053105869 0.946894131
[6,] 0.005881435 0.994118565
[7,] 0.005881435 0.994118565

For comparison, here's what is returned by the current version in ape:

R> ace(x, tr, "d")$lik.anc
    ?   0  1
[1,] 0.093607676 0.404434328 0.50195800
[2,] 0.096246928 0.787370632 0.11638244
[3,] 0.201389158 0.750401358 0.04820948
[4,] 0.011648214 0.974955230 0.01339656
[5,] 0.017176009 0.050038984 0.93278501
[6,] 0.003422534 0.006275985 0.99030148
[7,] 0.003422534 0.006275985 0.99030148

You may try this modification with your data and give feed-back: if it 
useful, I'll fix that in ape.


Cheers,

Emmanuel

Le 21/06/2018 à 23:32, Théo Léger a écrit :

Hello,


I am working on the phylogeny of a subfamily of moths and I use ace 
from the ape-package to reconstruct the ancestral state of a bunch of 
morphological characters.


I encountered a problem with the few unknown states I have on my 
matrix (coded "?", either because material for examination was missing 
or the state could not fit in any of the categories): they are treated 
as other characters. Is there a way to ignore them? Is there a way to 
estimate the state for the species with missing information? I know it 
is possible to estimate the state at a tip with missing information in 
functions like AncTresh (using bayesian reconstruction) from the 
phytools package, but I am not sure if ML reconstruction can do it.


Thanks in advance to those who can enlighten me!


Cheers,

Théo Léger


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at 
http://www.mail-archive.com/r-sig-phylo@r-project.org/








___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] ape-package: unknown states in ace analyses?

2018-06-22 Thread Emmanuel Paradis


Hi Théo,

It is possible to modify the code of ace() to take uncertain character 
states into account. If you edit the code with fix(ace), after these 
four lines:


   if (method != "ML")
   stop("only ML estimation is possible for discrete characters.")
   if (any(phy$edge.length <= 0))
   stop("some branches have length zero or negative")

you insert:

if (any(x == "?")) x[x == "?"] <- NA

Then around 30 lines below, after this line:

   liks[cbind(TIPS, x)] <- 1

you insert:

   if (anyNA(x)) liks[which(is.na(x)), ] <- 1

Save and close. I tested this with a simple example:

tr <- compute.brlen(stree(8, "b"), 1)
x <- rep(0:1, each = 4)
plot(tr)
tiplabels(x)

So there are two clades in state 0 and state 1, respectively. With no 
uncertainty, the node likelihoods are well unbalanced except at the root:


R> ace(x, tr, "d")$lik.anc
   0   1
[1,] 0.5 0.5
[2,] 0.953204905 0.046795095
[3,] 0.995642789 0.004357211
[4,] 0.995642789 0.004357211
[5,] 0.046795095 0.953204905
[6,] 0.004357211 0.995642789
[7,] 0.004357211 0.995642789

Let's say the first value is unknown:

x[1] <- "?"

Now the node likelihoods are slightly "in favour" of state 1 because of 
the possibility of this state within the "state 0" clade:


R> ace(x, tr, "d")$lik.anc
   0   1
[1,] 0.479307333 0.520692667
[2,] 0.904647485 0.095352515
[3,] 0.943101903 0.056898097
[4,] 0.990270355 0.009729645
[5,] 0.053105869 0.946894131
[6,] 0.005881435 0.994118565
[7,] 0.005881435 0.994118565

For comparison, here's what is returned by the current version in ape:

R> ace(x, tr, "d")$lik.anc
   ?   0  1
[1,] 0.093607676 0.404434328 0.50195800
[2,] 0.096246928 0.787370632 0.11638244
[3,] 0.201389158 0.750401358 0.04820948
[4,] 0.011648214 0.974955230 0.01339656
[5,] 0.017176009 0.050038984 0.93278501
[6,] 0.003422534 0.006275985 0.99030148
[7,] 0.003422534 0.006275985 0.99030148

You may try this modification with your data and give feed-back: if it 
useful, I'll fix that in ape.


Cheers,

Emmanuel

Le 21/06/2018 à 23:32, Théo Léger a écrit :

Hello,


I am working on the phylogeny of a subfamily of moths and I use ace from the 
ape-package to reconstruct the ancestral state of a bunch of morphological 
characters.

I encountered a problem with the few unknown states I have on my matrix (coded 
"?", either because material for examination was missing or the state could not 
fit in any of the categories): they are treated as other characters. Is there a way to 
ignore them? Is there a way to estimate the state for the species with missing 
information? I know it is possible to estimate the state at a tip with missing 
information in functions like AncTresh (using bayesian reconstruction) from the phytools 
package, but I am not sure if ML reconstruction can do it.

Thanks in advance to those who can enlighten me!


Cheers,

Théo Léger


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Testing tree disagreement: SH-test disappeared in ape-package

2018-06-08 Thread Emmanuel Paradis


Hi Sebastian,

All the likelihood-based phylogenetic inference functions have been 
removed from ape since version 2.4 (released in Oct 2009) because of the 
(incomparably better) implementation in phangorn:


library(phangorn)
?SH.test

Best,

Emmanuel

Le 08/06/2018 à 14:50, Sebastian Y. Müller a écrit :

Dear all,

I'd was wondering how to best perform a test to determine test 
disagreement (congruence test) in R?
Additional information can be found in Planet et al (DOI: 
10.1016/j.jbi.2005.08.008).


A common test seems the Shimodaira-Hasegawa Test (SH-test) which used to 
be implemented in APE:

https://www.rdocumentation.org/packages/ape/versions/2.3-3/topics/sh.test

However, it seem to have disappeared in recent versions (e.g. ape 5.0).
Did this happen for a reason? Or maybe it has been replaced by a more 
appropriate test?


Failing this, is anyone aware of other software to perform SH-, KH- or 
SOWH-tests?


Thanks for any help!



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] "Not of class phylo" error when lapply is used

2018-04-09 Thread Emmanuel Paradis


Hi John,

See my comments below.

Le 09/04/2018 à 16:46, jschenk a écrit :

Hi Folks,

I have been banging my head against what appears to be an easy coding problem 
for a while now and haven’t been able to hack my way out of it.  I am running a 
function to identify a posterior set of node ages for a particular node.  The 
function I wrote works just fine, but when I use lapply to sample across a 
posterior distribution, I get the “not of class ‘phylo’” error, even when I 
input a single tree of class phylo.  I have tried every typical solution (e.g., 
reassigning class), but haven’t identified a solution.  Does anyone know a 
workaround?

Please see the example code below.

Thanks,

John




library(ape)
#simulate a single tree with 20 tips
simtree <- rtree(20)
#Make sure the tree exists
plot(simtree)

#Function I wrote to find a node and then tell me the age of the node.  I 
realize that the simulated tree is not ultrametric in this example, in real 
life it will be - ultrametric trees also result in the same error
AgeDensity <- function(phy, species1, species2){
NodeNumber <- mrca(phy)[species1, species2]


You may use:

NodeNumber <- getMRCA(phy, c(species1, species2))

this will be much more efficient for large trees.


ages <- branching.times(phy)[as.character(NodeNumber)]


Beware that branching.times() is not meaningful for non-ultrametric 
trees (as those generated by rtree). See this for instance:


R> tr <- rtree(2)
R> tr$edge.length
[1] 0.2098386 0.0353521
R> branching.times(tr)
3
0.0353521
R> branching.times(rotate(tr, 3))
3
0.2098386

If you really want to work with non-ultrametric trees, maybe you need to 
use dist.nodes instead.



return(as.numeric(ages))
}

#check the class of the tree object, it will say that it is of class phylo
class(simtree)

#Run my function AgeDensity and it works just fine
AgeDensity(simtree, "t3", "t15")

#When I use the lapply function, I get an error that the object is not a of 
class phylo, although I already verified that it is of class phylo.
lapply(simtree, AgeDensity, species1="t3", species2="t15")


You can build a list of trees simply:

simtree <- c(simtree)

where you can put several trees (as many as you want):

simtree <- c(rtree(20), rtree(20))

and add trees on an existing list:

simtree[[4]] <- rtree(20)
simtree[5:10] <- rmtree(6, 20)


#here is the same analysis conducted with multiple trees
multiTrees <- rmtree(20, 10)


Check-out the options of rmtree: what you did is similar to rmtree(N=20, 
n=10) with N: number of trees, and n: number of tips.



class(multiTrees)
#I get the same error when I run my function across multiple trees
lapply(multiTrees, AgeDensity, species1="t3", species2="t15")


Indeed, but the error message is certainly different (and not only 
because of the language set-up):


R> lapply(multiTrees, AgeDensity, species1="t3", species2="t15")
Error in mrca(phy)[species1, species2] : indice hors limites

Maybe you need to add a check in your AgeDensity function:

if (! species1 %in% phy$tip.label) stop("species1 not in the tree")

and the same for species2.

HTH

Best,

Emmanuel


__
John J. Schenk, Ph.D.
Assistant Professor of Plant Biology
Georgia Southern University Herbarium (GAS), Curator
Department of Biology
4324 Old Register Road
Georgia Southern University
Statesboro, GA 30460-8042
Office:  2260 Biology Building
Office phone:  (912) 478-0848
Lab website: sites.google.com/a/georgiasouthern.edu/schenk
Herbarium website: sites.google.com/a/georgiasouthern.edu/gasherbarium








[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/4074J0etF3M




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Suggestion for update of as.igraph.phylo

2018-02-19 Thread Emmanuel Paradis


Dear Watal,

Thank you. This is now in ape.

I've just uploaded a new testing version (5.0-5) with the last changes 
and fixes on ape's site:


http://ape-package.ird.fr/ape_installation.html#versions

A Windows version is available. The list of changes are listed there:

http://ape-package.ird.fr/NEWS

Best,

Emmanuel

Le 14/02/2018 à 08:30, Watal M. Iwasaki a écrit :

Dear developer team,

Thank you for providing an excellent package. But I have encountered a
small problem with as.igraph.phylo().

Error in graph.edgelist(x$edge, directed = directed, ...) :
   could not find function "graph.edgelist"
Calls: as.igraph.phylo

The workaround for a user is to do `library(igraph)` explicitly because it
exports `graph.edgelist()`. But I think the ape package should call this
function with explicit namespace like `igraph::graph.edgelist()`. Also it
would be better to add igraph to Suggests section in DESCRIPTION so that
`utils::globalVariables("graph.edgelist")` can be removed. Also note that
`graph.edgelist()` is deprecated (it still works, though) and replaced by
`graph_from_edgelist()` in the current version of igraph.

In summary, I suggest the following modification:
- replace line 143 in as.phylo.R with `igraph::graph_from_edgelist(x$edge,
directed = directed)`
- remove line 131 in as.phylo.R: `utils::globalVariables("graph.edgelist")`
- add igraph to Suggests in DESCRIPTION

Best regards,
Watal



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Maximum likelihood time-scaling

2018-02-08 Thread Emmanuel Paradis


Hi Santiago,

This approach is implemented in phangorn, from ?optim.pml:

 If the option 'optRooted' is set to TRUE than the edge lengths of
 rooted tree are optimized. The tree has to be rooted and by now
 ultrametric! Optimising rooted trees is generally much slower.

Cheers,

Emmanuel

Le 08/02/2018 à 00:12, Santiago Claramunt a écrit :

Does anyone know a maximum-likelihood implementation that uses molecular 
sequences to optimizes node ages instead of branch lengths (ala Felsenstein 
2008, p. 266)?
It sounds like nice idea for using in time calibrations.

Cheers,

Santiago



Santiago Claramunt

Associate Curator
Department of Natural History
Royal Ontario Museum
100 Queen’s Park Crescent
Toronto, ON, M5S 2C6
CANADA

(416) 586-5520
sclaram...@rom.on.ca

&

Assistant Professor
Department of Ecology and Evolutionary Biology, University of Toronto












ON NOW
VIKINGS: The Exhibition, presented by Raymond James 
Ltd.
Christian Dior
Wildlife Photographer of the Year
Here We Are Here: Contemporary African Canadian 
Art

COMING SOON
2 June | Iris van Herpen: Transforming Fashion
16 June | Spiders

À L’AFFICHE
VIKINGS : L'exposition, une présentation de Raymond James 
Ltée
Christian Dior
Le Photographe naturaliste de 
l'année
Nous Sommes Ici, D'ici : L'art Contemporain des Noirs 
Canadiens

À VENIR
2 juin | Iris van Herpen : Transformer la mode
16 juin | Les araignŽes

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Specifying a polytomy at the root with no outgroup

2018-01-20 Thread Emmanuel Paradis


Hi,

To get a tree with a basal multichotomy as a rooted tree, you have to 
set its root edge:


R> tr <- read.tree(text = "(A,B,(C,D));")
R> is.rooted(tr)
[1] FALSE
R> tr$root.edge <- 0
R> is.rooted(tr)
[1] TRUE

This could be done directly in the Newick string:

R> tr <- read.tree(text = "(A,B,(C,D)):0;")
R> is.rooted(tr)
[1] TRUE

Best,

Emmanuel

Le 19/01/2018 à 23:08, Yan Wong a écrit :

Hi

I have a set of rooted trees, some of which have polytomies at the root, like

(A,B,(C,D));
((A,B),(C,D));

These are not treated as rooted by default, so I have tried to root them using

root(tree, node = Ntip(phy) + 1, resolve.root = TRUE), but this gives an error, 
explained here:

https://www.mail-archive.com/r-sig-phylo@r-project.org/msg03805.html

Note that in my case there is no particular reason to specify either A, B, or 
(C,D) as an outgroup

I should explain that what I am trying to do is to calculate distance metrics 
between these trees (using metrics that are valid with polytomies, such as the 
KC metric), such that the pairwise distances between (A,B,(C,D)) and 
(B,A,(C,D)) and ((C,D), B, A) etc are all 0, but the distance between, say, 
(A,B,(C,D)) and ((A,B),(C,D)) or perhaps (A,C,(B,D)); is not zero.

So how can I specify a polytomy at the root, with the tree format used in R?

Cheers

Yan Wong
Oxford
___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/304anPlLmqF




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] color edges of internal nodes with a single child

2017-12-23 Thread Emmanuel Paradis


This is fixed in ape 5.0-3 just uploaded one ape's site:

http://ape-package.ird.fr/ape_installation.html#versions

If you don't want to (or cannot) re-install ape, you can get the file 
plot.phylo.R from the source package and source() it into R -- not just 
the function phylogramm.plot: because of the namespace it is better to 
source plot.phylo() at the same time.


Best,

Emmanuel

Le 23/12/2017 à 22:48, Emmanuel Paradis a écrit :

Hi Walter,

That shouldn't be too hard to fix. This happens, apparenly, only if 
type="phylogram" -- which is the default. This seems to work correctly 
if type="c" (and if there are branch lengths).


I'll update you when it's fixed.

Best,

Emmanuel

Le 20/12/2017 à 22:17, Walter, Mathias a écrit :

Hi!

I tried using the edge.color parameter with a vector of colors of
plot.phylo on a tree having several internal single nodes (which have 
just
one child node). Unfortunately, it reports an error that A/B needs a 
value.

I tracked it down to the function phylogramm.plot which expects nodes
having at least two children.

Is there an easy solution for this, except using collapse.single?

--
Kind regards,
Mathias

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at 
http://www.mail-archive.com/r-sig-phylo@r-project.org/








___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] color edges of internal nodes with a single child

2017-12-23 Thread Emmanuel Paradis


Hi Walter,

That shouldn't be too hard to fix. This happens, apparenly, only if 
type="phylogram" -- which is the default. This seems to work correctly 
if type="c" (and if there are branch lengths).


I'll update you when it's fixed.

Best,

Emmanuel

Le 20/12/2017 à 22:17, Walter, Mathias a écrit :

Hi!

I tried using the edge.color parameter with a vector of colors of
plot.phylo on a tree having several internal single nodes (which have just
one child node). Unfortunately, it reports an error that A/B needs a value.
I tracked it down to the function phylogramm.plot which expects nodes
having at least two children.

Is there an easy solution for this, except using collapse.single?

--
Kind regards,
Mathias

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/







___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] ape

2017-12-07 Thread Emmanuel Paradis


Hi,

You can find all released versions of ape as source packages on CRAN:

https://cran.r-project.org/src/contrib/Archive/ape/

Best,

Emmanuel

Le 07/12/2017 à 11:19, pascal de Clarens a écrit :

Hello,

I am using Iramutec working togethr with R
it needs to use ape 3.5 annd having installed and updated the libraries, ,
the one currently installed (mac Sierre 10.13) is the version 5

I therefore need to delete the lates versin and install the 'old' 3.3 ape
version

I need help as I do not know what code lines I have to use in the R
terminal.

Thanks in advance for your help

bets regards



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] paste() hangs R

2017-11-02 Thread Emmanuel Paradis


Hi Andreas,

This sounds more like an issue with RStudio which seems to freeze from 
time to time, unless you have a reproducible example under another 
environment.


Best,

Emmanuel

Le 30/10/2017 à 15:38, Andreas Kolter a écrit :
If I paste() a huge DNAbin alignment (by accident) (as in: 250MB object 
size with 15 sequences) R Studio hangs and can not be closed or 
aborted (Esc key). I have to Crtl+Strg+Del force quit the process.


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at 
http://www.mail-archive.com/r-sig-phylo@r-project.org/



Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/301SaTdYVUM





___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Removing columns containing "N" in DNA alignment

2017-10-27 Thread Emmanuel Paradis


Hi Vojtěch,

Here's something you could do. First, make a copy of del.colgapsonly:

toto <- del.colgapsonly

Then, edit this copy (e.g., with fix(toto)), find this line:

foo <- function(x) sum(x == 4)

and replace 4 by 240. Save and close. Now you can use toto() in the same 
way than del.colgapsonly(); for instance, to get the number of N's in 
each site of the woodmouse data:


R> toto(woodmouse, freq.only = TRUE)
  [1]  3  2  2  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1


If you wonder where the values 4 and 240 come from:

R> as.integer(as.DNAbin("-"))
[1] 4
R> as.integer(as.DNAbin("N"))
[1] 240

This gives a lot of possibilities to hack the function. For instance, if 
you want to find the sites with R/Y, first, get the integer codes:


R> as.integer(as.DNAbin("R"))
[1] 192
R> as.integer(as.DNAbin("Y"))
[1] 48

Then, change the above line for:

foo <- function(x) sum(x == 192 | x == 48)

HTH

Best,

Emmanuel

Le 27/10/2017 à 17:02, Vojtěch Zeisek a écrit :

Hm, I tried a dirty hack: I exported the DNAbin object using ape::write.dna
and replaced all occurrences of "n" in any sequence by "-" and imported the
file back to R with ape::read.dna. Then I tried the mentioned functions. They
did nothing. When I exported the file to disk, the FASTA file did not contain
any "-", only "n". DO I do something wrong, or is there a bug in ape as it
seems to confuse "n" and "-"?
Sincerely,
V.

Dne pátek 27. října 2017 16:25:02 CEST jste napsal(a):

Hello,
I checked ape::del.colgapsonly, ips::deleteGaps and ips::deleteEmptyCells.
They delete columns containing missing values, but I need also to delete
columns containing base "N" (all columns with amount of Ns over certain
threshold).
Actually, ips::deleteEmptyCells has option nset=c("-", "n", "?"), so it is
suppose to remove columns/rows containing only the given characters, but if
I use it and export data (ape::write.dna or ape::write.nexus.data), some
samples consist only of N characters...
The DNAbin object being processed was originally imported from VCF using
vcfR (read.vcfR(file="my.vcf") and converted: vcfR2DNAbin(x=myvcf,
consensus=TRUE, extract.haps=FALSE, unphased_as_NA=FALSE)).
I checked source code of the above functions, but they seem to only count
NAs and then drop respective columns. And as sequences in DNAbin are stored
in binary format, I'm bit struggled here... :(
Any idea how to remove columns with given portion of "N" in sequences?
Sincerely,
V.


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Why does ace set a random number generator seed?

2017-10-20 Thread Emmanuel Paradis


Hi Dave,

The seed is created every time you generate random data (see details in 
?.Random.seed):


R> exists(".Random.seed")
[1] FALSE
R> sample(1)
[1] 1
R> exists(".Random.seed")
[1] TRUE

So in your code below the seed is created by calling rtree(). Indeed, 
with ape 4.1:


R> rm(".Random.seed")
R> data(bird.orders)
R> x <- 1:23
R> exists(".Random.seed")
[1] FALSE
R> o <- ace(x, bird.orders)
R> exists(".Random.seed")
[1] FALSE

However, with the new "soon-to-be-on-CRAN" version, a seed is created by 
the same code. The "culprit" is actually Rcpp: the new ape has C++ code 
linked to Rcpp. For a reason I ignore, it seems that a random seed is 
initialized when calling a function linked to Rcpp. Here is an example:


R> rm(".Random.seed")
R> library(RcppEigen)
R> exists(".Random.seed")
[1] FALSE
R> o <- fastLm(1, 1)
R> exists(".Random.seed")
[1] TRUE

The new C++ code in ape is called by reorder(phy, "postorder") which is 
used by many functions in ape (ace, pic, plot.phylo, vcv, ...).


Best,

Emmanuel

Le 19/10/2017 à 21:00, David Bapst a écrit :

Emmanuel, all-

I noticed today that a workspace I was working with had a random
number seed set in it, but didn't remember setting one. Finally, I
discovered the culprit was ace. Here's a reproducible example,
demonstrating that a seed exists after running ace:

library(ape)
tree<-rtree(10)
ace(1:10,tree)
.Random.seed

I am using ape 4.1, and it doesn't seem to be addressed in the
forthcoming version, given my reading of the changes log (but I might
have missed it). What's going? Why is a random number seed being set?

Cheers,
-Dave



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] installing a testing version of ape

2017-10-17 Thread Emmanuel Paradis

Maybe your network has a cache or a firewall (or something similar 
blocking you). I just tried from here and it worked:


R> download.packages("ape", ".", repos = "http://ape-package.ird.fr/;)
trying URL 'http://ape-package.ird.fr/src/contrib/ape_4.9.tar.gz'
Content type 'application/x-gzip' length 755146 bytes (737 KB)
==
downloaded 737 KB

 [,1]  [,2]
[1,] "ape" "./ape_4.9.tar.gz"

The Win build is also available. The NEWS file is here:

http://ape-package.ird.fr/NEWS

Best,

Emmanuel

Le 17/10/2017 à 15:52, Elizabeth Purdom a écrit :

Thanks Emmanuel!

I do not yet see it yet (I still get the same response in R when following 
those commands). I also don’t see anything in NEWS on the front page since 
2017-05-03 so maybe I’m having a bigger problem, but I will check again later 
today to see if it changes.

All of the best,
Elizabeth



On Oct 17, 2017, at 2:47 PM, Emmanuel Paradis <emmanuel.para...@ird.fr> wrote:

Hi Elizabeth,

I have submitted ape 5.0 on CRAN on 5 October but I'm still waiting for 
feedback. I suspect there are problems with reverse dependencies. In the 
meantime, I cleared the testing repositories thinking that CRAN would take the 
usual 1-2 days to publish the new version. Sorry for the confusion.

I have just uploaded a version 4.9 on the testing repos which is virtually 
identical to the version being examined by CRAN. I'll upload a Windows build in 
a few hours. The NEWS file on ape-package.ird.fr has been updated.

Best,

Emmanuel

Le 17/10/2017 à 12:11, Elizabeth Purdom a écrit :

Hello,
I currently have version 4.1 of ape installed on my computer (Mac OS X, R 
3.4.2) which appears to currently be the most up-to-date on CRAN. However, I am 
trying to get the testing version of ape, because I know that there have been 
changes made that I would like to take advantage of without waiting for them to 
show up on CRAN. I have tried using the commands on the website, and they do 
not show any new versions available. Specifically, using `old.packages`, 
nothing is returned:

old.packages(repos = "http://ape-package.ird.fr/;)

NULL
And trying to download anyway gets me an error:

download.packages("ape", ".", repos = "http://ape-package.ird.fr/;)

trying URL 'http://ape-package.ird.fr/src/contrib/ape_0.0.1.tar.gz'
Error in download.file(url, destfile, method, mode = "wb", ...) :
   cannot open URL 'http://ape-package.ird.fr/src/contrib/ape_0.0.1.tar.gz'
In addition: Warning message:
In download.file(url, destfile, method, mode = "wb", ...) :
   cannot open URL 'http://ape-package.ird.fr/src/contrib/ape_0.0.1.tar.gz': 
HTTP status was '404 Not Found'
Warning in download.packages("ape", ".", repos = "http://ape-package.ird.fr/;) :
   download of package ‘ape’ failed
  [,1] [,2]
This is presumably because http://ape-package.ird.fr/src/contrib/PACKAGES 
points to 0.0.1. I don’t immediately see an option to specify a version of the 
package (and indeed I don’t know what version I would put in). Do I need to be 
running a development version of R to get the newest versions of ape?
Thanks for any help,
Elizabeth Purdom
PS Here is my sessionInfo:

sessionInfo()

R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6
Matrix products: default
BLAS: 
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: 
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base
other attached packages:
[1] BiocInstaller_1.26.1
loaded via a namespace (and not attached):
[1] compiler_3.4.2 tools_3.4.2
___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/









___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] installing a testing version of ape

2017-10-17 Thread Emmanuel Paradis


Hi Elizabeth,

I have submitted ape 5.0 on CRAN on 5 October but I'm still waiting for 
feedback. I suspect there are problems with reverse dependencies. In the 
meantime, I cleared the testing repositories thinking that CRAN would 
take the usual 1-2 days to publish the new version. Sorry for the confusion.


I have just uploaded a version 4.9 on the testing repos which is 
virtually identical to the version being examined by CRAN. I'll upload a 
Windows build in a few hours. The NEWS file on ape-package.ird.fr has 
been updated.


Best,

Emmanuel

Le 17/10/2017 à 12:11, Elizabeth Purdom a écrit :

Hello,

I currently have version 4.1 of ape installed on my computer (Mac OS X, R 
3.4.2) which appears to currently be the most up-to-date on CRAN. However, I am 
trying to get the testing version of ape, because I know that there have been 
changes made that I would like to take advantage of without waiting for them to 
show up on CRAN. I have tried using the commands on the website, and they do 
not show any new versions available. Specifically, using `old.packages`, 
nothing is returned:


old.packages(repos = "http://ape-package.ird.fr/;)

NULL

And trying to download anyway gets me an error:


download.packages("ape", ".", repos = "http://ape-package.ird.fr/;)

trying URL 'http://ape-package.ird.fr/src/contrib/ape_0.0.1.tar.gz'
Error in download.file(url, destfile, method, mode = "wb", ...) :
   cannot open URL 'http://ape-package.ird.fr/src/contrib/ape_0.0.1.tar.gz'
In addition: Warning message:
In download.file(url, destfile, method, mode = "wb", ...) :
   cannot open URL 'http://ape-package.ird.fr/src/contrib/ape_0.0.1.tar.gz': 
HTTP status was '404 Not Found'
Warning in download.packages("ape", ".", repos = "http://ape-package.ird.fr/;) :
   download of package ‘ape’ failed
  [,1] [,2]

This is presumably because http://ape-package.ird.fr/src/contrib/PACKAGES 
points to 0.0.1. I don’t immediately see an option to specify a version of the 
package (and indeed I don’t know what version I would put in). Do I need to be 
running a development version of R to get the newest versions of ape?

Thanks for any help,
Elizabeth Purdom

PS Here is my sessionInfo:


sessionInfo()

R version 3.4.2 (2017-09-28)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: OS X El Capitan 10.11.6

Matrix products: default
BLAS: 
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
LAPACK: 
/Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] BiocInstaller_1.26.1

loaded via a namespace (and not attached):
[1] compiler_3.4.2 tools_3.4.2

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Can I get coordinates from plot.phylo and not get blank page

2017-09-26 Thread Emmanuel Paradis


Hi Elizabeth,

You can use directly the functions behind the calculations of the 
coordinates; they are documented together: ?node.depth. But this will 
not give the final coordinates since more calculations are needed 
depending on the widths of the labels, etc.


Another possibility could be to plot the tree in a temporary file and 
get the settings afterwards, eg:


R> tmpf <- tempfile()
R> pdf(tmpf)
R> plot(rtree(100))
R> dev.off()
X11cairo
   2
R> lastPP <- get("last_plot.phylo", envir = .PlotPhyloEnv)
R> lastPP
$type
[1] "phylogram"


R> unlink(tmpf) # optional since it's deleted when R is closed

Just for the record, if some users wonder about the usefulness of the 
option plot=FALSE, see some examples in:


example(phydataplot)

HTH

Best,

Emmanuel

Le 26/09/2017 à 14:58, Elizabeth Purdom a écrit :

Hello,

I am writing a function that calls plot.phylo in the ape package with the 
option plot=FALSE. I then do some calculations with the output (the calculated 
coordinates), and then make another call to plot.phylo with plot=TRUE. If I do 
this, then the plot=FALSE option creates a blank plot, which is the result that 
is clearly documented in the help pages.

However, I do not want this blank plot, because I am saving the result (the 
second call to plot.phylo) to a pdf, and so the pdf has a blank page before the 
plot I want. And since I’m calling this in a function, I can’t “wait” for the 
second plot (a similar problem occurs if you run it in a knitr document)

Is there a way to get the output from plot.phylo with all of the coordinates, 
etc. without having to have the blank plot? (I do not want an option to “add” 
the plot to the blank plot, if it exists, because I am doing these calculations 
so that in my next call I can change the x.lim options so the tree only takes 
up a fraction of the plot, so I need to reset the par, etc., and not just draw 
on the original coordinates set up by the blank plot)

Thank you,
Elizabeth Purdom

Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/302lu3ZCAL7




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Possible Bug in ape::read.tree

2017-09-08 Thread Emmanuel Paradis


Hi George,

Nice comparison! It would be interesting to see if the difference 
between ape and rncl is always in the same direction. It seems that the 
trees in your test are all relatively small (timings are concentrated 
around 1 ms). Apparently, read.tree scales a bit better than 
read_newick_phylo with bigger trees:


R> write.tree(rtree(5), file = "tr.tre")
R> system.time(a <- read.tree("tr.tre"))
utilisateur système  écoulé
  0.240   0.004   0.245
R> system.time(b <- read_newick_phylo("tr.tre"))
utilisateur système  écoulé
  0.380   0.020   0.397
R> identical(a, b)
[1] FALSE
R> all.equal(a, b)
[1] TRUE

Best,

Emmanuel

Le 07/09/2017 à 21:39, George Vega Yon a écrit :

Hi Klaus,

Since I'm actually working with the complete PANTHER 11.1 database, I wrote
a short document in our project's website comparing both ape and
rncl reading ~13,000 trees. ape::read.tree does the job and identifies and
reads the singleton as expected. Overall, while I see that the new version
of the read.tree function is significantly faster, it does not seem to be
significantly faster than rncl (both perform pretty good).

You can find the document here:
https://github.com/USCbiostats/aphylo/blob/master/playground/ape_now_supports_singletons.md

Best,



George G. Vega Yon
+1 (626) 381 8171
http://cana.usc.edu/vegayon

On Tue, Sep 5, 2017 at 5:55 PM, Klaus Schliep 
wrote:


Dear George & list,
you can try install the development version of ape, which can handle
singleton nodes:
download.packages("ape", ".", repos = "http://ape-package.ird.fr/;)
and than install the package from source. See
http://ape-package.ird.fr/ape_installation.html for more details.
For you tree the new version seems to import tree fine.
Emmanuel and I are keen to get feedback on the new read.tree() function so
please give it a try.
Kind regards,
Klaus




On Tue, Sep 5, 2017 at 5:37 PM, Liam J. Revell 
wrote:


Hi George. Mario is correct that phytools can read a tree with singleton
nodes (if that is indeed your problem), but the name of the function is
read.newick. Good luck! - Liam

Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org


On 9/5/2017 4:17 PM, George Vega Yon wrote:


Hi Mario,

That sounds great!

Thanks,

George G. Vega Yon
+1 (626) 381 8171
http://cana.usc.edu/vegayon

On Tue, Sep 5, 2017 at 2:08 PM, Mario José Marques-Azevedo <
mariojm...@gmail.com> wrote:

Hi, George,


Ape package do not read tree with singletons, yet. New version will
read.
You can use readTree from phytools package for while.

Best regards,

Mario


On 5 Sep 2017 5:53 p.m., "George Vega Yon"  wrote:

Hi,

When trying to read this tree with the ape::read.tree function :

((AN14:0.000,AN15:0.000):0.000[&:Ev=0>1:S=
Homo-Pan:ID=AN13],(AN17:0.000,AN18:0.000):0.000[&:Ev=0>
1:S=Murinae:ID=AN16]):0.000[&:Ev=0>1:S=Euarchontoglires:
ID=AN12],(AN20:0.000,AN21:0.000):0.000[&:Ev=0>1:S=
Laurasiatheria:ID=AN19]):0.000[&:Ev=0>1:S=Eutheria:
ID=AN11],AN22:0.004):0.003[&:Ev=0>1:S=Mammalia:ID=AN10]
,AN23:0.000):0.014[&:Ev=0>1:S=Amniota:ID=AN9],AN24:0.
017):0.021[&:Ev=0>1:S=Tetrapoda:ID=AN8],((AN27:0.
066,AN28:0.052):0.010[&:Ev=1>0:ID=AN26],(AN30:0.031,
AN31:0.014):0.003[&:Ev=1>0:ID=AN29]):0.021[&:Ev=0>
1:S=Teleostei:ID=AN25]):0.147[&:Ev=0>1:S=Osteichthyes:
ID=AN7],AN32:0.194):0.104[&:Ev=0>1:S=Deuterostomia:ID=
AN6],((AN35:0.020,AN36:0.038):0.416[&:Ev=0>1:S=
Caenorhabditis:ID=AN34],((AN39:0.119,AN40:0.131):0.154[&
:Ev=0>1:S=Insecta:ID=AN38],AN41:0.317):0.104[&:Ev=0>
1:S=Arthropoda:ID=AN37]):0.118[&:Ev=0>1:S=Ecdysozoa:
ID=AN33],AN42:0.381):0.594[&:Ev=0>1:S=Bilateria:ID=AN5]
,AN43:2.000):0.414[&:Ev=0>1
  :S=Eumetazoa:ID=AN4],AN52:0.484,AN53:0.427):0.418[&
:Ev=0>1:S=
Saccharomycetaceae:ID=AN51],AN54:0.733):0.320[&:Ev=0>
1:S=Saccharomycetaceae-Candida:ID=AN50],AN55:0.808):0.290[&&
NHX:Ev=0>1:S=
Saccharomycetales:ID=AN49],(AN57:0.793,((AN60:0.206,AN61:
0.186):0.143[&:Ev=0>1:S=Sordariomyceta:ID=AN59],AN62:
0.331):0.332[&:Ev=0>1:S=Sordariomycetes-Leotiomycetes:
ID=AN58]):0.467[&:Ev=0>1:S=Pezizomycotina:ID=AN56]):0.
432[&:Ev=0>1:S=Pezizomycotina-Saccharomycotina:ID=AN48],
AN63:0.864):0.223[&:Ev=0>1:S=Ascomycota:ID=AN47],(AN65:
0.613,AN66:0.930,AN67:0.755):0.332[&:Ev=0>1:S=
Basidiomycota:ID=AN64]):0.618[&:Ev=0>1:S=Dikarya:ID=
AN46],((AN74:0.337,AN75:0.406):0.298[&:Ev=0>1:S=
Saccharomycetaceae:ID=AN73],AN76:0.578):0.338[&:Ev=0>
1:S=Saccharomycetaceae-Candida:ID=AN72],AN77:0.567):0.304[&&
NHX:Ev=0>1:S=
Saccharomycetales:ID=AN71],(AN79:0.379,((AN82:0.398,AN83:
0.403):0.191[&:Ev=0>1:S=Sordariomyceta:ID=AN81],AN84:
0.366):0.140[&:Ev=0>1:S=Sordariomyc
  etes-Leotiomycetes:ID=AN80]):0.457[&:Ev=0>1:S=
Pezizomycotina:ID=AN78]):0.331[&:Ev=0>1:S=Pezizomycotina-
Saccharomycotina:ID=AN70],AN85:0.737):0.128[&:Ev=0>

Re: [R-sig-phylo] phylogenetic signal with sample sizes but no standard errors

2017-08-04 Thread Emmanuel Paradis


There is also an implementation of this method in two functions in ape, see:

?pic.ortho
?varCompPhylip

There are examples with simulated data in both help pages. The second 
function requires PHYLIP (similarly to Rphylip)	.


Best,

Emmanuel

Le 04/08/2017 à 14:36, Joe Felsenstein a écrit :

There is also my C program Contrast, which implements a method from a
2008 paper I wrote:

Felsenstein, J.  2008.  Comparative methods with sampling error and
within-species variation: contrasts revisited and revised. American
Naturalist 171: 713-725.

This estimates the within-species covariances and the between-species
evolutionary variances.It is not an R program but can be accessed
through Liam Revell's package Rphylip, if you also have my (non-R)
package PHYLIP installed.

A pretty good (but not ML) estimate of the within-species phenotypic
variance can be gotten by pooling the within-species sampling error.
The harder part is using that to correct one's estimate of the
covariances of the between-species change, which using ordinary
methods will have some within-species variation mixed in.

Joe

Joe Felsenstein j...@gs.washington.edu
  Department of Genome Sciences and Department of Biology,
  University of Washington, Box 355065, Seattle, WA 98195-5065 USA

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/







___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Frequency histograms in APE

2017-08-03 Thread Emmanuel Paradis


Hi,

This was asked some time ago:

http://www.mail-archive.com/r-sig-phylo@r-project.org/msg04388.html

HTH

Best,

Emmanuel

Le 30/07/2017 à 10:58, LUCAS A. (599841) a écrit :

Hi

(This is my first query to a group like this, so my apologies if the subject is 
a little basic)

I have a data set of insect COI barcodes, that i've aligned and edited in MEGA. 
 I've also generated the K2P distances in MEGA, producing a very large 
spreadsheet of intra and inter specific distances.

I'd like to generate a frequency distribution histogram of conspecific and 
heterospecific distances for selected sub groups of my data.  I understand this 
can be done in the APE package, but am struggling to understand the method from 
the documentation.  Can anyone advise?

Andrew Lucas


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/302HzMB0t5y




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

[R-sig-phylo] new version of ape

2017-08-03 Thread Emmanuel Paradis


Dear all,

A new version of ape is in preparation with several significant changes. 
The most important ones are:


- All types of phylogenetic trees and networks are now supported:
* trees with singleton nodes are handled with the usual functions
read.tree, plot.tree, ...
* networks are read/write with the new functions read.evonet or 
write.evonet (supporting the extended Newick format) and are handled 
with several new functions or improvements of previous ones.


- read.tree() and read.nexus() are now based on C code and should be 
several times faster.


- The internal code of prop.part() and of reorder.phylo() have  been 
rewritten in C++ and should be several times faster. This also affects 
dist.topo() which is much faster with the default distance.


- The new function Xplor() shows all data files on the local machine in 
a Web browser with clickable links to the directories and files.


- image.DNAbin() and image.AAbin() have been improved with several new 
options.


All changes are listed in:

http://ape-package.ird.fr/NEWS

A testing version is available as source package and pre-compiled for 
Windows; details on how to install them are in:


http://ape-package.ird.fr/ape_installation.html#versions

At this stage, this testing version seems stable. It will be submitted 
to CRAN after more tests. We also welcome tests from users and developers.


Best,

Emmanuel & Klaus

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Error in eigen(val, only.values = TRUE) : infinite or missing values in 'x' when estimating lambda in pgls

2017-07-03 Thread Emmanuel Paradis


Hi Lydia,

This kind of problem happens sometimes though it's not clear why. One 
possibility is that the Pagel model is not appropriate for your data. 
One thing you can do is to estimate lambda by repeatedly fitting models 
with fixed = TRUE for different values of lambda between 0 and 1, and 
select the value with the largest logLik (or equivalently smallest AIC). 
If you find a value of lambda close to 0 or close to 1, this would 
suggest that the Pagel model may not be ideal.


One point to be careful is that this procedure does not include lambda 
as a free parameter. For instance, with some simulated data:


R> AIC(gls(y ~ x, correlation=corPagel(phy=tr)))
[1] 56.33383

This gives ^lambda=0.1521213 which can be used as fixed in corPagel:

R> AIC(gls(y ~ x, correlation=corPagel(0.1521213, phy=tr, fixed=TRUE)))
[1] 54.33383

But if this value is found as suggested above, it must be considered as 
a free parameter and the last AIC value but increased by 2. This could 
be important if you compare this model with others using AIC.


HTH

Best,

Emmanuel

Le 26/06/2017 à 16:45, Lydia  a écrit :

Hello everyone,

I have problems estimating lambda using the gls function in R.

The gls function works perfectly fine when I am running it with a fixed 
value of lambda (ie. fixed=TRUE) but when I am trying to estimate lambda 
and incorporate this estimation of lambda into my analysis, ie. :


 >result = gls(traitY ~ traitX, data=DF, correlation=corPagel(value=1, 
tree_list[[1]], fixed=FALSE), method="ML")


I always get the following error message :

 >Error in eigen(val, only.values = TRUE) :

 >infinite or missing values in 'x'

My question is : How can this be solved

I would be really thankful for any kind of suggestion/help since I can 
‘t continue my analysis without solving this particular problem (it is 
really important to get the estimted lambda for my statistical 
interpretation) and have been stuck for quite a while and haven’ t found 
anything useful on the net.


PS.

I am using the libraries : nlme and ape

My dataframe is the following :

  traitY traitX
Alouatta_palliata 0.5273489 0.625
Ateles_geoffroyi 0.5051398 0.438
Cebus_capucinus 0.2895715 0.5036667
Saimiri_sciureus 0.3591358 0.659
Cebus_apella 0.6884629 0.5572500
Cercopithecus_diana 0.5714286 0.905
Chlorocebus_pygerythrus 0.7604781 0.7246667
Erythrocebus_patas 0.000 1.000
Macaca_arctoides 0.4155720 0.5895000
Macaca_fuscata 0.6594206 0.556
Macaca_mulatta 0.6614815 0.8275000
Macaca_nigra 0.6416667 0.817
Macaca_radiata 0.7331135 0.691
Macaca_tonkeana 0.4991048 0.8932500
Papio_ursinus 0.7044067 0.657
Homo_sapiens 0.5037506 0.532
Pan_paniscus 0.600 0.900
Pan_troglodytes_troglodytes 0.1771352 0.8856364
Pongo_pygmaeus 0.500 0.600
Eulemur_fulvus_fulvus 0.7662338 0.873
Eulemur_fulvus_rufus 0.7334606 0.5475000

Thanks in advance,

Lydia


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici 






___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/3029u8bnc3q



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] need finite 'xlim' values error with plot.phylo

2017-06-14 Thread Emmanuel Paradis


Hi Santiago,

For the tree 'rtr' below, the default size of the graphical device is 
indeed too small: you can maximize the window size and this should work 
(or you can set manually x.lim). After plotting the tree this way, you 
can reduce the size of the window and have no problem. It's fixed now 
and will be in the next release of ape.


Best,

Emmanuel

Le 13/06/2017 à 18:34, Santiago Sánchez a écrit :

Hello,

I was wondering if there could be a bug in the plot.phylo code. I'm trying
to plot a tree which is scaled in yr units (less than 1 million years).
However I'm getting this error:

Error in plot.window(...) : need finite 'xlim' values

I tried rescaling to kyr and it plots well. Here is a sample test code:

library(ape)
rtr <- rtree(50)
rtr$edge.length <- rtr$edge.length*10
plot(rtr)
rtr2 <- rtr
rtr2$edge.length <- rtr2$edge.length/1000
plot(rtr2)

Something interesting is that it will plot if I omit tip labels:

plot(rtr, show.tip.label=F)

Cheers,
Santiago



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] A possible alternate MRCA function to APE's getMRCA

2017-06-10 Thread Emmanuel Paradis


Joseph,

Really cool. I updated the code in ape.

About (byte-)compiling, this is usually done during installing the package:

http://ape-package.ird.fr/ape_installation.html#linux

I guess Win and Mac versions from CRAN are byte-compiled. To check if a 
function is compiled, just print it in R, it will show the bytecode at 
the end of the display:


R> rtree




Besides, the NEWS for R 3.4.0 says:

"The JIT (‘Just In Time’) byte-code compiler is now enabled by default 
at its level 3. This means functions will be compiled on first or second 
use and top-level loops will be compiled and then run. (Thanks to Tomas 
Kalibera for extensive work to make this possible.)"


This seems to be effective:

R> foo <- function(x) x
R> foo(1)
[1] 1
R> foo
function(x) x
R> foo(1)
[1] 1
R> foo
function(x) x


Best,

Emmanuel

Le 10/06/2017 à 14:29, Joseph W. Brown a écrit :

Emmanuel and Klaus,

After Klaus reminded me that `which` was so inefficient I realized using 
it in finding the match in the first lineage is unnecessary. Instead, we 
can capitalize on the fact that shallower nodes have smaller nodeids. So 
I threw out the vector index `mrcaind` altogether:


getMRCA_new<- function(phy, tip) {
if (!inherits(phy, "phylo"))
stop('object "phy" is not of class "phylo"')
if (length(tip) < 2) return(NULL)
Ntip <- length(phy$tip.label)
rootnd <- Ntip + 1L

pars<- integer(phy$Nnode) # worst case assignment, usually far too long
tnd <- if (is.character(tip)) match(tip, phy$tip.label) else tip

## build a lookup table to get parents faster
pvec <- integer(Ntip + phy$Nnode)
pvec[phy$edge[, 2]] <- phy$edge[, 1]

## get entire lineage for first tip
nd <- tnd[1]
for (k in 1:phy$Nnode) {
nd <- pvec[nd]
pars[k] <- nd
if (nd == rootnd) break
}
pars<- pars[1:k] # delete the rest
mrcand <- pars[1]; # keep track of actual node rather than index in pars


## traverse lineages for remaining tips, stop if hit common ancestor
for (i in 2:length(tnd)) {
done <- FALSE
cnd <- tnd[i]
while (!done) {
cpar<- pvec[cnd] # get immediate parent
if (cpar %in% pars) {
if (cpar == rootnd) return(rootnd) # early exit
mrcand <- min(mrcand,cpar); # shallowest node has lowest nodeid
done <- TRUE
}
cnd <- cpar # keep going!
}
}
mrcand
}

I also played around with compiling the function:

library(compiler);
getMRCA_em_cmp<- cmpfun(getMRCA_em);
getMRCA_new_cmp<- cmpfun(getMRCA_new);

Here are some timings (*_em is your code, *_new is the code above). For 
my original example (100K tips, 500 tips in MRCA query) there is not a 
whole lot of difference:


microbenchmark(getMRCA_em(phy, tips), getMRCA_new(phy, tips), 
getMRCA_em_cmp(phy, tips), getMRCA_new_cmp(phy, tips));

Unit: milliseconds
expr min lq mean median uq max neval
   getMRCA_em(phy, tips) 
23.9987929.8196433.6743732.3914935.61888136.34521100
  getMRCA_new(phy, tips) 
23.7694128.0250031.8428531.3700134.8964044.66380100
   getMRCA_em_cmp(phy, tips) 
20.8192423.6011828.2395728.2520230.1853559.72102100
  getMRCA_new_cmp(phy, tips) 
20.1658823.1957127.1961026.6578329.8214942.61702100


However, for the worst-case scenario that Klaus devised the results are 
very impressive:


microbenchmark(getMRCA_em(tree, c(1,2)), getMRCA_new(tree, c(1,2)), 
getMRCA_em_cmp(tree, c(1,2)), getMRCA_new_cmp(tree, c(1,2)));

Unit: milliseconds
expr min lq mean median uq max neval
   getMRCA_em(tree, c(1, 2)) 
124.79040140.44473146.98292145.39224150.96044286.78335100
  getMRCA_new(tree, c(1, 2)) 
123.39924139.41609145.21311143.92062148.34205297.73846100
   getMRCA_em_cmp(tree, c(1, 2)) 
23.7210526.8494429.9495029.8793532.2649041.99264100
  getMRCA_new_cmp(tree, c(1, 2)) 
23.1522325.7468629.2152629.2537030.8091569.68522100


So it might be worth compiling.

HTH.
JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu <mailto:josep...@umich.edu>



On 10 Jun, 2017, at 05:26, Emmanuel Paradis <emmanuel.para...@ird.fr 
<mailto:emmanuel.para...@ird.fr>> wrote:


Joseph, Klaus,

This is great. I slightly edited the code and renamed the argument 
'tips' to 'tip' (as in the original definition). I run on a bunch of 
random trees and the results are exactly identical with the current 
version of getMRCA. I'm changing the code in ape now.


Cheers,

Emmanuel

getMRCA <- function(phy, tip)
{
   if (!inherits(phy, "phylo"))
   stop('object "phy" is not of class "phylo"')
   if (length(tip) < 2) return(NULL)
   Ntip <- length(phy$tip.label)
   rootnd <- Ntip + 1L

   pars <- integer(phy$Nnode) # worst case assignment, usually far too 
long
   mrcaind <- 1L # index in pars of the mrca no

Re: [R-sig-phylo] A possible alternate MRCA function to APE's getMRCA

2017-06-10 Thread Emmanuel Paradis


Joseph, Klaus,

This is great. I slightly edited the code and renamed the argument 
'tips' to 'tip' (as in the original definition). I run on a bunch of 
random trees and the results are exactly identical with the current 
version of getMRCA. I'm changing the code in ape now.


Cheers,

Emmanuel

getMRCA <- function(phy, tip)
{
if (!inherits(phy, "phylo"))
stop('object "phy" is not of class "phylo"')
if (length(tip) < 2) return(NULL)
Ntip <- length(phy$tip.label)
rootnd <- Ntip + 1L

pars <- integer(phy$Nnode) # worst case assignment, usually far too 
long
mrcaind <- 1L # index in pars of the mrca node. will return highest 
one traversed by other lineages

tnd <- if (is.character(tip)) match(tip, phy$tip.label) else tip

## build a lookup table to get parents faster
pvec <- integer(Ntip + phy$Nnode)
pvec[phy$edge[, 2]] <- phy$edge[, 1]

## get entire lineage for first tip
nd <- tnd[1]
for (k in 1:phy$Nnode) {
nd <- pvec[nd]
pars[k] <- nd
if (nd == rootnd) break
}
pars <- pars[1:k] # delete the rest

## traverse lineages for remaining tips, stop if hit common ancestor
for (i in 2:length(tnd)) {
done <- FALSE
cnd <- tnd[i]
while (!done) {
cpar <- pvec[cnd] # get immediate parent
if (cpar %in% pars) {
## early exit! if the mrca of any set of taxa is the 
root then we are done

if (cpar == rootnd) return(rootnd)
idx <- which(pars == cpar)
if (idx > mrcaind)  mrcaind <- idx
done <- TRUE
} else cnd <- cpar # keep going!
}
}
pars[mrcaind]
}


Le 10/06/2017 à 00:28, Klaus Schliep a écrit :

Hi Joseph

I guess I figured out where some room for improvement was.
You grow the vector pars <- c(pars, nd)
I preallocated pars, but you could even just grow pars using pars[k] <- nd
without pre-allocation. This operation is now (with R >= 3.4) far less
costly as mentioned in the NEWS:
"Assigning to an element of a vector beyond the current length now
over-allocates by a small fraction. The new vector is marked internally as
growable, and the true length of the new vector is stored in the truelength
field. This makes building up a vector result by assigning to the next
element beyond the current length more efficient, though pre-allocating is
still preferred. The implementation is subject to change and not intended
to be used in packages at this time."

One of the many little improvements recently making R quite a bit faster
and easier.

So the worst case behavior was pretty bad

tree <- stree(10, "right")
system.time(get_mrca_fast(tree, c(1,2)))
user  system elapsed
  13.424   0.012  13.440
  mrca
[1] 19

with the changes it is much better:

system.time(mrca <- get_mrca_fast(tree, c(1,2)))
user  system elapsed
   0.012   0.000   0.012

mrca

[1] 19

And here is the code:

get_mrca_fast <- function (phy, tips) {
 rootnd <- length(phy$tip.label) + 1;
 pars <- integer(phy$Nnode);  # worst case assignment, usually far too
long
 mrcaind <- 1;  # index in pars of the mrca node. will return
highest one traversed by other lineages
 tnd <- NULL;   # node ids of tips
 if ("character" %in% class(tips)) {
 tnd <- match(tips, phy$tip.label);
 } else {
 tnd <- tips;
 }

 # build a lookup table to get parents faster
 pvec <- integer(max(phy$edge));
 pvec[phy$edge[,2]] <- phy$edge[,1];

 # get entire lineage for first tip
 nd <- tnd[1];
 for(k in 1:phy$Nnode){
 nd <- pvec[nd]
 pars[k] <- nd
 if (nd == rootnd) break
 }
 pars <- pars[1:k] #delete the rest

 # traverse lineages for remaining tips, stop if hit common ancestor
 for (i in 2:length(tnd)) {
 done <- FALSE;
 cnd <- tnd[i];
 while (!done) {
 cpar <- pvec[cnd]; # get immediate parent
 if (cpar %in% pars) {
 if (cpar == rootnd) {
 # early exit! if the mrca of any set of taxa is the
root then we are done
 return(rootnd);
 }
 idx <- which(pars == cpar);
 if (idx > mrcaind) {
 mrcaind <- idx;
 }
 done <- TRUE;
 } else {
 # keep going!
 cnd <- cpar;
 }
 }
 }
 return(pars[mrcaind]);
}


Have a nice weekend,
Klaus

On Fri, Jun 9, 2017 at 3:59 PM, Joseph W. Brown  wrote:


Liam,

I put the (updated) code up as a gist . Feel free to use/improve/whatever.
Emmanuel sees room for improvement but frankly taking 0.3 seconds to find
the MRCA of 5000 taxa on a million-tip tree is good enough for my purposes.

JWB

Re: [R-sig-phylo] Average Aminoacid Identity tree with bootstrap support. Is it possible?

2017-04-07 Thread Emmanuel Paradis


Hi Salvador,

You first need to define the variables that would be resampled during 
the bootstrap procedure. In the case of DNA sequences, this would be the 
columns (= sites) of the alignment, since they each contribute to the 
distance calculation (or to the likelihod function if you use ML). So 
you have to find the equivalent for your genomic data (my intuition 
tells me that won't be straightforward...)


Once you did that, you can arrange these variables into a matrix (or 
data frame) and input it into boot.phylo with the NJ tree.


HTH.

Best,

Emmanuel

Le 06/04/2017 à 16:44, Salvador Espada Hinojosa a écrit :

hello,

I'm PhD student and I have genomic bacterial data and want to produce a
tree from it. I'm currently using around 10 genomes and I calculated their
"Average Amino acid Identity" so I have the distance matrix
(distance=1-AAI). I produced with ape a neighbor joining tree, and wanted
to produce bootstrap support. Since I feed R with the distance matrix, I
cannot use the command boot.phylo, that requires "a taxa (rows) by
characters (columns) matrix" as one of the inputs.

Could you give me any suggestion, please?

Thanks a lot!



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Rooting a tree with a basal polytomy

2017-02-07 Thread Emmanuel Paradis


Hi,

If you have a single tree (object of class "phylo")

phy$root.edge <- 0

If there are several trees in your NEXUS file (then phy is of class 
"multiPhylo"):


for (i in seq_along(phy)) phy[[i]]$root.edge <- 0

Best,

Emmanuel

Le 07/02/2017 à 13:17, Yan Wong a écrit :

Sorry if this is a trivial question, but I have some (rooted) nexus trees with 
polytomies at the root. When I read them in using read.nexus() they are treated 
as unrooted. What’s the easiest way to tell R that they are intended to be 
rooted, even though the root is a polytomy? There’s no ‘force.rooted’ option to 
any of the tree reading routines that I can see.

Cheers

yan
___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/307Q0KTUtCj




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] HKY GTR distances

2017-02-03 Thread Emmanuel Paradis


Dear Andreas,

There is no distance formula for HKY or GTR model. For GTR, Rodrı́guez 
et al. developed a procedure to calculate a distance (also in Yang's 
2006 book). An example is given below with the woodmouse:


matlog <- function(x) {
tmp <- eigen(X)
V <- tmp$vectors
U <- diag(log(tmp$values))
V %*% U %*% solve(V)
}

tr <- function(x) sum(diag(x))

data(woodmouse)

PI <- diag(base.freq(woodmouse[1:2, ]))
Ft <- Ftab(woodmouse[1:2, ])
F <- Ft/sum(Ft)
X <- solve(PI) %*% F
-tr(PI %*% matlog(X))

You have to call this code for each pair of sequences (or wrap it in a 
function).


For HKY, Yang mentioned a procedure Rzhetsky & Nei (1994, J Mol Evol).

Best,

Emmanuel

Rodrı́guez F., Oliver J. L., Marı́n A. & Medina J. R. 1990. The general 
stochastic model of nucleotide substitution. Journal of Theoretical

Biology 142: 485–501.


Le 01/02/2017 à 23:52, kolte...@rub.de a écrit :

Dear Phylothusiasts,
I need to compare multiple substitution models side-by-side (species
clustering stuff by distances only). Unfortunately, I am not aware of an
implementation of HKY and GTR distance computations using R. Maybe there
is some Github code or something else that I have been missing?
I do not want to build a phylogenetic tree. I can not use PAUP (>500 big
fasta files).
Any ideas are greatly appreciated.
Best wishes,
Andreas

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/







___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] ape - corMartins

2017-01-24 Thread Emmanuel Paradis


Hi Sérgio,

It seems your results make good sense. alpha=0 is too far from the 
"optimum" value (i.e., the value that maximises the log-lik), so the 
optimisation fails to maximise the log-likelihood with respect to this 
parameter. It has been written in the literature that it is difficult to 
estimate reliably alpha in an OU model, and your results seem to confirm 
this.


Besides, alpha=0 is equivalent to a Brownian motion model. So if your 
traits are not really evolving according to this model, the GLS fitting 
procedure may be in difficulty.


You may try:

correlation = corMartins(value=0.9, phy=tree, fixed=FALSE)

If this estimates alpha=0.99, then you may probably rely on this result 
(after checking the estimates of the coefficients of course).


Best,

Emmanuel

Le 20/01/2017 à 15:16, Sergio Ferreira Cardoso a écrit :

Dear all,

I tried to estimate a value for alpha parameter of corMartins
(Ornstein-Uhlenbeck) but the output is a bit puzzling. I do as follows:




fitOU<-gls(fcl~mass+loc_dim+activity+feeding+loc_typ+agility,correlation=corMartins(value=0,phy=tree,fixed=FALSE),data=df,method="ML",weights=varFixed(~vf))

Error in recalc.corStruct(object[[i]], conLin) :

NA/NaN/Inf in foreign function call (arg 4)

Enter a frame number, or 0 to exit

1: gls(fcl ~ mass + loc_dim + activity + feeding + loc_typ + agility,
correlati
2: nlminb(c(coef(glsSt)), function(glsPars) -logLik(glsSt, glsPars),
control =
3: objective(.par, ...)
4: logLik(glsSt, glsPars)
5: logLik.glsStruct(glsSt, glsPars)
6: recalc(object, conLin)
7: recalc.modelStruct(object, conLin)
8: recalc(object[[i]], conLin)
9: recalc.corStruct(object[[i]], conLin)

It work when I turn the "value" to 1, but then the estimate for alpha is
0.99, which looks like it isn't actually estimating a value and is instead
accepting my input as the alpha value. Do you think there is a reason for
this to happen?

Thank you in advance.

Cheers,
Sérgio.



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Enigmatic error with drop.tip

2017-01-23 Thread Emmanuel Paradis


Hi Klaus,

Thanks for tracking this. It's fixed.

Best,

Emmanuel

Le 20/01/2017 à 18:41, Klaus Schliep a écrit :

Hi Juan,
it seems that you trim too many taxa.

I can replicate the error message if I trim for example 19 out of a 20 tips
of a tree.
drop.tip(tree, sample(20, 19))
Error in integer(max(oldnodes)) : vector size cannot be infinite
In addition: Warning message:
In max(oldnodes) : no non-missing arguments to max; returning -Inf

traceback()

3: integer(max(oldnodes))
2: collapse.singles(phy)
1: drop.tip(tree, sample(20, 19))

Maybe you include a check into your function z that the pruned trees have
at least two tips left
if( length( setdiff(treeP$tip.label, colnames(hp))) < ( length(
treeP$tip.label ) -2 ) )

Cheers,
Klaus



On Fri, Jan 20, 2017 at 7:26 AM, Juan Antonio Balbuena 
wrote:


Hello all

I am working with a cophylogenetic problem where I have a host tree, a
parasite tree and a binary matrix HP encapsulating the host-parasite
associations . After trimming the matrix according to a given criterion, I
require to prune the trees accordingly. Since I am getting errors at the
execution I devised the following control function to check the number of
tree nodes after pruning:
z <- function (hp, treeH,treeP) {
  treeh <- drop.tip(treeH, setdiff(treeH$tip.label, rownames(hp)))
  treep <- drop.tip(treeP, setdiff(treeP$tip.label, colnames(hp)))
  res <- c(treeh$Nnode,treep$Nnode)
 return(res)
 }

Then I try the following

x <- sapply (THP, z, treeH=TreeH[[7]], treeP= TreeP[[7]])

where THP is a list of trimmed HP matrices and TreeH and TreeP are
multiphylo objects and get the following error:

Error in integer(max(oldnodes)) : vector size cannot be infinite

The traceback is

6. integer(max(oldnodes))
5. collapse.singles(phy)
4. drop.tip(treeP, setdiff(treeP$tip.label, colnames(hp)))
3. FUN(X[[i]], ...)
2. lapply(X = X, FUN = FUN, ...)
1. sapply(THP, z, treeH = TreeH[[7]], treeP = TreeP[[7]])

This error appears only with this particular pair of host and parasite
trees (#7) and I cannot find a reason why this case should be different.

Any help will be very much appreciated.

Juan A. Balbuena

--

Dr. Juan A. Balbuena
Cavanilles Institute of Biodiversity and Evolutionary Biology
University of Valencia
http://www.uv.es/~balbuena
P.O. Box 22085
http://www.uv.es/cophylpaco

46071 Valencia, Spain
e-mail: j.a.balbu...@uv.estel. +34 963 543 658
<+34%20963%2054%2036%2058>fax +34 963 543 733
<+34%20963%2054%2037%2033>

*NOTE!* For shipments by EXPRESS COURIER use the following street address:
C/ Catedrático José Beltrán 2, 46980 Paterna (Valencia), Spain.


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-
sig-ph...@r-project.org/







___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] multi2di fails on multiPhylo objects

2017-01-16 Thread Emmanuel Paradis


Hi,

Klaus: thanks for the fix.

There is a development version on ape-package.ird.fr:

http://ape-package.ird.fr/ape_installation.html#versions

This version is 4.0-0.2. The source as well as a Windows version are 
avaialble. The list of changes is there:


http://ape-package.ird.fr/NEWS

Best,

Emmanuel

Le 14/01/2017 à 00:43, Yan Wong a écrit :

On 13 Jan 2017, at 20:37, Klaus Schliep  wrote:


Hi Yan,

it seems we introduced a small bug with making these functions generic. Replace 
in the function .multi2di_ape the line
phy <- reorder(phy)
with
phy <- .reorder_ape(phy, "cladewise", FALSE, n, 1L)
and it should work. You have to recompile ape afterwards.


By the way, is this likely to make it into a new release soon, or is there a 
dev version I can access somehow? I need to roll this out across a number of 
different computers, some of which I don’t have admin access to.

Yan
___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/







___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Extract all possible clades from a tree

2017-01-06 Thread Emmanuel Paradis


I paste below the new code of extract.clade for those who want to try it.

Best,

Emmanuel


extract.clade <- function(phy, node, root.edge = 0, interactive = FALSE)
{
n <- length(phy$tip.label)
if (interactive) {
cat("Click close to the node...\n")
node <- identify(phy)$nodes
} else {
if (length(node) > 1) {
node <- node[1]
warning("only the first value of 'node' has been considered")
}
if (is.character(node)) {
if (is.null(phy$node.label))
stop("the tree has no node labels")
node <- match(node, phy$node.label) + n
if (is.na(node)) stop("'node' not among the node labels.")
}
if (node <= n)
stop("node number must be greater than the number of tips")
}
if (node == n + 1L) return(phy)
keep <- prop.part(phy)[[node - n]]
drop.tip(phy, (1:n)[-keep], root.edge = root.edge, rooted = TRUE)
}
########

Le 06/01/2017 à 10:17, Emmanuel Paradis a écrit :

Hi,

Klaus is right: extract.clade() fails if the tree has been rooted with
resolve.root = TRUE before. I'm going to rewrite the function calling
drop.tip (which will be safer). In the meantime, Klaus's function should
work for Kamila.

Best,

Emmanuel

Le 06/01/2017 à 05:17, Klaus Schliep a écrit :

I had a look at the tree. There seems a bug in extract.clades, when the
tree was rooted before with the root() function. extract.clade() seem to
expect some ordering of nodes which seems not satisfied (and it looks as
it was written a long time ago).

I just wrote a small function (easier than to debug the old one) which
should do the same and it seems to work:

library(phangorn)

extract.clade2 <- function(phy, node){
if(!(node %in% phy$edge[,1])) stop("node number must be greater than
the number of tips")
drop.tip(phy, setdiff(1:Ntip(phy), Descendants(phy, node)[[1]]))
}

Klaus



On Jan 5, 2017 10:25 PM, "Joseph W. Brown" <josep...@umich.edu
<mailto:josep...@umich.edu>> wrote:

Hmm. Maybe something wonky with your tree? I simulated a 17-tip tree
and tried things out:

phy <- rtree(17);
rootID <- length(phy$tip.label) + 1;
counter <- 1;
for (i in rootID:max(phy$edge[,1])) {
  clade <- extract.clade(phy, i);
  # do something. just printing clade properties here
  print(paste0("clade ", counter, " (node ", i, ") has ",
length(clade$tip.label), " tips."));
  counter <- counter + 1;
}

[1] "clade 1 (node 18) has 17 tips."
[1] "clade 2 (node 19) has 11 tips."
[1] "clade 3 (node 20) has 10 tips."
[1] "clade 4 (node 21) has 4 tips."
[1] "clade 5 (node 22) has 3 tips."
[1] "clade 6 (node 23) has 2 tips."
[1] "clade 7 (node 24) has 6 tips."
[1] "clade 8 (node 25) has 5 tips."
[1] "clade 9 (node 26) has 3 tips."
[1] "clade 10 (node 27) has 2 tips."
[1] "clade 11 (node 28) has 2 tips."
[1] "clade 12 (node 29) has 6 tips."
[1] "clade 13 (node 30) has 2 tips."
[1] "clade 14 (node 31) has 4 tips."
[1] "clade 15 (node 32) has 3 tips."
[1] "clade 16 (node 33) has 2 tips."

and had no problem. Maybe post your tree here?

JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu <mailto:josep...@umich.edu>



> On 5 Jan, 2017, at 17:18, Kamila Naxerova <knaxer...@partners.org
<mailto:knaxer...@partners.org>> wrote:
>
> Dear Joseph,
>
> thanks so much. This is exactly what I need!
>
> I am running into some problems that I don’t understand though. In
my case, rootID is 18, and max(phy$edge[,1]) is 33. When I try to
execute your loop, this happens:
>
> > extract.clade(phy, 18)
>
> Phylogenetic tree with 17 tips and 16 internal nodes.
>
> Tip labels:
>   X1, X8, X9, X10 ...
>
> Rooted; includes branch lengths.
>
> So far so good… but then I keep getting these errors:
>
> > extract.clade(phy, 19)
> Error in phy$edge[, 2] : incorrect number of dimensions
> > extract.clade(phy, 20)
> Error in phy$edge[, 2] : incorrect number of dimensions
>
> Not sure why extract.clade produces these errors. 19-23 do

Re: [R-sig-phylo] Extract all possible clades from a tree

2017-01-06 Thread Emmanuel Paradis


Hi,

Klaus is right: extract.clade() fails if the tree has been rooted with 
resolve.root = TRUE before. I'm going to rewrite the function calling 
drop.tip (which will be safer). In the meantime, Klaus's function should 
work for Kamila.


Best,

Emmanuel

Le 06/01/2017 à 05:17, Klaus Schliep a écrit :

I had a look at the tree. There seems a bug in extract.clades, when the
tree was rooted before with the root() function. extract.clade() seem to
expect some ordering of nodes which seems not satisfied (and it looks as
it was written a long time ago).

I just wrote a small function (easier than to debug the old one) which
should do the same and it seems to work:

library(phangorn)

extract.clade2 <- function(phy, node){
if(!(node %in% phy$edge[,1])) stop("node number must be greater than
the number of tips")
drop.tip(phy, setdiff(1:Ntip(phy), Descendants(phy, node)[[1]]))
}

Klaus



On Jan 5, 2017 10:25 PM, "Joseph W. Brown" > wrote:

Hmm. Maybe something wonky with your tree? I simulated a 17-tip tree
and tried things out:

phy <- rtree(17);
rootID <- length(phy$tip.label) + 1;
counter <- 1;
for (i in rootID:max(phy$edge[,1])) {
  clade <- extract.clade(phy, i);
  # do something. just printing clade properties here
  print(paste0("clade ", counter, " (node ", i, ") has ",
length(clade$tip.label), " tips."));
  counter <- counter + 1;
}

[1] "clade 1 (node 18) has 17 tips."
[1] "clade 2 (node 19) has 11 tips."
[1] "clade 3 (node 20) has 10 tips."
[1] "clade 4 (node 21) has 4 tips."
[1] "clade 5 (node 22) has 3 tips."
[1] "clade 6 (node 23) has 2 tips."
[1] "clade 7 (node 24) has 6 tips."
[1] "clade 8 (node 25) has 5 tips."
[1] "clade 9 (node 26) has 3 tips."
[1] "clade 10 (node 27) has 2 tips."
[1] "clade 11 (node 28) has 2 tips."
[1] "clade 12 (node 29) has 6 tips."
[1] "clade 13 (node 30) has 2 tips."
[1] "clade 14 (node 31) has 4 tips."
[1] "clade 15 (node 32) has 3 tips."
[1] "clade 16 (node 33) has 2 tips."

and had no problem. Maybe post your tree here?

JWB

Joseph W. Brown
Post-doctoral Researcher, Smith Laboratory
University of Michigan
Department of Ecology & Evolutionary Biology
Room 2071, Kraus Natural Sciences Building
Ann Arbor MI 48109-1079
josep...@umich.edu 



> On 5 Jan, 2017, at 17:18, Kamila Naxerova > wrote:
>
> Dear Joseph,
>
> thanks so much. This is exactly what I need!
>
> I am running into some problems that I don’t understand though. In
my case, rootID is 18, and max(phy$edge[,1]) is 33. When I try to
execute your loop, this happens:
>
> > extract.clade(phy, 18)
>
> Phylogenetic tree with 17 tips and 16 internal nodes.
>
> Tip labels:
>   X1, X8, X9, X10 ...
>
> Rooted; includes branch lengths.
>
> So far so good… but then I keep getting these errors:
>
> > extract.clade(phy, 19)
> Error in phy$edge[, 2] : incorrect number of dimensions
> > extract.clade(phy, 20)
> Error in phy$edge[, 2] : incorrect number of dimensions
>
> Not sure why extract.clade produces these errors. 19-23 don’t
work, 24-26 work, 27 produces the error again, 28 works etc.
>
> Thanks again.
>
> Kamila
>
>
>> On Jan 5, 2017, at 4:12 PM, Joseph W. Brown  >> wrote:
>>
>> Not sure if I understand the problem completely, but this should
allow you to examine all of the clades (and should work if
polytomies are involved):
>>
>> # for tree phy
>> rootID <- length(phy$tip.label) + 1;
>> for (i in rootID:max(phy$edge[,1])) {
>>   clade <- extract.clade(phy, i);
>>   # do something
>> }
>>
>> This includes the root node (i.e. whole tree), but that can be
changed. This can be rewritten as an lapply if necessary.
>>
>> HTH.
>> JWB
>> 
>> Joseph W. Brown
>> Post-doctoral Researcher, Smith Laboratory
>> University of Michigan
>> Department of Ecology & Evolutionary Biology
>> Room 2071, Kraus Natural Sciences Building
>> Ann Arbor MI 48109-1079
>> josep...@umich.edu 
>
>>
>>
>>
>>> On 5 Jan, 2017, at 15:50, Kamila Naxerova

>> wrote:
>>>
>>> Hi all,
>>>
>>> I would like to break a phylogenetic tree into all possible
clades and then examine each one of them for

Re: [R-sig-phylo] Tree plotting (and maybe others) are borked when a node label is '0'

2017-01-04 Thread Emmanuel Paradis


Hi,

This is a known feature of read.nexus(), see ape's FAQ:

http://ape-package.ird.fr/ape_faq.html#Bayestrees

This explains how to fix the problem.

Best,

Emmanuel

Le 04/01/2017 à 14:46, Yan Wong a écrit :

If I create a test.nex file as follows:

#NEXUS
BEGIN TREES;
TRANSLATE
0 0,
1 1,
2 2,
3 3,
4 4,
5 5,
6 6,
7 7;
TREE test = (((3,5),(4,6)),(0,(2,(1,7;
END;

And then attempt to do

plot(read.nexus("test.nex"))


I get
Error in plot.phylo(m) :
  tree badly conformed; cannot plot. Check the edge matrix.

It works fine if I start indexing tips at 1.

Yan

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/3010nZ2hH3V




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] read.FASTA accept connection

2017-01-04 Thread Emmanuel Paradis

This is great! This can also read other types of connection. I made a 
few modifications so that the function can read remote files through 
HTTPS or FTPS and this fixes the crash when reading empty files (or 
compressed files without specifying a connection). For this fix, it is 
needed to recompile ape, but the modified read.FASTA (pasted below) can 
be used without recompilation.


Cheers,

Emmanuel


read.FASTA <- function(file)
{
if (length(grep("^(ht|f)tp(s|):", file))) {
url <- file
file <- tempfile()
download.file(url, file)
}
if (is(file, "connection")) {
if (!isOpen(file, "rt")) {
open(file, "rt")
on.exit(close(file))
}
x <- scan(file, what = character(), sep = "\n", quiet = TRUE)
x <- charToRaw(paste(x, collapse = "\n"))
sz <- length(x)
} else {
sz <- file.size(file)
x <- readBin(file, "raw", sz)
}
## if the file is larger than 1 Gb we assume that it is
## UNIX-encoded and skip the search-replace of carriage returns
if (sz < 1e9) {
icr <- which(x == as.raw(0x0d)) # CR
if (length(icr)) x <- x[-icr]
}
res <- .Call("rawStreamToDNAbin", x)
if (identical(res, 0L)) {
warning("failed to read sequences, returns NULL")
return(NULL)
}
names(res) <- sub("^ +", "", names(res)) # to permit phylosim
class(res) <- "DNAbin"
res
}

Le 04/01/2017 à 00:37, Rj Ewing a écrit :

Emmanuel,

Thanks, I've attached a patch containing the changes.

I'm not very familiar with R, so maybe there is a better way to do it.
Let me know if you want me to make any changes.

Thanks,
RJ

On Tue, Jan 3, 2017 at 12:53 AM, Emmanuel Paradis
<emmanuel.para...@ird.fr <mailto:emmanuel.para...@ird.fr>> wrote:

Hi,

You can send your contribution directly to me. See this post for a
recent discussion on how to contribute to ape:

http://www.mail-archive.com/r-sig-phylo@r-project.org/msg04580.htmlra 
<http://www.mail-archive.com/r-sig-phylo@r-project.org/msg04580.htmlra>

Sure your contribution will be useful. I've just found out that
read.FASTA crashes R if the FASTA file is gz'ed. It will be the
opportunity to fix this too.

Best,

Emmanuel


Le 02/01/2017 à 19:26, Rj Ewing a écrit :

Is there a way to contribute to the project? I would like to add the
ability for read.FASTA to accept a connection or a file.

I have a zip file that I would rather not unzip before passing into
read.FASTA. I can obtain a connection using unz("file.zip",
"myFasta.fasta"), however read.FASTA does not support this.

Thanks,
RJ

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
<mailto:R-sig-phylo@r-project.org>
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
<https://stat.ethz.ch/mailman/listinfo/r-sig-phylo>
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/
<http://www.mail-archive.com/r-sig-phylo@r-project.org/>


Pour nous remonter une erreur de filtrage, veuillez vous rendre
ici : http://f.security-mail.net/302UOvTyxTd
<http://f.security-mail.net/302UOvTyxTd>







___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] read.FASTA accept connection

2017-01-03 Thread Emmanuel Paradis


Hi,

You can send your contribution directly to me. See this post for a 
recent discussion on how to contribute to ape:


http://www.mail-archive.com/r-sig-phylo@r-project.org/msg04580.htmlra

Sure your contribution will be useful. I've just found out that 
read.FASTA crashes R if the FASTA file is gz'ed. It will be the 
opportunity to fix this too.


Best,

Emmanuel

Le 02/01/2017 à 19:26, Rj Ewing a écrit :

Is there a way to contribute to the project? I would like to add the
ability for read.FASTA to accept a connection or a file.

I have a zip file that I would rather not unzip before passing into
read.FASTA. I can obtain a connection using unz("file.zip",
"myFasta.fasta"), however read.FASTA does not support this.

Thanks,
RJ

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/302UOvTyxTd




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Force read.nexus() to return a multiPhylo object, even if there is only one tree

2016-12-28 Thread Emmanuel Paradis


Hi Liam,

Actually, this can be done with c():

R> tr <- rtree(10)
R> c(tr)
1 phylogenetic trees

With respect to Yan's initial query, this will not restore the tree name 
from the NEXUS file which is (currently) lost if there is a single tree.


Best,

Emmanuel

Le 28/12/2016 à 03:11, Liam J. Revell a écrit :

Hi Yan.

In case it is useful I just pushed an update to phytools in which I have
added an as.multiPhylo method for objects of class "phylo". This will
also return an object of class "multiPhylo" if that is already the
object class. This means that:

obj<-as.multiPhylo(read.nexus(...))

in which ... are the arguments to read.nexus, will return an object of
class "multiPhylo" with length 1 if the file contains only one tree -
but will also return an object of class "multiPhylo" if the file has 2
or more trees.

To get this one needs to install phytools from GitHub using devtools:

library(devtools)
install_github("liamrevell/phytools")

All the best, Liam

Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 12/27/2016 5:11 PM, Yan Wong wrote:


On 27 Dec 2016, at 22:08, Emmanuel Paradis <emmanuel.para...@ird.fr>
wrote:


Hi Yan,

Yes, you are right: this modification can be done. In the meantime,
you can fix the code with:

fix(read.nexus)

Find this line (#117):

   if (Ntree == 1) {

and change it to:

   if (FALSE) {

save and close. Tell me if you still have problem.


Yes, that works fine. Thanks a lot.

Yan
___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/







___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Force read.nexus() to return a multiPhylo object, even if there is only one tree

2016-12-27 Thread Emmanuel Paradis


Hi Yan,

Yes, you are right: this modification can be done. In the meantime, you 
can fix the code with:


fix(read.nexus)

Find this line (#117):

if (Ntree == 1) {

and change it to:

if (FALSE) {

save and close. Tell me if you still have problem.

Best,

Emmanuel

Le 27/12/2016 à 21:15, Yan Wong a écrit :

Hi

I’m doing some analysis in which I produce a variable number of phylogenetic 
trees with the same tips in a nexus file, then import them into R for analysis 
(these are trees along a genome). The name of each tree is meaningful (it is a 
position along the genome). However, in a few simulations, the nexus file has 
only a single tree in it. In this case, read.nexus returns not a multiPhylo 
object, but a single tree, with the name of the tree lost. I wonder if it would 
be sensible to have a force.multi option to read.nexus(), by default set to 
FALSE (to retain compatibility), but which, if set to TRUE, returns a 
multiPhylo object even if there is only one tree in the file?

Yan
___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Pour nous remonter une erreur de filtrage, veuillez vous rendre ici : 
http://f.security-mail.net/407pvJ9zSVE




___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

1 2 3 4 >

1 - 100 of 372 matches

Mail list logo