[R-sig-phylo] Running R on a Computer Cluster in the Cloud - cloudnumbers.com

2011-07-13 Thread Markus Schmidberger
Dear phylogenetic and comparative methods experts,

cloudnumbers.com provides researchers and companies with the resources
to perform high performance calculations in the cloud. As
cloudnumbers.com's community manager I may invite you to register and
test R on a computer cluster in the cloud for free:
http://my.cloudnumbers.com/register

Our aim is to change the way of research collaboration is done today by
bringing together scientists and businesses from all over the world on a
single platform. cloudnumbers.com is a Berlin (Germany) based
international high-tech startup striving for enabling everyone to
benefit from the High Performance Computing related advantages of the
cloud. We provide easy access to applications running on any kind of
computer hardware: from single core high memory machines up to 1000
cores computer clusters.

Our platform provides several advantages:

* Turn fixed into variable costs and pay only for the capacity you need.
Watch our latest saving costs with cloudnumbers.com video:
http://www.youtube.com/watch?v=ln_BSVigUhgfeature=player_embedded

* Enter the cloud using an intuitive and user friendly platform. Watch
our latest cloudnumbers.com in a nutshell video:
http://www.youtube.com/watch?v=0ZNEpR_ElV0feature=player_embedded

* Be released from ongoing technological obsolescence and continuous
maintenance costs (e.g. linking to libraries or system dependencies)

* Accelerated your R, C, C++, Fortran, Python, ... calculations through
parallel processing and great computing capacity - more than 1000 cores
are available and GPUs are coming soon.

* Share your results worldwide (coming soon).

* Get high speed access to public databases (please let us know, if your
favorite database is missing!).

* We have developed a security architecture that meets high requirements
of data security and privacy. Read our security white paper:
http://d1372nki7bx5yg.cloudfront.net/wp-content/uploads/2011/06/cloudnumberscom-security.whitepaper.pdf


This is only a selection of our top features. To get more information
check out our web-page (http://www.cloudnumbers.com/) or follow our blog
about cloud computing, HPC and HPC applications (with R):
http://cloudnumbers.com/blog

Register and test for free now at cloudnumbers.com:
http://my.cloudnumbers.com/register

We are looking forward to get your feedback and consumer insights. Take
the chance and have an impact to the development of a new cloud
computing calculation platform.

Best
Markus


-- 
Dr. rer. nat. Markus Schmidberger 
Senior Community Manager 

Cloudnumbers.com GmbH
Chausseestraße 6
10119 Berlin 

www.cloudnumbers.com 
E-Mail: markus.schmidber...@cloudnumbers.com 


* 
Amtsgericht München, HRB 191138 
Geschäftsführer: Erik Muttersbach, Markus Fensterer, Moritz v. 
Petersdorff-Campen

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] cut a tree in a given time interval

2011-07-13 Thread Nick Matzke

Oops, left that out.  Here it is as text and an attached file.

You will have to do
library(ape)

and maybe

library(phangorn)

...ahead of time (hopefully not others...)





chainsaw - function(tr, timepoint=10)
{
# Take a tree and saw it off evenly across a certain 
timepoint.
# This removes any tips above the timepoint, and 
replaces them

# with a single tip representing the lineage crossing
# the timepoint (with a new tip name).

# Get the tree in a table
tr_table = prt(tr, printflag=FALSE)

# Find the tips that are less than 10 my old and drop them
TF_exists_more_recently_than_10mya = tr_table$time_bp  
timepoint


# Get the corresponding labels
labels_for_tips_existing_more_recently_than_10mya = 
tr_table$label[ TF_exists_more_recently_than_10mya == TRUE ]


###
# Draft chainsaw function
###
# loop through the branches that cross 10 mya

# get a list of the edge start/stops in the phylogeny's 
edges

edge_times_bp = get_edge_times_before_present(tr)

# which of these branches cross 10 mya?
edges_start_earlier_than_10mya = edge_times_bp[, 1]  
timepoint
edges_end_later_than_10mya = edge_times_bp[, 2] = 
timepoint
edges_to_chainsaw = edges_start_earlier_than_10mya + 
edges_end_later_than_10mya == 2


# then, for each of these edges, figure out how many 
tips exist descending from it

nodes_to_chainsaw = tr$edge[, 2][edges_to_chainsaw]
nodes_to_chainsaw = nodes_to_chainsaw[nodes_to_chainsaw 
 length(tr$tip.label)]


# create a copy of the tree to chainsaw
tree_to_chainsaw = tr

for (i in 1:length(nodes_to_chainsaw))
{
tmp_subtree = extract.clade(tr, nodes_to_chainsaw[i])
#print(tmp_subtree$tip.label)

tmp_number_of_tips = length(tmp_subtree$tip.label)
#print(tmp_number_of_tips)

# number of tips to drop = (numtips -1)
numtips_to_drop = tmp_number_of_tips - 1

# tips_to_drop
tmp_labels = tmp_subtree$tip.label

labels_to_drop = tmp_labels[1:numtips_to_drop]

# new label
label_kept_num = length(tmp_labels)
label_kept = tmp_labels[label_kept_num]
new_label = paste(CA_, label_kept, +, 
numtips_to_drop, _tips, sep=)


tree_to_chainsaw$tip.label[tree_to_chainsaw$tip.label == 
label_kept] = new_label


# chop off e.g. 2 of the 3 tips
tree_to_chainsaw = drop.tip(tree_to_chainsaw, 
labels_to_drop)


}
#plot(tree_to_chainsaw)
#axisPhylo()

tree_to_chainsaw_table = prt(tree_to_chainsaw, 
printflag=FALSE)


tree_to_chainsaw_table_tips_TF_time_bp_LT_10my = 
tree_to_chainsaw_table$time_bp  timepoint



tmp_edge_lengths = 
tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]


times_bp_for_edges_to_chainsaw = 
tree_to_chainsaw_table$time_bp[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my]


adjustment = times_bp_for_edges_to_chainsaw - timepoint

revised_tmp_edge_lengths = tmp_edge_lengths + adjustment


tree_to_chainsaw_table$edge.length[tree_to_chainsaw_table_tips_TF_time_bp_LT_10my] 
= revised_tmp_edge_lengths


# revised
ordered_nodenames = get_nodenums(tree_to_chainsaw)
parent_branches = 
get_indices_where_list1_occurs_in_list2(ordered_nodenames, 
tree_to_chainsaw$edge[,2])


NA_false = is.not.na(tree_to_chainsaw_table$edge.length)

tree_to_chainsaw$edge.length[parent_branches[NA_false]] 
= tree_to_chainsaw_table$edge.length[NA_false]


return(tree_to_chainsaw)
}


# print tree in hierarchical format
prt - function(t, printflag=TRUE, relabel_nodes = FALSE, 
time_bp_digits=7)

{
# assemble beginning table

# check if internal node labels exist
if (node.label %in% attributes(t)$names == FALSE)
{
rootnum = get_nodenum_structural_root(t)

new_node_labels = paste(inNode, 
rootnum:(rootnum+t$Nnode-1), sep=)

t$node.label = new_node_labels
}

# or manually relabel the internal nodes, if desired
if (relabel_nodes == TRUE)
{
rootnum = get_nodenum_structural_root(t)

new_node_labels = paste(inNode, 
rootnum:(rootnum+t$Nnode-1), sep=)

t$node.label = new_node_labels
}

labels = c(t$tip.label, t$node.label)
ordered_nodenames = get_nodenums(t)
#nodenums = 1:length(labels)
node.types1 = rep(tip, length(t$tip.label))
node.types2 = rep(internal, length(t$node.label))
node.types2[1] = root
node.types = c(node.types1, node.types2)

# These are the index numbers of the edges below each node
parent_branches = 
get_indices_where_list1_occurs_in_list2(ordered_nodenames, 
t$edge[,2])

#parent_edges = parent_branches
brlen_to_parent = t$edge.length[parent_branches]

parent_nodes = t$edge[,1][parent_branches]
daughter_nodes = lapply(ordered_nodenames, 

[R-sig-phylo] pic() vs gls()

2011-07-13 Thread Nicholas Mason
Dear R-sig-phylo users,I have a question regarding comparative analyses of contrasts done with the functions fitContinuous() and pic() compared to using PGLS (using the gls() function).From my understanding the first method involving pic() below fits alpha (estimated using fitContinuous()) to each character independently then performs a regression on the resulting contrasts.The second (PGLS) method involving gls() fits both the regression and contrasts simultaneously and reports a single alpha value for the relationship between two traits.These two methods appear very similar, yet I get slightly different results between the two functions, particularly with respect to p-values. Using fitContinuous(), the results from the data set attached are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p =0.0519. Furthermore, if I switch the dependent and independent variables (V1~V2 vs V2~V1), I get the same answer with pic(), but gls() differs: r2 = -0.069 p =0.02 (see pglsModel1a vs pglsModel1b in the attachment).Could anyone explain why these functions vary (in my mind they were doing essentially the same thing) and if there are situations where one should be favored over another? Also, why does PGLS vary if you switch the dependent/independent variables in the linear model?Thanks in advance for any advice or comments offered!!Cheers,Nick MasonBelow I have the code I have been using (also attached as Mason.R):require(ape)require(nlme)require(geiger)read.csv(file="Mason_data.csv")-smdatarownames(smdata)-smdata[,1]smdata[,1]-NULL#ASIDE: If anyone could tell me how to get around these two lines of code, that would also be awesome. Header=T doesn't seem to work for me.read.tree(file="Mason_tree.tre")-spbmname.check(smdata,spbm)fitContinuous(spbm,smdata,model="OU")-ou2pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha))pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha))summary(lm(pr_contrast~pc1_contrast-1))pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, spbm),data=smdata)summary(pglsModel1a)pglsModel1b-gls(Comp.1~Pause_Rate, correlation=corMartins(1, spbm),data=smdata)summary(pglsModel1b)

Mason_tree.tre
Description: Binary data
,Pause_Rate,Comp.1,Comp.2
Sporophila_schistacea,0.97572108797618,1.3796099841333,-1.12587956726316
Sporophila_intermedia,0.314864247370089,2.36946494397968,1.16313695174602
Sporophila_plumbea,0.525348288841447,2.71609091582214,-0.0201502845760047
Sporophila_torqueola,0.090114188687708,3.55768691291833,0.0588440560485339
Sporophila_collaris,0.258598106973663,-0.866576621272912,-0.375459372210033
Sporophila_lineola,1.91924407069217,5.38973476610418,-0.719604956651293
Sporophila_luctuosa,0.316867010852651,3.51591652591355,-1.01122149539569
Sporophila_nigricollis,0.646766139216069,4.60967961843748,0.403572095059875
Sporophila_caerulescens,1.19967606461856,4.42942981620688,0.266281245585206
Sporophila_albogularis,-0.747926431242122,2.14986795813319,0.453292241973436
Sporophila_peruviana,-0.662055276873553,-5.75668334764164,0.194910239070253
Sporophila_simplex,-0.0159932866483984,0.50911606352083,-0.961689651320771
Sporophila_bouvreuil,-0.113798179556032,4.96419876880559,-0.0444962146993295
Sporophila_minuta,-0.653039978630658,4.51909264308507,0.271433436605055
Sporophila_hypoxantha,-1.33520075061216,3.6424747461571,-0.143885556798834
Sporophila_ruficollis,-1.10188388209151,3.69620160094318,-1.122292891
Sporophila_castaneiventris,-0.957617880681526,5.13264024147439,0.71993032372333
Sporophila_hypochroma,-0.860526127835558,3.70604616959997,-0.317101374278479
Sporophila_cinnamomea,-0.859253764452833,4.92903311573274,-0.397590156130398
Sporophila_melanogaster,0.246280339669374,3.57517426978125,0.492140024454993
Sporophila_telasco,0.512066210743585,4.57687350229544,0.151380525693064
Oryzoborus_nuttingi,-0.803111018941645,-16.0737467067366,-1.53209487276288
Oryzoborus_crassirostris,-0.57221657664358,-10.5639148825931,1.92861328894315
Oryzoborus_atrirostris,-0.871304566495871,-18.0081579920927,-0.767118053337893
Oryzoborus_maximiliani,-0.326024013387329,-14.0150572642919,-0.0204930609520138
Oryzoborus_angolensis,-0.142855044536879,-6.48632361126553,0.96227143885814
Dolospingus_fringilloides,1.26961240149379,-2.93723705653849,1.74362516139737


Mason.R
Description: Binary data

-Nicholas Albert MasonMS Candidate, Burns LabDepartment of Biology - EB ProgramSan Diego State University5500 Campanile DriveSan Diego, CA 92182-4614845-240-0649 (cell)

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] pic() vs gls()

2011-07-13 Thread Liam J. Revell

Hi Nick.

For your first (simple) problem, I believe you want to do:

 read.csv(file=Mason_data.csv,row.names=1)-smdata

Regarding the more complicated issue, the problem of non-independence in 
linear regression comes in the residual error of the model.  Thus, you 
should fit a correlation structure for the error in Y given X (not for X 
 Y separately as you have done with PICs here).  This indeed can be 
done using gls() - so, for instance:


pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1, 
spbm),data=smdata)


fits a linear model in which the residual error of Pause_Rate given 
Comp.1 is distributed according to the correlation structure 
corMartins() with alpha estimated using REML.


The way to reproduce this result using contrasts is the following:

 alpha-attr(pglsModel1a$apVar,Pars)[corStruct] # get fitted alpha
 smdata-as.matrix(smdata) # important
 pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=alpha))
 pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=alpha))
 summary(lm(pr_contrast~pc1_contrast-1))

Which if you compare to:

 summary(pglsModel1a)

you should find yields the same results and P-value.  (Note that 
smdata-as.matrix(smdata) seems to be important here as if smdata is a 
data frame, then smdata[,1] does not inherit the rownames of smdata and 
pic() will return an incorrect set of contrasts.)


I hope this helps.  No doubt other subscribers to the list may have 
something to add.


Best, Liam

--
Liam J. Revell
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://phytools.blogspot.com

On 7/13/2011 11:07 PM, Nicholas Mason wrote:

Dear R-sig-phylo users,
I have a question regarding comparative analyses of contrasts done with
the functions fitContinuous() and pic() compared to using PGLS (using
the gls() function).

 From my understanding the first method involving pic() below fits alpha
(estimated using fitContinuous()) to each character independently then
performs a regression on the resulting contrasts.

The second (PGLS) method involving gls() fits both the regression and
contrasts simultaneously and reports a single alpha value for the
relationship between two traits.

These two methods appear very similar, yet I get slightly different
results between the two functions, particularly with respect to
p-values. Using fitContinuous(), the results from the data set attached
are r2 = 0.075 p = 0.091. Using gls(): r2 = 0.079 p =0.0519.
Furthermore, if I switch the dependent and independent variables (V1~V2
vs V2~V1), I get the same answer with pic(), but gls() differs: r2 =
-0.069 p =0.02 (see pglsModel1a vs pglsModel1b in the attachment).

Could anyone explain why these functions vary (in my mind they were
doing essentially the same thing) and if there are situations where one
should be favored over another? Also, why does PGLS vary if you switch
the dependent/independent variables in the linear model?

Thanks in advance for any advice or comments offered!!

Cheers,
Nick Mason

Below I have the code I have been using (also attached as Mason.R):

require(ape)
require(nlme)
require(geiger)
read.csv(file=Mason_data.csv)-smdata
rownames(smdata)-smdata[,1]
smdata[,1]-NULL#ASIDE: If anyone could tell me how to get around these
two lines of code, that would also be awesome. Header=T doesn't seem to
work for me.
read.tree(file=Mason_tree.tre)-spbm
name.check(smdata,spbm)

fitContinuous(spbm,smdata,model=OU)-ou2

pr_contrast-pic(smdata[,1],ouTree(spbm,alpha=ou2$Pause_Rate$alpha))
pc1_contrast-pic(smdata[,2],ouTree(spbm,alpha=ou2$Comp.1$alpha))

summary(lm(pr_contrast~pc1_contrast-1))

pglsModel1a-gls(Pause_Rate~Comp.1, correlation=corMartins(1,
spbm),data=smdata)
summary(pglsModel1a)

pglsModel1b-gls(Comp.1~Pause_Rate, correlation=corMartins(1,
spbm),data=smdata)
summary(pglsModel1b)








-
Nicholas Albert Mason
MS Candidate, Burns Lab
Department of Biology - EB Program
San Diego State University
5500 Campanile Drive
San Diego, CA 92182-4614
845-240-0649 (cell)





___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] Assigning node ages to a tree, revisited

2011-07-13 Thread Scott Chamberlain
Hi Roger, 

Can you provide a reproducible example (perhaps a subset of your tree and their 
tip and node ages if the tree is very large)? I don't know what the problem 
could be without the data, but perhaps Gene knows? 


Scott

On Wednesday, July 13, 2011 at 10:57 PM, Roger Close wrote:

 Hello all,
 
 I wish to transform branch lengths on a tree according to ages of
 terminal taxa and internal node ages; to this end I have tried to
 implement the script written by Gene Hunt mentioned in a recent post
 to this list 
 (http://www.mail-archive.com/r-sig-phylo@r-project.org/msg01005.html;
 script available at https://gist.github.com/938313).
 
 However, when attempting to run the command scalePhylo(tr, tip.ages,
 node.mins), it crunches away for hours on my computer (a 2010 MacBook
 Pro Core i7, which is quite fast). Clearly there's something wrong.
 I'm afraid I don't know enough about R to debug a script.
 
 I thought I'd send this to the list in case someone other than Gene
 Hunt or Scott Chamberlain has had a go at using this script.
 
 Thanks in advance,
 
 Roger


[[alternative HTML version deleted]]

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo