Hey Andy,

Financially, it's probably best to start with a small cluster.  For your 
project I'd recommend using a single m1.xlarge instance as the head node, 
seeing how that goes, and then adding workers as you find them useful.  Should 
you find that it isn't enough, it's trivially easy using CloudMan to shut the 
cluster down and restart it with a much larger instance type.

Regarding extra nodes -- given that you're a single user, extra instances 
aren't useful for serial pipelines and workflows where each job depends on the 
previous one; you'd only waste money.  If your analysis can be done in 
parallel, however (say you have multiple samples all requiring the same basic 
preliminary steps), then extra nodes can definitely help get the work done much 
faster.  You could also use CloudMan's autoscaling to handle this; it 
automatically scales the cluster up as necessary (within your min/max 
parameters) to process jobs as fast as possible, and removes idle nodes to 
prevent waste.
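
To make the serial vs. parallel trade-off concrete, here's a rough sketch in 
Python.  It is not CloudMan code and not CloudMan's actual autoscaling 
algorithm; the function names, the one-job-per-node assumption, and the 
scaling rule are all hypothetical and only illustrate the reasoning above.

```python
import math


def wall_clock_hours(n_jobs, hours_per_job, n_nodes, parallel=True):
    """Rough wall-clock time for n_jobs equal-length jobs.

    Assumes (for illustration only) that each node runs one job at a time.
    """
    if not parallel:
        # Each job depends on the previous one, so extra nodes sit idle.
        return n_jobs * hours_per_job
    # Independent jobs run in waves of n_nodes at a time.
    return math.ceil(n_jobs / n_nodes) * hours_per_job


def target_workers(queued_jobs, busy_workers, min_workers, max_workers):
    """Hypothetical scaling rule: one worker per running or queued job,
    clamped to the user's min/max settings."""
    wanted = busy_workers + queued_jobs
    return max(min_workers, min(max_workers, wanted))
```

For example, wall_clock_hours(8, 3, 4) gives 6 hours for eight independent 
3-hour jobs on four nodes, while the same jobs chained serially take 24 hours 
no matter how many nodes you're paying for.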

Lastly, depending on the analysis you need to do, you may find you need a 
high-memory instance.  In that case (given your m1.xlarge head node) you can 
either restart the cluster with a larger instance type for the head node or, 
even simpler, disable running jobs on the head node in the interface and 
temporarily add a high-memory worker instance to handle the special demand.

Let me know if there's anything else I can do to help!

-Dannon



On Jan 23, 2013, at 3:40 PM, Andrew Norman <anorma...@gmail.com> wrote:

> Hi all
> 
> I'd like to use an AWS EC2 cluster to run Galaxy TopHat to analyze the RNA-seq 
> data for my project. I'm trying to price out the resources that I'll need to 
> do this, but I don't have any experience setting up clusters, virtual or 
> real, so I'd like to get the insight of someone who has done this. I have 
> studied the wiki page dedicated to this topic 
> (http://wiki.galaxyproject.org/CloudMan/AWS/CapacityPlanning) but I don't 
> know how many worker nodes I'll need, and I will just be doing my personal 
> analysis with this cluster (only one TopHat analysis at a time). 
> 
> Can anyone help me out with this? Thanks!
> 
> Andy
> ___________________________________________________________
> Please keep all replies on the list by using "reply all"
> in your mail client.  To manage your subscriptions to this
> and other Galaxy lists, please use the interface at:
> 
>  http://lists.bx.psu.edu/

