Financially, it's probably best to start with a small cluster. For your
particular project I'd recommend launching a single m1.xlarge instance as the
head node, seeing how that goes, and adding workers only if you find you need
them. Should you find that it isn't enough, it's trivially easy to shut the
cluster down with CloudMan and restart it on a much larger instance type.
Regarding extra nodes: given that you're a single user, extra instances aren't
useful for serial pipelines and workflows, where each job depends on the
previous one; you'd only waste money. If your analysis can be done in
parallel, however (say you have multiple samples that all require the same
basic preliminary steps), then extra nodes can definitely help get the work
done much faster. You could also use CloudMan's autoscaling to handle this; it
automatically scales the cluster up (within your min/max parameters) as needed
to process jobs as fast as possible, and trims any idle nodes to prevent
waste.
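To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: equal-length jobs, one job per node at a time, every running node billed per full hour, and a placeholder hourly rate of 1.0 (substitute the current price for your instance type). The `target_workers` function only illustrates the idea of clamping to min/max; it is not CloudMan's actual autoscaling logic.

```python
import math


def estimate(n_jobs, hours_per_job, nodes, hourly_rate, parallel=True):
    """Return (wall_clock_hours, cost) for a batch of equal-length jobs.

    Assumes each node runs one job at a time and that every running node
    is billed for the whole run, rounded up to full hours.
    """
    if parallel:
        # Independent jobs (e.g. one per sample) run in batches of `nodes`.
        rounds = math.ceil(n_jobs / nodes)
    else:
        # Dependent jobs run one after another, however many nodes exist.
        rounds = n_jobs
    wall_clock = rounds * hours_per_job
    cost = nodes * math.ceil(wall_clock) * hourly_rate
    return wall_clock, cost


def target_workers(queued_jobs, min_workers, max_workers):
    """Hypothetical autoscaling rule: one worker per queued job, clamped
    to the configured min/max bounds."""
    return max(min_workers, min(max_workers, queued_jobs))


if __name__ == "__main__":
    rate = 1.0  # placeholder price per instance-hour, not a real quote
    print("1 node, 4 samples:        ", estimate(4, 3, 1, rate))
    print("4 nodes, 4 samples:       ", estimate(4, 3, 4, rate))
    print("4 nodes, 4 serial steps:  ", estimate(4, 3, 4, rate, parallel=False))
    print("workers for 10 queued jobs (min 0, max 4):", target_workers(10, 0, 4))
```

Under these assumptions, four independent samples cost about the same on four nodes as on one but finish in a quarter of the time, while four dependent steps on four nodes take just as long as on one node and cost four times as much.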
Lastly, depending on the analysis you need to do, you may find you need a
high-memory instance. In that case (given your m1.xlarge head node) you can
either restart the cluster on a larger instance type or, even simpler, disable
job running on the head node in the interface and temporarily add a
high-memory worker instance to handle the special demand.
Let me know if there's anything else I can do to help!
On Jan 23, 2013, at 3:40 PM, Andrew Norman <anorma...@gmail.com> wrote:
> Hi all
> I'd like to use AWS EC2 cluster to run Galaxy TopHat to analyze the RNA seq
> data for my project. I'm trying to price out the resources that I'll need to
> do this, but I don't have any experience setting up clusters, virtual or
> real, so I'd like to get the insight of someone who has done this. I have
> studied the wiki page dedicated to this topic
> (http://wiki.galaxyproject.org/CloudMan/AWS/CapacityPlanning) but I don't
> know how many worker nodes I'll need, and I will just be doing my personal
> analysis with this cluster (only one TopHat analysis at a time).
> Can anyone help me out with this? Thanks!
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this
> and other Galaxy lists, please use the interface at: