Hello!

2018-04-15 17:21 GMT+02:00 'Ravi Arya' via elasticluster
<elasticluster@googlegroups.com>:
> 1. Load balancing: You submit the number of jobs to the cluster and
> depending upon the number of jobs, cluster adds or removes the nodes. This
> is automatic and is accomplished by running load balancer in parallel. In
> this way, users can go on submitting the jobs to the head/master node and
> there is no need to manually add/remove nodes.
>
> Reference: http://star.mit.edu/cluster/docs/0.95.6/manual/load_balancer.html

Short answer: ElastiCluster provides no such option for dynamically
growing or shrinking a cluster at the moment.  The closest you could
get to replicating StarCluster's feature for Grid Engine is to
combine ElastiCluster with https://github.com/uzh/vm-mad (warning: the
code is old and might require some changes, let me know if you're
really interested).

Longer answer: There's a major architectural difference between
ElastiCluster and StarCluster that makes automatically growing or
shrinking a cluster a rather different challenge: ElastiCluster works
with base OS images, whereas (to my understanding) StarCluster
requires ad-hoc pre-built AMIs.  This means that ElastiCluster
deployments are generally slower, which caps the number of nodes that
can be quickly added to a cluster (people have tried this in order to
leverage spot instances on AWS or the equivalent feature on Google
Cloud and the consensus is that it's generally too slow ATM). More
discussion and details on future plans at
https://github.com/gc3-uzh-ch/elasticluster/issues/365

At the moment, the best workaround would be something like the following:

* Start a "prototype" cluster
* Snapshot the nodes of the prototype cluster, one snapshot per kind
(e.g., one snapshot for the front-end and one for the compute nodes)
* Make a new config using the snapshots as disk images
* Start the dynamic cluster
* Use any monitor script to grow/shrink the cluster on demand by
running `elasticluster resize --add` and `elasticluster remove-node`
(again, https://github.com/uzh/vm-mad can do this for Grid Engine;
SLURM has built-in support for this; other systems may require new
code)
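
To make the last step a bit more concrete, here is a minimal sketch of
the decision logic such a monitor script could use. Everything in it is
an assumption of mine rather than ElastiCluster code: the function
names, the thresholds, the "compute" group name, and in particular the
exact argument layout of the two commands (`N:GROUP` after `--add`,
cluster and node name as positional arguments); please check
`elasticluster resize --help` and `elasticluster remove-node --help`
before relying on it.

```python
import math


def plan_resize(pending_jobs, idle_nodes, total_nodes,
                min_nodes=1, max_nodes=10, jobs_per_node=1):
    """Return how many compute nodes to add (>0) or remove (<0)."""
    if pending_jobs > 0:
        # Nodes needed for the queued jobs, beyond those already idle.
        wanted = math.ceil(pending_jobs / jobs_per_node) - idle_nodes
        # Never grow past max_nodes, never return a negative "grow".
        return max(0, min(wanted, max_nodes - total_nodes))
    # Queue is empty: release idle nodes but keep at least min_nodes.
    return -max(0, min(idle_nodes, total_nodes - min_nodes))


def build_commands(cluster, delta, idle_node_names, group="compute"):
    """Translate a plan into command lines (argument layout is assumed)."""
    if delta > 0:
        return [["elasticluster", "resize", "--add",
                 f"{delta}:{group}", cluster]]
    if delta < 0:
        # One remove-node call per idle node to be released.
        return [["elasticluster", "remove-node", cluster, name]
                for name in idle_node_names[:-delta]]
    return []
```

A cron job or a simple loop would feed this the queue length reported
by the batch system (e.g., parsed from `qstat` or `squeue` output) and
pass each resulting command to `subprocess.run`.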

I know this is rather kludgy and brittle, but no-one has stepped up to
sponsor working on an "autoscaling" feature yet ;-)


> 2. I tried again with prebuilt AMI (by building it again) and it is working 
> now.

Good! All is well that ends well :-)

Ciao,
R
