Dear Ana,

(Ana Jokanović, Tue, Jan 17, 2017 at 06:34:44AM -0800:)
> I understand that ElastiCluster installs SLURM on the newly created
> cluster.

Actually, ElastiCluster just runs Ansible on the cluster; depending on
the "groups" that your config file defines for a node, that node will
get software installed and configured on it.  That is to say, you could
install SLURM+GlusterFS or GridEngine+Hadoop+Ganglia if you found that
combination useful.

> Is SLURM source code within ElastiCluster source code? Where can I
> find it?

No, ElastiCluster installs SLURM from precompiled packages.  These are
the SLURM packages that come from the distribution's main archive
(Debian, Ubuntu), or the packages from the SLURM COPR [1] by @verdurin
on CentOS/RHEL.

[1]: https://copr.fedorainfracloud.org/coprs/verdurin/slurm/

In a former release, ElastiCluster downloaded the SLURM sources from
the SchedMD website and compiled them.  This made configuring a cluster
much slower for basically no gain, so it was dropped in favor of
installing from precompiled packages.

> May I substitute it with another (modified) version of SLURM?

Yes, as long as you package it and know how to edit Ansible playbooks.

> Also, can I edit slurm.conf and where can I find it?

Here: https://github.com/gc3-uzh-ch/elasticluster/blob/master/elasticluster/share/playbooks/roles/slurm-common/templates/slurm.conf.j2

If you do not want to mess with ElastiCluster sources, you can make
your own Ansible playbook that deploys your own customized `slurm.conf`
and then runs `scontrol reconfigure`.  Assuming you named this playbook
`after.yml`, you can run the following command to have the custom
playbook run after ElastiCluster's main config::

    elasticluster setup mycluster -- after.yml

If you copy the `after.yml` playbook into the ElastiCluster sources,
directory `elasticluster/share/playbooks/`, then it will automatically
be executed.

Note that, since SLURM likes to embed the list of nodes and partitions
in the `slurm.conf` file, you *have to* make the new `slurm.conf` a
template: ElastiCluster has to plug the node names into it.
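
To make the `after.yml` idea more concrete, here is a minimal sketch of
what such a playbook could look like.  It is only an illustration, not
something shipped with ElastiCluster: the inventory group names
(`slurm_master`, `slurm_worker`), the template file name
(`my-slurm.conf.j2`) and the destination path (`/etc/slurm/slurm.conf`)
are all assumptions that you should check against your ElastiCluster
version and Linux distribution.

```yaml
---
# after.yml: hypothetical follow-up playbook, meant to run after
# ElastiCluster's main configuration playbook.
#
# ASSUMPTIONS (not taken from ElastiCluster itself; please verify):
#   - inventory groups are named `slurm_master` and `slurm_worker`
#   - `my-slurm.conf.j2` is your customized template, stored next to
#     this playbook
#   - SLURM reads its configuration from /etc/slurm/slurm.conf

- name: Deploy the customized slurm.conf on all SLURM nodes
  hosts: slurm_master:slurm_worker
  become: yes
  tasks:
    - name: Render slurm.conf from the Jinja2 template
      template:
        src: my-slurm.conf.j2
        dest: /etc/slurm/slurm.conf
        owner: root
        group: root
        mode: "0644"

- name: Make SLURM re-read its configuration
  hosts: slurm_master
  become: yes
  tasks:
    - name: Run scontrol reconfigure on the master node
      command: scontrol reconfigure
```

The two plays are kept separate so that the new file is in place on
every node before `scontrol reconfigure` is run once, on the master.
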
A good idea could be to start with the `.j2` file provided above and
modify it.

> Which part of the ElastiCluster is responsible for resizing of the cluster?

It's the commands `elasticluster resize` and `elasticluster remove-node`.

Note that, for the time being, ElastiCluster's resize operations have
to be initiated by an admin; no action is ever triggered automatically.

> In SLURM's documentation I have found out about the Elastic computing and
> possibility to resize the cluster through setting ResumeProgram and
> SuspendProgram in slurm.conf
> (https://slurm.schedmd.com/elastic_computing.html). Is this how
> ElastiCluster interact with SLURM, as well?

No, it's quite different.

SLURM requires you to specify a set of nodes in `slurm.conf`, and then
you provide `ResumeProgram` and `SuspendProgram` scripts which create
and destroy these nodes as VMs in an IaaS cloud.  The decision of when
to start or stop a node is left to the SLURM scheduler, and the process
is fully automatic.  The cluster will not grow beyond the limits set in
`slurm.conf`.

As ElastiCluster deals with many different software systems, not just
SLURM, it takes a completely different approach: you can add or remove
cluster nodes at any time.  Every resize operation, however, triggers a
re-run of the Ansible playbooks to reconfigure the cluster to the new
setup.  Depending on the installed software, this may lead to downtime
(this should not happen with SLURM, but I'm not so sure about e.g.
GlusterFS).  Also, resize operations must be initiated by an admin and
are never triggered automatically.

Does this answer your questions?

Ciao,
R

--
Riccardo Murri, Schwerzenbacherstrasse 2, CH-8606 Nänikon, Switzerland
