Branch: refs/heads/master
Home:   https://github.com/gc3-uzh-ch/elasticluster

Commit: 5fc5a991f1634337819d8f67edc858ab08c4eec4
    https://github.com/gc3-uzh-ch/elasticluster/commit/5fc5a991f1634337819d8f67edc858ab08c4eec4
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)
Changed paths:
  M docs/configure.rst
  M elasticluster/conf.py
  M elasticluster/providers/gce.py
  M elasticluster/validate.py
  A examples/slurm-with-gpu-on-google.conf
  M tests/test_conf.py

Log Message:
-----------
Add support for GPUs on Google Cloud

Many thanks to @benpass for providing the initial implementation in PR #406!


Commit: 32567cb120697af45c3ac8900dd765bf2f830580
    https://github.com/gc3-uzh-ch/elasticluster/commit/32567cb120697af45c3ac8900dd765bf2f830580
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M elasticluster/share/playbooks/roles/slurm-common/templates/slurm.conf.j2

Log Message:
-----------
slurm-common: Cosmetic changes to `slurm.conf`


Commit: b27240f7ae0152f06ce0276ca6df1d10eb267f6b
    https://github.com/gc3-uzh-ch/elasticluster/commit/b27240f7ae0152f06ce0276ca6df1d10eb267f6b
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M docs/playbooks.rst
  A elasticluster/share/playbooks/roles/slurm-worker/files/etc/slurm/cgroup/release_agent
  A elasticluster/share/playbooks/roles/slurm-worker/files/etc/slurm/cgroup_allowed_devices_file.conf
  A elasticluster/share/playbooks/roles/slurm-worker/files/usr/local/sbin/elasticluster-check-kconfig-cgroups.sh
  A elasticluster/share/playbooks/roles/slurm-worker/tasks/cgroup.yml
  M elasticluster/share/playbooks/roles/slurm-worker/tasks/main.yml
  A elasticluster/share/playbooks/roles/slurm-worker/templates/cgroup.conf.j2
  A elasticluster/share/playbooks/roles/slurm-worker/vars/main.yml

Log Message:
-----------
SLURM: Support use of cgroups (opt-in)

Configure cgroup support in SLURM if any one of the cgroup-based
plugins (`task/cgroup`, `jobacct_gather/cgroup`, or `proctrack/cgroup`)
is configured in `slurm.conf`.

*Note:* SLURM's cgroup support requires that swap accounting be enabled
in the kernel. This is *not* the default on Debian and Ubuntu, and a
reboot is needed to enable it.
ElastiCluster will by default try to configure the bootloader but *not*
reboot the nodes -- use `global_var_allow_reboot=yes` to change this
default and reboot the nodes if needed.


Commit: 50a997209ba1a3a9c2c77b95ca5da8226cc6b404
    https://github.com/gc3-uzh-ch/elasticluster/commit/50a997209ba1a3a9c2c77b95ca5da8226cc6b404
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  A elasticluster/share/playbooks/library/gpus
  M elasticluster/share/playbooks/roles/slurm-common/tasks/main.yml
  M elasticluster/share/playbooks/roles/slurm-common/templates/slurm.conf.j2
  M elasticluster/share/playbooks/roles/slurm-worker/tasks/main.yml
  A elasticluster/share/playbooks/roles/slurm-worker/templates/gres.conf.j2
  M elasticluster/share/playbooks/site.yml

Log Message:
-----------
Configure SLURM's GRES with GPUs (if available)


Commit: 0c6f0490830f3a404e8f1561bbceab7d76f9f7d7
    https://github.com/gc3-uzh-ch/elasticluster/commit/0c6f0490830f3a404e8f1561bbceab7d76f9f7d7
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M elasticluster/cluster.py

Log Message:
-----------
Better message for "instance running" check

Let's say the instance is "up" instead of "up and running", because the
latter suggests that we can connect and use it at any time, and that is
typically not true for instances that are just starting (boot times can
still be a few minutes).
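The opt-in cgroup support from the commit above is driven by the setup
variables, so a sketch of how one might enable it could look like the
following (the section and cluster names here are made up for
illustration; `slurm_taskplugin` follows the `slurm_` naming rule
documented later in this digest, and `global_var_allow_reboot=yes` is
the option the commit message mentions for permitting the reboot that
swap accounting may require):

```ini
# Hypothetical ElastiCluster configuration fragment (names illustrative).
# Setting any cgroup-based plugin triggers SLURM cgroup configuration.
[setup/slurm-cgroups]
provider=ansible
slurm_taskplugin=task/cgroup

[cluster/mycluster]
setup=slurm-cgroups
# Allow ElastiCluster to reboot nodes if the kernel needs swap
# accounting enabled (not the default on Debian/Ubuntu).
global_var_allow_reboot=yes
```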
Commit: 2b8dd3e9c0f126c2d8c97df4984099c1c7abc5fe
    https://github.com/gc3-uzh-ch/elasticluster/commit/2b8dd3e9c0f126c2d8c97df4984099c1c7abc5fe
Author: Hatef Monajemi <monaj...@stanford.edu>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  A elasticluster/share/playbooks/roles/cuda/tasks/init-Debian.yml
  A elasticluster/share/playbooks/roles/cuda/tasks/init-RedHat.yml
  A elasticluster/share/playbooks/roles/cuda/tasks/main.yml
  M elasticluster/share/playbooks/site.yml

Log Message:
-----------
New role `cuda` to automatically install CUDA if GPUs are detected


Commit: 486a9869222826f1e7fa37a6838ee2823287ff66
    https://github.com/gc3-uzh-ch/elasticluster/commit/486a9869222826f1e7fa37a6838ee2823287ff66
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  A elasticluster/share/playbooks/library/bootparam.py

Log Message:
-----------
New Ansible module `bootparam.py` to alter the Linux boot command-line


Commit: 4a1ebf210f2da4ec8ccd88ddedce9d114c37e461
    https://github.com/gc3-uzh-ch/elasticluster/commit/4a1ebf210f2da4ec8ccd88ddedce9d114c37e461
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M docs/playbooks.rst
  M elasticluster/share/playbooks/roles/slurm-common/defaults/main.yml
  M elasticluster/share/playbooks/roles/slurm-common/templates/slurm.conf.j2

Log Message:
-----------
SLURM: Allow configuring more parameters in `slurm.conf` through setup
variables.
Specifically, it is now possible to set variables in the `[setup/*]`
section to assign values to the following SLURM configuration
parameters:

* `FastSchedule` (default 1)
* `JobAcctGatherFrequency` (default 60)
* `JobAcctGatherType` (default `jobacct_gather/linux`)
* `MaxArraySize` (default 1000)
* `MaxJobCount` (default 10000)
* `ProcTrackType` (default `proctrack/linuxproc`)
* `ReturnToService` (default 1)
* `SelectType` (default `select/cons_res`)
* `SelectTypeParameters` (default `CR_Core_Memory`)
* `TaskPlugin` (default `task/none`)

The ElastiCluster setup variable name corresponding to a SLURM
parameter name is the lowercased name prefixed with `slurm_`. For
instance, SLURM parameter `FastSchedule` can be configured via the
variable `slurm_fastschedule`. (Note that SLURM parameter names are
not case-sensitive, but ElastiCluster variable names are.)

Default values have not changed from previous ElastiCluster releases.


Commit: 2aa414e7ac9038c71e090edfce4ea1d538420cd9
    https://github.com/gc3-uzh-ch/elasticluster/commit/2aa414e7ac9038c71e090edfce4ea1d538420cd9
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M elasticluster/share/playbooks/roles/bigtop/tasks/main.yml

Log Message:
-----------
Bigtop: only update the APT cache if the repo was added in this run


Commit: 05474b82c2b718390077301acbe290e1860705f7
    https://github.com/gc3-uzh-ch/elasticluster/commit/05474b82c2b718390077301acbe290e1860705f7
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M docs/playbooks.rst
  M elasticluster/share/playbooks/library/gpus
  A elasticluster/share/playbooks/roles/cuda.yml
  A elasticluster/share/playbooks/roles/cuda/defaults/main.yml
  A elasticluster/share/playbooks/roles/cuda/tasks/_check_nvidia_dev.yml
  A elasticluster/share/playbooks/roles/cuda/tasks/_reboot_and_wait.yml
  M elasticluster/share/playbooks/roles/cuda/tasks/init-Debian.yml
  M elasticluster/share/playbooks/roles/cuda/tasks/init-RedHat.yml
  M elasticluster/share/playbooks/roles/cuda/tasks/main.yml
  A elasticluster/share/playbooks/roles/cuda/templates/etc/profile.d/cuda.csh.j2
  A elasticluster/share/playbooks/roles/cuda/templates/etc/profile.d/cuda.sh.j2
  A elasticluster/share/playbooks/roles/cuda/templates/etc/yum.repos.d/cuda.repo.j2
  A elasticluster/share/playbooks/roles/cuda/vars/main.yml
  M elasticluster/share/playbooks/roles/slurm-worker/tasks/cgroup.yml
  M elasticluster/share/playbooks/site.yml

Log Message:
-----------
CUDA: Many role improvements

In particular:

* support the role on CentOS/RHEL as well
* allow setting the CUDA version via a setup variable
* ensure CUDA binaries are found in the PATH of login shells
* document the new role


Commit: add3000ecee7aa8ac6654d1143279c52c78dfea6
    https://github.com/gc3-uzh-ch/elasticluster/commit/add3000ecee7aa8ac6654d1143279c52c78dfea6
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M elasticluster/share/playbooks/roles/slurm-common/templates/slurm.conf.j2

Log Message:
-----------
SLURM: allow setting `DefMemPerCPU` through setup variables
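As a sketch of how the `slurm_*` setup variables described earlier in
this digest might be used (the section name and all values are
illustrative, not taken from the source; `slurm_defmempercpu` is
inferred from the `DefMemPerCPU` commit above via the documented
lowercase-with-`slurm_`-prefix naming rule):

```ini
# Hypothetical [setup/*] fragment: each slurm_* variable maps to the
# SLURM configuration parameter of the same (case-insensitive) name.
[setup/slurm]
provider=ansible
slurm_fastschedule=1
slurm_maxarraysize=1000
slurm_maxjobcount=10000
slurm_selecttype=select/cons_res
slurm_selecttypeparameters=CR_Core_Memory
slurm_defmempercpu=2000
```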
Commit: 3df1a756e5a7ed4fc97868f64faa4e4f45f614f4
    https://github.com/gc3-uzh-ch/elasticluster/commit/3df1a756e5a7ed4fc97868f64faa4e4f45f614f4
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M elasticluster/share/playbooks/roles/slurm-worker/tasks/main.yml

Log Message:
-----------
SLURM: Fix YAML syntax error in task "Install SLURM worker packages"


Commit: 87de4cd6bfbc04aef75219cdba47e4992c06166a
    https://github.com/gc3-uzh-ch/elasticluster/commit/87de4cd6bfbc04aef75219cdba47e4992c06166a
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M elasticluster/share/playbooks/roles/common/tasks/init-RedHat.yml

Log Message:
-----------
CentOS/RHEL: Upgrade all installed packages to the latest version

This is necessary in order to get the correct kernel and headers, in
case we need to compile additional device drivers (e.g., for CUDA).


Commit: a9b2e1e37b5a59ec68de97aba5a0e5c7a9331244
    https://github.com/gc3-uzh-ch/elasticluster/commit/a9b2e1e37b5a59ec68de97aba5a0e5c7a9331244
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  A examples/slurm-with-gpu-on-aws.conf
  M examples/slurm-with-gpu-on-google.conf

Log Message:
-----------
Update GPU-accelerated cluster examples


Commit: 9c3b3dfee4c328c9837475bdee2647e430299287
    https://github.com/gc3-uzh-ch/elasticluster/commit/9c3b3dfee4c328c9837475bdee2647e430299287
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M docs/playbooks.rst

Log Message:
-----------
Convert SLURM variables table to list-table format

Makes it way easier to edit descriptions...
Commit: 66a6bad78e8bd3ec02301ca9a3f58d7266f6fb59
    https://github.com/gc3-uzh-ch/elasticluster/commit/66a6bad78e8bd3ec02301ca9a3f58d7266f6fb59
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M docs/playbooks.rst
  M elasticluster/share/playbooks/roles/slurm-common/templates/slurm.conf.j2
  M elasticluster/share/playbooks/roles/slurm-worker/templates/cgroup.conf.j2

Log Message:
-----------
SLURM: Use `slurm_allowedramspace` and `slurm_allowedswapspace` to
compute the total `VSizeFactor`.


Commit: cf1d1f04f6ec3810686ad1463ed31ef9a56be8e1
    https://github.com/gc3-uzh-ch/elasticluster/commit/cf1d1f04f6ec3810686ad1463ed31ef9a56be8e1
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M docs/playbooks.rst
  M elasticluster/share/playbooks/roles/slurm-common/defaults/main.yml
  M elasticluster/share/playbooks/roles/slurm-common/templates/slurm.conf.j2

Log Message:
-----------
SLURM: Default for `ReturnToService` is now `2`

If CUDA or the `task/cgroup` plugin is used, it is possible that a
reboot happens during the installation. When the nodes come back up,
SLURM will mark them as "down", since the reboot was unexpected, and
wait for a sysadmin to issue `sudo scontrol update nodename=...
state=resume`. With `ReturnToService=2`, nodes where `slurmd` is
running will automatically be returned to the "idle" state, which is
what most people (likely) want.


Commit: 692da1b032e5cea0aa98be14f344a94574e71605
    https://github.com/gc3-uzh-ch/elasticluster/commit/692da1b032e5cea0aa98be14f344a94574e71605
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M elasticluster/share/playbooks/roles/cuda/tasks/main.yml

Log Message:
-----------
Temporary workaround for incompatibility between the newest Ubuntu
kernel and the `nvidia-387` driver.
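As a sketch of the `VSizeFactor` relation mentioned in the commit
above (the numbers are illustrative, not from the source, and the
exact arithmetic lives in the Jinja templates the commit touches;
summing the two percentages is an assumption):

```ini
# cgroup.conf -- per-job memory limits, as percentages of allocated RAM
# (rendered from cgroup.conf.j2 using the slurm_allowed*space variables):
AllowedRAMSpace=100
AllowedSwapSpace=20

# slurm.conf -- total virtual-memory cap (rendered from slurm.conf.j2),
# presumably computed as AllowedRAMSpace + AllowedSwapSpace:
VSizeFactor=120
```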
Commit: 11c74ff90c0c6e566721269f61d0f7f7731d3e3c
    https://github.com/gc3-uzh-ch/elasticluster/commit/11c74ff90c0c6e566721269f61d0f7f7731d3e3c
Author: Riccardo Murri <riccardo.mu...@gmail.com>
Date:   2018-01-18 (Thu, 18 Jan 2018)

Changed paths:
  M elasticluster/share/playbooks/roles/cuda/tasks/_reboot_and_wait.yml

Log Message:
-----------
CUDA: Fix installation on Ubuntu 14.04

Installation on Ubuntu 14.04 *does* indeed require a reboot, plus a
long time spent compiling the driver for two different kernel versions.


Compare: https://github.com/gc3-uzh-ch/elasticluster/compare/3b14ca56a167...11c74ff90c0c

--
You received this message because you are subscribed to the Google
Groups "elasticluster-dev" group.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticluster-dev/5a6116c8965de_53f92abd3deb1c147583c%40hookshot-fe-cace476.cp1-iad.github.net.mail.