Dear Riccardo,
After moving to a different machine, the problem with ssh disappeared. I have
the nodes up and running, and I can ssh to the frontend.
However, the cluster configuration does not run to completion. Here is what
happens:
(elasticluster) ajokanovic@bscgrid28:~$ elasticluster start slurm -n mycluster
Starting cluster `mycluster` with:
* 1 frontend nodes.
* 1 compute nodes.
(This may take a while...)
2017-01-20 21:24:53 bscgrid28 gc3.elasticluster[9092] *WARNING*
DeprecationWarning: The novaclient.v2.security_groups module is deprecated
and will be removed.
2017-01-20 21:24:53 bscgrid28 gc3.elasticluster[9092] *WARNING*
DeprecationWarning: The novaclient.v2.images module is deprecated and will
be removed after Nova 15.0.0 is released. Use python-glanceclient or
python-openstacksdk instead.
Configuring the cluster.
(this too may take a while...)
PLAY [Common setup for all hosts] **********************************************
TASK [setup] *******************************************************************
ok: [compute001]
ok: [frontend001]
TASK [common : Provide workaround for YAML syntax error in lines containing colon+space] ***
ok: [frontend001]
ok: [compute001]
TASK [common : include] ********************************************************
included: /home/ajokanovic/elasticluster/src/elasticluster/share/playbooks/roles/common/tasks/init-Debian.yml for frontend001, compute001
TASK [common : Ensure extra repositories are present (Ubuntu)] *****************
changed: [compute001]
changed: [frontend001]
TASK [common : Ensure the APT package cache is updated] ************************
changed: [frontend001]
changed: [compute001]
TASK [common : Install Ansible `apt` module dependencies] **********************
changed: [compute001]
 [WARNING]: Consider using apt module rather than running apt-get
changed: [frontend001]
TASK [common : Upgrade all installed packages to latest version] ***************
ok: [compute001]
ok: [frontend001]
TASK [common : Ensure additional packages are installed] ***********************
failed: [compute001] (item=[u'apt-transport-https', u'sysvinit-utils',
u'software-properties-common', u'python-software-properties',
u'python-pycurl']) => {"cache_update_time": 1450125930, "cache_updated":
false, "failed": true, "item": ["apt-transport-https", "sysvinit-utils",
"software-properties-common", "python-software-properties",
"python-pycurl"], "msg": "'/usr/bin/apt-get -y -o
\"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"
install 'python-software-properties'' failed: E: There are problems and
-y was used without --force-yes\n", "stderr": "E: There are problems and -y
was used without --force-yes\n", "stdout": "Reading package
lists...\nBuilding dependency tree...\nReading state information...\nThe
following NEW packages will be installed:\n python-software-properties\n0
upgraded, 1 newly installed, 0 to remove and 0 not upgraded.\nNeed to get
19.6 kB of archives.\nAfter this operation, 138 kB of additional disk space
will be used.\nWARNING: The following packages cannot be authenticated!\n
python-software-properties\n", "stdout_lines": ["Reading package lists...",
"Building dependency tree...", "Reading state information...", "The
following NEW packages will be installed:", " python-software-properties",
"0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.", "Need to
get 19.6 kB of archives.", "After this operation, 138 kB of additional disk
space will be used.", "WARNING: The following packages cannot be
authenticated!", " python-software-properties"]}
failed: [frontend001] (item=[u'apt-transport-https', u'sysvinit-utils',
u'software-properties-common', u'python-software-properties',
u'python-pycurl']) => {"cache_update_time": 1450125930, "cache_updated":
false, "failed": true, "item": ["apt-transport-https", "sysvinit-utils",
"software-properties-common", "python-software-properties",
"python-pycurl"], "msg": "'/usr/bin/apt-get -y -o
\"Dpkg::Options::=--force-confdef\" -o \"Dpkg::Options::=--force-confold\"
install 'python-software-properties'' failed: E: There are problems and
-y was used without --force-yes\n", "stderr": "E: There are problems and -y
was used without --force-yes\n", "stdout": "Reading package
lists...\nBuilding dependency tree...\nReading state information...\nThe
following NEW packages will be installed:\n python-software-properties\n0
upgraded, 1 newly installed, 0 to remove and 0 not upgraded.\nNeed to get
19.6 kB of archives.\nAfter this operation, 138 kB of additional disk space
will be used.\nWARNING: The following packages cannot be authenticated!\n
python-software-properties\n", "stdout_lines": ["Reading package lists...",
"Building dependency tree...", "Reading state information...", "The
following NEW packages will be installed:", " python-software-properties",
"0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.", "Need to
get 19.6 kB of archives.", "After this operation, 138 kB of additional disk
space will be used.", "WARNING: The following packages cannot be
authenticated!", " python-software-properties"]}
to retry, use: --limit @/home/ajokanovic/elasticluster/src/elasticluster/share/playbooks/site.retry
PLAY RECAP *********************************************************************
compute001                 : ok=7    changed=3    unreachable=0    failed=1
frontend001                : ok=7    changed=3    unreachable=0    failed=1
2017-01-20 21:50:37 bscgrid28 gc3.elasticluster[9092] *ERROR* Command
`ansible-playbook
/home/ajokanovic/elasticluster/src/elasticluster/share/playbooks/site.yml
--inventory=/home/ajokanovic/.elasticluster/storage/mycluster.inventory
--become --become-user=root` failed with exit code 2.
2017-01-20 21:50:37 bscgrid28 gc3.elasticluster[9092] *ERROR* Check the
output lines above for additional information on this error.
2017-01-20 21:50:37 bscgrid28 gc3.elasticluster[9092] *ERROR* The cluster
has likely *not* been configured correctly. You may need to re-run
`elasticluster setup` or fix the playbooks.
2017-01-20 21:50:37 bscgrid28 gc3.elasticluster[9092] *WARNING* Cluster
`mycluster` not yet configured. Please, re-run `elasticluster setup
mycluster` and/or check your configuration
WARNING: YOUR CLUSTER IS NOT READY YET!
Cluster name: mycluster
Cluster template: slurm
Default ssh to node: frontend001
- frontend nodes: 1
- compute nodes: 1
To login on the frontend node, run the command:
elasticluster ssh mycluster
To upload or download files to the cluster, use the command:
elasticluster sftp mycluster
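
If I read the error correctly, apt refuses to install
python-software-properties because the package cannot be authenticated
(a missing repository key on the node image, perhaps?). The command
Ansible runs on each node is essentially:

    /usr/bin/apt-get -y -o "Dpkg::Options::=--force-confdef" \
        -o "Dpkg::Options::=--force-confold" install python-software-properties

Is this something I should fix in the image, or is it expected to be
handled by the ElastiCluster playbooks?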
Best regards,
Ana
On Thursday, 19 January 2017 14:02:34 UTC+1, Riccardo Murri wrote:
>
> Dear Ana:
>
> > One thing that is still not clear to me is how adding/removing nodes from
> > the cluster affects SLURM (the queue, the state information, etc.) and
> > jobs already running. Is it equivalent to "scontrol reconfigure", where
> > you have to restart slurmctld every time you add or remove a node?
>
> Yes, ElastiCluster writes the new config file and then restarts the
> SLURM daemons.
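>
> (Concretely, that amounts to roughly the following; the actual service
> names depend on the distribution, so take this as a sketch:)
>
>     # after slurm.conf has been rewritten with the new node list:
>     sudo systemctl restart slurmctld   # on the frontend
>     sudo systemctl restart slurmd      # on each compute node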
>
> > What if we want to do these changes of nodes frequently, how does this
> > affect the users?
>
> They should not notice, except for the occasional glitch while
> `slurmctld` is restarted. I'll admit that this has not received much
> testing, though (OTOH, neither have I received any bug reports on this).
>
> Note that, when scaling down a cluster, you should (1) put the nodes you
> want to remove into "DRAIN" state, and only then (2) use `elasticluster
> remove-node` to remove them. (`elasticluster resize -r` removes nodes
> immediately, starting with the highest-numbered ones, regardless of
> whether they are running any jobs.)
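>
> For example (node and cluster names are illustrative):
>
>     # 1. drain the node so SLURM stops scheduling new jobs on it
>     scontrol update NodeName=compute001 State=DRAIN Reason="scaling down"
>     # 2. once its running jobs have finished, remove it
>     elasticluster remove-node mycluster compute001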
>
> Ciao,
> R
>
> --
> Riccardo Murri, Schwerzenbacherstrasse 2, CH-8606 Nänikon, Switzerland
>