Thanks for your response, Riccardo. I fixed the SSH key problem, but I still get errors:

*Here's the error:*

    $ elasticluster start gce -n
    elasticluster.sh: WARNING: Command 'env' does not support null-terminated lines; elasticluster.sh cannot properly sanitize the environment in this case. If you get errors later on about Docker being unable to process environment variables, you will need to install GNU coreutils' 'env'.
    Starting cluster `gce` with:
    * 1 frontend nodes.
    * 1 compute nodes.
    (This may take a while...)
    2020-08-18 17:25:18 4b6111acbb85 elasticluster[1] *WARNING* UserWarning: Cannot access /Users/mahsa/.elasticluster/storage/429683943466-eclntgdphrfcbiio29sj7ekq5dceuoi2.apps.googleusercontent.com.oauth.dat: No such file or directory
    No handlers could be found for logger "paramiko.transport"
    2020-08-18 17:36:25 4b6111acbb85 elasticluster[1] *ERROR* Some nodes of the cluster were unreachable within the given 600-seconds timeout: frontend001, compute001
    Configuring the cluster ...
    (this too may take a while)
    2020-08-18 17:36:25 4b6111acbb85 elasticluster[1] *WARNING* Ignoring node `frontend001`: No IP address.
    2020-08-18 17:36:25 4b6111acbb85 elasticluster[1] *WARNING* Ignoring node `compute001`: No IP address.
    2020-08-18 17:36:25 4b6111acbb85 elasticluster[1] *ERROR* The cluster hosts are up and running, but Ansible failed to set the cluster up: The cluster does not provide the minimum amount of nodes specified in the configuration. Some nodes are running, but the cluster will not be set up yet. Please change the minimum amount of nodes in the configuration or try to start a new cluster after checking the cloud provider settings.
    2020-08-18 17:36:25 4b6111acbb85 elasticluster[1] *WARNING* Cluster `gce` not yet configured. Please, re-run `elasticluster setup gce` and/or check your configuration
    WARNING: YOUR CLUSTER `gce` IS NOT READY YET!
    Cluster name: gce
    Cluster template: gce
    Default ssh to node: frontend001
    - frontend nodes: 1
    - compute nodes: 1

*Here's my Config File (I have not included the first part of it):*

    [login/google]
    # Do not include @gmail (example: [email protected] -> monajemi)
    image_user=ubuntu
    image_user_sudo=root
    image_sudo=True
    user_key_name=elasticluster
    user_key_private=~/.ssh/google_compute_engine
    user_key_public=~/.ssh/google_compute_engine.pub

    [setup/ansible-slurm]
    provider=ansible
    frontend_groups=slurm_master
    compute_groups=slurm_worker,cuda
    # allow restart of compute nodes
    compute_var_allow_reboot=yes
    worker_var_allow_reboot=yes
    global_var_allow_reboot=yes
    global_var_slurm_taskplugin=task/cgroup
    global_var_slurm_proctracktype=proctrack/cgroup
    global_var_slurm_jobacctgathertype=jobacct_gather/cgroup

    [cluster/gce]
    cloud=google
    login=google
    setup=ansible-slurm
    security_group=default
    frontend_nodes=1
    compute_nodes=1
    ssh_to=frontend
    # Ask for 500G of disk
    boot_disk_type=pd-standard
    boot_disk_size=500

    [cluster/gce/frontend]
    flavor=n1-standard-8
    image_id=ubuntu-1604-xenial-v20171107b

    # add 2x GPUs (NVidia Tesla K80) to the compute nodes
    # note that as of Nov. 2017, GPU-enabled VMs are available only in few zones
    # use `gcloud compute accelerator-types list` to see what is available
    [cluster/gce/compute]
    flavor=n1-standard-8
    #flavor=n1-highmem-8
    image_id=ubuntu-1604-xenial-v20171107b
    #accelerator_count=1
    #accelerator_type=nvidia-tesla-v100
    #accelerator_type=nvidia-tesla-k80

Could you help me with this? Thanks in advance!

On Friday, August 14, 2020 at 1:40:35 PM UTC-7 Riccardo Murri wrote:

> Hello Mahsa,
>
> Regarding the error you're seeing:
>
> > DEBUG Ignoring error connecting to compute001: Invalid key -- <class 'paramiko.ssh_exception.SSHException'>
>
> My first guess would be that you pointed ElastiCluster to some SSH key
> file that it cannot read.
>
> Check your configuration file; you should have some lines like the
> following ones:
>
>     [login/google]
>     image_user=riccardo.murri
>     # ...
>     user_key_private=~/.ssh/elasticluster
>     user_key_public=~/.ssh/elasticluster.pub
>
> The lines influencing the SSH logins are the two `user_key_*` ones.
>
> Things to check:
>
> 1. *Both files must exist* on the machine where you run `elasticluster
>    -vvvv start ...`
> 2. The file name is immaterial (could be `id_rsa` or `id_ed25519` or
>    `google_cloud_sdk`) but, because of a bug, ElastiCluster can only use
>    SSH keys of type RSA. To find out what type is the SSH key you're
>    using, run this command (replace `~/.ssh/elasticluster-dev` with the
>    path pointed to by `user_key_private` in your config file):
>
>        $ ssh-keygen -l -f ~/.ssh/elasticluster-dev
>        4096 SHA256:9r/pBW5nB2mnFrGIFxuxs8HW4ZVWDUbS/AzMeU3tjRM riccardo.murri@dev (RSA)
>
>    If instead of "(RSA)" you get a different code, you will need to
>    generate an RSA key and use that instead:
>
>    1. Create a new RSA key for use with elasticluster:
>
>           ssh-keygen -t rsa -b 4096 -o -a 100 -f ~/.ssh/elasticluster
>
>    2. Replace the `user_key_*` lines with the following:
>
>           user_key_private=~/.ssh/elasticluster
>           user_key_public=~/.ssh/elasticluster.pub
>
> If after making these changes you are still running into issues,
> please post or send to me via email your configuration file (WARNING:
> remove all private data like passwords and access keys!!)
>
> Hope this helps,
> Riccardo
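
For reference, the two checks listed in the quoted message (both key files exist, and the key is of type RSA) could be scripted roughly as follows. This is only a minimal sketch and not part of ElastiCluster: the helper names `login_key_paths`, `key_type_from_fingerprint`, and `check_keys` are made up for illustration, and it assumes the config file parses as plain INI text and that `ssh-keygen` is available on the PATH.

```python
import configparser
import os
import re
import subprocess


def key_type_from_fingerprint(line):
    """Return the key type shown in parentheses at the end of one line of
    `ssh-keygen -l` output (e.g. "RSA"), or None if no type is found."""
    match = re.search(r"\(([A-Za-z0-9-]+)\)\s*$", line.strip())
    return match.group(1) if match else None


def login_key_paths(config_text, section="login/google"):
    """Read the two `user_key_*` settings from a login section (hypothetical
    helper) and return (private_path, public_path) with `~` expanded."""
    # interpolation=None keeps any literal '%' in values from raising errors.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    login = parser[section]
    return (
        os.path.expanduser(login["user_key_private"]),
        os.path.expanduser(login["user_key_public"]),
    )


def check_keys(private_path, public_path):
    """Return a list of problems found; an empty list means both checks pass."""
    problems = []
    # Check 1: both key files must exist on the machine running elasticluster.
    for path in (private_path, public_path):
        if not os.path.isfile(path):
            problems.append(f"missing file: {path}")
    if problems:
        return problems
    # Check 2: the key must be of type RSA, as reported by `ssh-keygen -l -f`.
    try:
        result = subprocess.run(
            ["ssh-keygen", "-l", "-f", private_path],
            capture_output=True,
            text=True,
        )
    except FileNotFoundError:
        return ["ssh-keygen not found on PATH"]
    key_type = key_type_from_fingerprint(result.stdout)
    if key_type != "RSA":
        problems.append(f"key type is {key_type}, expected RSA")
    return problems
```

Reading the config file into a string and calling `check_keys(*login_key_paths(config_text))` should return an empty list when both checks pass; any entries in the list describe what still needs fixing.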
