I believe I figured out the problem. I had set my control_path to this value in ansible.cfg:

[ssh_connection]
control_path = %(directory)s/%%r

This didn't include %h (i.e. the hostname), so connections to different hosts overwrote each other's control sockets (by default, Ansible re-uses SSH connections via multiplexing, as described here <https://en.wikibooks.org/wiki/OpenSSH/Cookbook/Multiplexing>). I tried both adding %h to the control path and disabling multiplexing (ControlMaster=no), and both fixed it. So, kids, include at least %h, %p and %r in the control path, as the man page <http://www.openbsd.org/cgi-bin/man.cgi/OpenBSD-current/man5/ssh_config.5> suggests, so that connections to different hosts don't get mixed up.
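For reference, here is a minimal sketch of the two variants that worked for me in ansible.cfg (the ansible-ssh-... socket name is just my choice of a unique file name, though as far as I know it mirrors Ansible's shipped default):

    [ssh_connection]
    # Variant 1: keep multiplexing, but make the socket path unique per
    # host, port and remote user. The doubled %% is needed because
    # ansible.cfg goes through Python's ConfigParser interpolation,
    # so %%h reaches ssh as %h.
    control_path = %(directory)s/ansible-ssh-%%h-%%p-%%r

    # Variant 2: disable multiplexing entirely. Also works, but every
    # task then opens a fresh SSH connection, which is slower.
    # ssh_args = -o ControlMaster=no

With variant 1, each host gets its own control socket, so parallel runs against several EC2 nodes stop trampling each other's connections.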
On Monday, January 25, 2016 at 10:00:49 AM UTC+1, [email protected] wrote:
>
> The nodes have the same ssh key, and the user is correct. In fact, when I
> try the playbook with --limit and apply it to one node at a time, it works
> correctly. It's only when I group nodes using a custom EC2 tag that this
> happens.
>
> Also, this has happened with several playbooks, and always during the
> "setup" task, before any of my own tasks. To demonstrate this, I wrote this
> ultra-simple playbook:
>
> ---
> - hosts: tag_Purpose_NodePurpose
>   gather_facts: True
>   become: yes
>   tasks:
>     - debug: msg="This is just a demonstration"
>
> It still happens. I got the following output (some stuff redacted):
>
> http://pastebin.com/m2Lu8yQC
>
> I also ran the same playbook ONLY on the node that failed above, and it
> worked correctly.
>
> On Friday, January 22, 2016 at 6:10:57 PM UTC+1, [email protected] wrote:
>>
>> Well...
>> Do your nodes have the same ssh key?
>> Is ubuntu your remote_user?
>> Is ~ubuntu/.ansible writable/readable?
>>
>> Better yet, can you show us your playbook and the task it is failing on?
>> Best,
>> --
>> E
>>
>> On Friday, January 22, 2016 at 7:33:29 AM UTC-8, [email protected] wrote:
>>>
>>> Hello,
>>>
>>> I'm using the Amazon EC2 dynamic inventory script from here
>>> <http://docs.ansible.com/ansible/intro_dynamic_inventory.html#example-aws-ec2-external-inventory-script>.
>>>
>>> I've created some instances on EC2 and given all of them a custom tag
>>> called Purpose, with a value like NodePurpose, and instance names Node1,
>>> Node2, etc.
>>>
>>> I've written a playbook with:
>>>
>>> ---
>>> - hosts: tag_Purpose_NodePurpose
>>>
>>> so that the playbook targets all of the above-mentioned nodes. When more
>>> than one node is up, I get random SSH errors from some or all of the
>>> nodes, with this message:
>>>
>>> ERROR! failed to transfer file to
>>> /home/ubuntu/.ansible/tmp/ansible-tmp-1453475788.15-95937026898429/setup:
>>>
>>>> sftp> put /tmp/tmpIwd28M /home/ubuntu/.ansible/tmp/ansible-tmp-1453475788.15-95937026898429/setup
>>>> OpenSSH_6.7p1 Debian-5+deb8u1, OpenSSL 1.0.1k 8 Jan 2015
>>>> debug1: Reading configuration data /etc/ssh/ssh_config
>>>> debug1: /etc/ssh/ssh_config line 19: Applying options for *
>>>> debug1: auto-mux: Trying existing master
>>>> debug2: fd 3 setting O_NONBLOCK
>>>> debug2: mux_client_hello_exchange: master version 4
>>>> debug3: mux_client_forwards: request forwardings: 0 local, 0 remote
>>>> debug3: mux_client_request_session: entering
>>>> debug3: mux_client_request_alive: entering
>>>> debug3: mux_client_request_alive: done pid = 10523
>>>> debug3: mux_client_request_session: session request sent
>>>> debug1: mux_client_request_session: master session id: 4
>>>> debug2: Remote version: 3
>>>> debug2: Server supports extension "[email protected]" revision 1
>>>> debug2: Server supports extension "[email protected]" revision 2
>>>> debug2: Server supports extension "[email protected]" revision 2
>>>> debug2: Server supports extension "[email protected]" revision 1
>>>> debug2: Server supports extension "[email protected]" revision 1
>>>> debug3: Sent message fd 5 T:16 I:1
>>>> debug3: SSH_FXP_REALPATH . -> /home/ubuntu size 0
>>>> debug3: Looking up /tmp/tmpIwd28M
>>>> debug3: Sent message fd 5 T:17 I:2
>>>> debug3: Received stat reply T:101 I:2
>>>> debug1: Couldn't stat remote file: No such file or directory
>>>> debug3: Sent message SSH2_FXP_OPEN I:3 P:/home/ubuntu/.ansible/tmp/ansible-tmp-1453475788.15-95937026898429/setup
>>>> remote open("/home/ubuntu/.ansible/tmp/ansible-tmp-1453475788.15-95937026898429/setup"): No such file or directory
>>>> debug3: mux_client_read_packet: read header failed: Broken pipe
>>>> debug2: Received exit status from master 0
>>>
>>> It seems like Ansible is looking on every instance for a tmp file that
>>> was only created on one of them, and of course it's not there on the
>>> others. This has happened with all versions since 1.7, and with various
>>> versions of the ec2.py script (I've updated it from time to time).
>>>
>>> Anyone have any idea what this is about and how to solve it?
>>>
>>> Alexander
