OK... I think I have a reasonable description of what caused this specific hang. The last entries in this instance's /var/log/waagent.log were:

2013/06/28 14:50:12 Provisioning image using OVF settings in the DVD.
2013/06/28 14:50:12 Resource disk (/dev/sdb1) is mounted at /mnt/resource with fstype ext4
2013/06/28 14:50:14 Created user account: test
2013/06/28 14:50:15 EnvMonitor: Detected host name change: ubuntu -> gwaclhost24lm4crqqc7kqfl9hya2ieuvxz9fvdqpn6ipfgdi4gh1p7snuwjezcw
2013/06/28 14:50:15 Setting host name: gwaclhost24lm4crqqc7kqfl9hya2ieuvxz9fvdqpn6ipfgdi4gh1p7snuwjezcw
In a *good* run, the log should look something like:

2013/06/28 16:38:40 Provisioning image using OVF settings in the DVD.
2013/06/28 16:38:40 Disabled SSH password-based authentication methods.
2013/06/28 16:38:41 Created user account: smoser
2013/06/28 16:38:42 EnvMonitor: Detected host name change: ubuntu -> smoser0628p
2013/06/28 16:38:42 Setting host name: smoser0628p
2013/06/28 16:38:55 ERROR:CalledProcessError. Error Code: 1
2013/06/28 16:38:55 ERROR:CalledProcessError. Command string: "service ssh status | grep running"
2013/06/28 16:38:55 ERROR:CalledProcessError. Command result: ""
2013/06/28 16:38:55 Posted Role Properties. CertificateThumbprint=075533b9075b4130651b1f74c451ca9b
2013/06/28 16:38:55 Root password deleted.

Note that the ERROR lines come from a different thread in the process; you can just ignore them. The key is that the agent should get to "Posted Role Properties" after setting the host name, but this instance never got there.

An 'strace' of the pid shows:

$ sudo strace -p 498
Process 498 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 321648}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}

The other key bit of info here is that 'Setting host name' comes from 'UpdateAndPublishHostName', which, among other questionable operations, does:

    for ethernetInterface in PossibleEthernetInterfaces:
        Run("ifdown " + ethernetInterface + " && ifup " + ethernetInterface,chk_err=False) # We supress error logging on error.
    self.RestoreRoutes()

So what I think happened here is that the attempt to ReportRoleProperties ("Posting Role Properties") ends up blocking/hanging in httplib.HTTPConnection(self.Endpoint), where no 'timeout' is specified. Presumably the interface was bounced by the ifdown/ifup above while that connection was in flight, so a reply never arrived, and without a timeout the call never returns.
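To illustrate the kind of fix this suggests, here is a minimal sketch of an HTTP POST that passes an explicit timeout. It is not the agent's code: it uses Python 3's http.client (the successor to httplib, whose HTTPConnection has also accepted a 'timeout' argument since Python 2.6), and the function name post_with_timeout and the 10 second default are made up for the example.

```python
import http.client
import socket


def post_with_timeout(host, port, path, body, timeout=10.0):
    """POST body to host:port; return the HTTP status, or None on failure.

    Hypothetical helper for illustration only. The timeout applies to the
    connect and to each blocking socket operation (it is not a total
    deadline), so an endpoint that never answers raises an error instead
    of blocking the caller forever.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("POST", path, body=body)
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        # socket.timeout is a subclass of OSError, so a timed-out connect
        # or read lands here, as do connection resets and refusals.
        return None
    finally:
        conn.close()
```

With something like this, a caller could log the failure and retry on the next pass rather than hanging in provisioning.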
One good thing is that the host name monitoring can be disabled by setting this in /etc/waagent.conf:

Provisioning.MonitorHostName=n

** Changed in: walinuxagent (Ubuntu)
   Importance: Undecided => High

** Changed in: walinuxagent (Ubuntu)
   Status: New => Confirmed

https://bugs.launchpad.net/bugs/1195524

Title:
  race condition / transient failure to provision