ok... i think ihave a reasonable description of what caused this specific hang.
The last entry in this /var/log/waagent.log was:
2013/06/28 14:50:12 Provisioning image using OVF settings in the DVD.
2013/06/28 14:50:12 Resource disk (/dev/sdb1) is mounted at /mnt/resource with 
fstype ext4
2013/06/28 14:50:14 Created user account: test
2013/06/28 14:50:15 EnvMonitor: Detected host name change: ubuntu -> 
gwaclhost24lm4crqqc7kqfl9hya2ieuvxz9fvdqpn6ipfgdi4gh1p7snuwjezcw
2013/06/28 14:50:15 Setting host name: 
gwaclhost24lm4crqqc7kqfl9hya2ieuvxz9fvdqpn6ipfgdi4gh1p7snuwjezcw

In a *good* run, the entry after Created user account should be somethin
glike:

2013/06/28 16:38:40 Provisioning image using OVF settings in the DVD.
2013/06/28 16:38:40 Disabled SSH password-based authentication methods.
2013/06/28 16:38:41 Created user account: smoser
2013/06/28 16:38:42 EnvMonitor: Detected host name change: ubuntu -> smoser0628p
2013/06/28 16:38:42 Setting host name: smoser0628p
2013/06/28 16:38:55 ERROR:CalledProcessError.  Error Code: 1
2013/06/28 16:38:55 ERROR:CalledProcessError.  Command string: "service ssh 
status | grep running"
2013/06/28 16:38:55 ERROR:CalledProcessError.  Command result: ""
2013/06/28 16:38:55 Posted Role Properties. 
CertificateThumbprint=075533b9075b4130651b1f74c451ca9b
2013/06/28 16:38:55 Root password deleted.


Note, the ERROR information is coming from a differen thread in the process. 
you can just ignore it.
The key is  that it should be Posting Role Properties, but this instnace didn't 
get there.

an 'strace' of the pid, shows:
$ sudo strace -p 498
Process 498 attached - interrupt to quit
select(0, NULL, NULL, NULL, {0, 321648}) = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0})     = 0 (Timeout)
select(0, NULL, NULL, NULL, {1, 0}

the other key bit of info here is that 'Setting host name' comes from
'UpdateAndPublishHostName', which , amoung other questionable
operations, does:

    for ethernetInterface in PossibleEthernetInterfaces:
        Run("ifdown " + ethernetInterface + " && ifup " + 
ethernetInterface,chk_err=False) # We supress error logging on error.
    self.RestoreRoutes()


So what I think happened here, is that the attempt to ReportRoleProperties 
("Posting Role Properties") ends up blocking/hanging in 
httplib.HTTPConnection(self.Endpoint), where no 'timeout' is specified.

One good thing is that this can be disabled via:
 Provisioning.MonitorHostName=n


** Changed in: walinuxagent (Ubuntu)
   Importance: Undecided => High

** Changed in: walinuxagent (Ubuntu)
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Server Team, which is subscribed to walinuxagent in Ubuntu.
https://bugs.launchpad.net/bugs/1195524

Title:
  race condition / transient failure to provision

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/walinuxagent/+bug/1195524/+subscriptions

-- 
Ubuntu-server-bugs mailing list
Ubuntu-server-bugs@lists.ubuntu.com
Modify settings or unsubscribe at: 
https://lists.ubuntu.com/mailman/listinfo/ubuntu-server-bugs

Reply via email to