Hi all,

We've been using Young Oh's OpenStack module and we've run into an interesting bug/behaviour. Below I use "virtual machine" to refer to the entry for the computer in VCL and "instance" for the actual virtual machine running in OpenStack.
When the module terminates an OpenStack instance, it does so by looking up the instance ID from its private IP address. The IP address it uses is determined by the get_computer_private_ip_address function in VCL, which looks first in the data structure, then in the hosts file and finally in the database.

If I understand correctly, since the IP address is part of the reservation part of the data structure, it's only going to be available when there's an active reservation (could someone confirm?). Up to this point we haven't been populating the hosts file ourselves because the OpenStack module takes care of that itself, but this means the IP address won't be present in the hosts file until that virtual machine has been reserved for the first time. Finally, we've just been putting bogus values in the database for the IP address, since it changes every time a new instance is created.

The problem is that the database is (obviously) inaccurate and the hosts file can potentially become inaccurate, which I believe causes the following problematic situations:

1. A virtual machine is reserved for the first time, which causes the OpenStack module to use the IP address in the database since it's not present anywhere else. This IP is almost guaranteed to be incorrect, and any instance that happens to be using it will be terminated. This is what caused us problems, but what we should have done instead was leave the fields blank in the database.

2. Since the OpenStack module only updates the hosts file when load is called, it's possible for an instance to be terminated, which releases its IP for use by a new instance. If that happens, the new instance would be terminated if a new reservation was made for the virtual machine corresponding to the old instance. From what I can tell this shouldn't occur during normal operation, but it could still conceivably happen.

I'm pretty sure I've got these details right, but please correct any gaps in my understanding.
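To make the two cases concrete, here is a rough Python sketch of the lookup order as I understand it. This is not the module's actual code (the module is Perl), and every name, IP address and instance ID below is made up for illustration. It assumes each source is a simple hostname-to-IP mapping and that an empty value counts as "not found".

```python
def resolve_private_ip(hostname, reservation_data, hosts_file, database):
    # Check each source in the order described above; the first non-empty hit wins.
    for source in (reservation_data, hosts_file, database):
        ip = source.get(hostname)
        if ip:
            return ip
    return None


def find_instance_by_ip(instances, ip):
    # instances maps a (hypothetical) instance ID to the private IP it holds now.
    for instance_id, instance_ip in instances.items():
        if instance_ip == ip:
            return instance_id
    return None


# Case 1: vm1 has never been reserved, so only the bogus database value exists.
instances = {"inst-a": "10.0.0.5"}  # unrelated running instance
ip = resolve_private_ip("vm1", {}, {}, {"vm1": "10.0.0.5"})
case1_victim = find_instance_by_ip(instances, ip)

# Case 2: the hosts file still holds vm1's old IP, now reused by another instance.
instances = {"inst-b": "10.0.0.7"}  # new instance belonging to a different VM
ip = resolve_private_ip("vm1", {}, {"vm1": "10.0.0.7"}, {})
case2_victim = find_instance_by_ip(instances, ip)

# With a blank database field, nothing resolves and no instance is matched.
blank_ip = resolve_private_ip("vm1", {}, {}, {"vm1": ""})
```

In both cases the instance that gets matched (and would be terminated) has nothing to do with vm1; it just happens to hold the IP that the lookup returned.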
Being affected by the first problem was a mistake on our part, but this is still something that could probably be handled in a more consistent way. The best solution would probably be to modify the OpenStack module to use the instance's UUID, rather than its private IP address, as the primary means of identification, which would remove the possibility of collisions. Does this sound right to everyone?

Cameron Mann
