That's a great observation. I think options 1 and 2 can provide faster load times, since a failed deletion never causes the load to fail. However, to avoid quota issues, an admin should periodically check all the instances for duplicate instance names and defunct instances. Option 3 is safe, but the load is slow if an instance deletion fails. I also agree that option 2 would be a good choice, because it gives end users lower load times and we cannot accurately estimate the deletion time anyway. But I would like to hear others' thoughts.
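The practical difference between options 2 and 3 is only what happens when the deletion poll times out. The Python below (the VCL module itself is Perl) is a minimal, hypothetical sketch, not the module's code: `wait_for_deletion`, the `DELETED` sentinel and the `fail_on_timeout` flag are invented names, and `get_status` stands in for whatever call queries the instance's state (a real implementation would presumably treat a "not found" reply from the Compute API as deleted). The clock and sleep functions are injectable so the loop can be exercised without actually waiting.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger("openstack_sketch")

# Hypothetical sentinel the status callable returns once the instance is gone.
DELETED = "DELETED"


def wait_for_deletion(
    get_status: Callable[[], str],
    timeout: float = 120.0,
    interval: float = 5.0,
    fail_on_timeout: bool = False,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll get_status() until it reports DELETED or `timeout` seconds pass.

    fail_on_timeout=False behaves like option 2 (log a warning, let the load continue);
    fail_on_timeout=True behaves like option 3 (fail the load).
    Returns True if the deletion was confirmed, False if it timed out.
    """
    deadline = clock() + timeout
    last_status: Optional[str] = None
    while True:
        last_status = get_status()
        if last_status == DELETED:
            return True
        if clock() >= deadline:
            break
        sleep(interval)

    message = f"instance not deleted after {timeout}s (last status: {last_status})"
    if fail_on_timeout:
        raise TimeoutError(message)
    log.warning(message)
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Simulated instance stuck in the "error" state; a fake clock avoids real sleeping.
    now = [0.0]

    def fake_sleep(seconds: float) -> None:
        now[0] += seconds

    confirmed = wait_for_deletion(lambda: "error", timeout=10, interval=5,
                                  sleep=fake_sleep, clock=lambda: now[0])
    print(confirmed)  # False: a warning is logged instead of failing the load
```

With `fail_on_timeout=False` a stuck instance only produces a logged warning (option 2); passing `True` turns the same timeout into a load failure (option 3), so the choice between the two stays a single flag rather than two code paths.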
Best regards,
Young

On Tue, Jul 22, 2014 at 2:46 PM, Cameron Mann wrote:
> Looks good. Though I do wonder if it's necessary to fail the entire load process just because the old instance doesn't get deleted. I think there are three possibilities:
>
> 1. Don't check for successful deletion; there won't be any conflicts because we're using openstackComputerMap. This would give the fastest load times, but the only way to find out that something went wrong would be to look at the list of instances and see if there are any duplicate names. Could cause issues with quotas if running near capacity, since there could be extra instances lying around.
>
> 2. Check for successful deletion, but only log the error; don't fail the load. Slower load times, but the load won't fail and the error will be logged. Could cause issues with quotas if running near capacity, since there could be extra instances lying around.
>
> 3. What the module does now: check for successful deletion and fail if not. Least end-user friendly, since they might encounter failures, but the safest option. Won't cause quota issues on its own, though an admin could still change the computer back to available without deleting the defunct instance.
>
> Instance deletion time is also not very consistent in my experience; I've seen anything from seconds to over a minute, and I imagine it could go higher on OpenStack systems that see heavier usage. If we stick with option 3, I'd recommend bumping the timeout by another minute or two just to be safe. I think it's less necessary for option 2, since it doesn't fail on timeout.
>
> I took a look at some of the other provisioning modules to see what they do:
>
> - VMware logs a warning if it fails to delete the old VM, but only fails if the VM is still responding to SSH
> - Libvirt fails if deletion fails
> - VirtualBox doesn't check for successful deletion, though it will fail if it can't find the old VM to delete
>
> I think options 2 or 3 would be most consistent with existing behaviour. I'd lean towards option 2, since end users won't see any extra failures and we can keep a lower timeout, which will mean lower load times even if a deletion takes a long time.
>
> What are your thoughts?
>
> Cameron
>
> On Tue, Jul 22, 2014 at 9:05 AM, YOUNG OH wrote:
> > Cameron,
> >
> > Yes, you are definitely right. I noticed that using the hostname to find the OpenStack instance ID does not work properly and can also cause the problem you described. I've gone back to using the openstackComputerMap table in get_os_instance_id when the instance is pingable, and I've also added a loop in _terminate_os_instance that checks whether the instance has been completely deleted. Please take a look at it again and let me know if you have any concerns. Thank you.
> >
> > Best regards,
> > Young
> >
> > On Mon, Jul 21, 2014 at 4:18 PM, Cameron Mann wrote:
> > > Sounds like good progress to me. One comment, though: it looks like _terminate_os_instance does the DELETE request, checks the response for success, and then sleeps for 30 seconds while the instance deletes. However, I don't believe a successful response to the DELETE request guarantees the instance will actually be deleted. I've run into situations where an instance gets stuck in the error or deleting states but the command line client reports no errors when trying to delete it.
This could result > in a > > > situation where multiple instances with the same name exist which could > > > cause _get_os_instance_id to return the wrong ID since it filters the > > > instances based on name and selects the first in the list. > > > > > > I think either returning to using openstackComputerMap or looping with > a > > > timeout until the instance is actually deleted would be better choices. > > The > > > former would allow the new instance to be created even if the deletion > of > > > the old one fails. The latter would put the computer in VCL into an > error > > > state which would make it more obvious something has gone wrong, though > > at > > > the cost of potentially failing a user's reservation. As an added > > > precaution It might also be worth having _get_os_instance_id fail if > > > there's more than one instance in the response. > > > > > > Cameron > > > > > > > > > On Fri, Jul 18, 2014 at 9:25 AM, YOUNG OH <[email protected]> > > wrote: > > > > > > > Cameron, > > > > > > > > I hope you had a great time and welcome back to work :-). And, yes, > the > > > > OpenStack module with directly using OpenStack APIs can solve the > > > concerns > > > > we've discussed and it's more flexible to apply new version of > > OpenStack > > > > APIs, if necessary. In the updated openstack module, I've changed the > > two > > > > main things. First, I've used the hostname in Computer table (unique > in > > > the > > > > same VCL database) to create an instance and get the instance id to > > > > terminate rather than using the openstackComputerMap table. This can > > > avoid > > > > using an additional table and database transactions. Second, I've > > changed > > > > the openstackImageMap to openstackimagerevision table that maps the > > > > imagerevision id with the openstack image id. This table consists of > > > three > > > > fields (imagerevisionid, imagedetails, flavordetails). 
> > > > The imagedetails and flavordetails fields contain the detailed image and flavor information in JSON format. Thus, when VCL creates an instance, it reads each set of details and parses it to find the corresponding OpenStack image ID and flavor ID. In addition, I've implemented the get_image_size() subroutine, because image size information was not supported in OpenStack Essex but is supported now. This is a short summary of the changes. If you have any concerns or questions about the updates, please let me know. Thank you.
> > > >
> > > > Best regards,
> > > > Young-Hyun
> > > >
> > > > On Thu, Jul 17, 2014 at 11:53 AM, Cameron Mann wrote:
> > > > > Sorry for the silence from my end; I realized I forgot to mention I was going to be on vacation for the last week and a half. Anyway, it looks like Young's updates have addressed the main concerns we were having with regard to the command line client. Given the progress he's made, we think going ahead with his module makes the most sense.
> > > > >
> > > > > Cameron
> > > > >
> > > > > On Wed, Jul 16, 2014 at 9:27 AM, YOUNG OH wrote:
> > > > > > Andy,
> > > > > >
> > > > > > Thank you for your comments. I've tried to address the points you raised and committed my module again. This module finds all OpenStack information using the OpenStack APIs and the database. Thank you.
> > > > > >
> > > > > > Best regards,
> > > > > > Young-Hyun
> > > > > >
> > > > > > On Wed, Jul 9, 2014 at 10:24 AM, Andy Kurth wrote:
> > > > > > > Thanks Young. Looks good!
> > > > > > > If I understand correctly, you are avoiding the need to use the CLI or cpan module by interacting directly with OpenStack via the REST API?
> > > > > > >
> > > > > > > It looks like the only commands you're running on the management node are "nova" and "qemu-img" in _get_flavor_type. Would it be possible to accomplish this via the API? I haven't traced through how your code works too deeply, but was wondering if the following could be used:
> > > > > > > http://docs.openstack.org/api/openstack-compute/2/content/Flavors-d1e4180.html
> > > > > > >
> > > > > > > It would be wonderful if you could eliminate the need for these to be executed. This would mean a pure API solution with nothing special needing to be installed on the management node.
> > > > > > >
> > > > > > > If you do need to call these commands, please don't use qx or backticks to run them on the management node. Instead, please change this to use:
> > > > > > >     my ($exit_status, $output) = $self->mn_os->execute($command);
> > > > > > >
> > > > > > > Also, always, always, always make sure $output and anything else you try to parse with a regex are defined first. This will avoid some nasty "Use of uninitialized value in pattern match" errors, which could potentially lead to the entire process dying.
> > > > > > >
> > > > > > > The indentation looks great! :) There are a few places where the curly bracket style could be modified.
> > > > > > > Just about all of the existing code places opening brackets on the same line as the while/for statement, such as:
> > > > > > >     while ($loop > 0) {
> > > > > > > -instead of-
> > > > > > >     while ($loop > 0)
> > > > > > >     {
> > > > > > >
> > > > > > > Please add a pod "=head2 subroutine_name ... =cut" heading for every subroutine. This helps others read and understand your code. The pod syntax can be a bit finicky; you can tell whether it is formatted properly by running "pod2text openstack.pm".
> > > > > > >
> > > > > > > Lastly (mainly as a reminder), we will need to incorporate all of the database changes in vcl.sql and in whatever method we use for the next release to replace update-vcl.sql. I made a reminder comment here:
> > > > > > > https://issues.apache.org/jira/browse/VCL-764
> > > > > > >
> > > > > > > Regards,
> > > > > > > Andy
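Two precautions raised in this thread, having the instance-ID lookup fail when more than one instance matches and checking that output exists before parsing it, can be sketched as follows. This is Python rather than the module's Perl, and every name here (`get_os_instance_id`, `InstanceLookupError`, `openstack_ids`) is hypothetical. It assumes a server-list body shaped like `{"servers": [{"id": ..., "name": ...}]}` and, as a further assumption about the openstackimagerevision table, that the imagedetails and flavordetails columns hold JSON of the form `{"image": {"id": ...}}` and `{"flavor": {"id": ...}}`.

```python
import json
from typing import Any, Dict, List, Optional


class InstanceLookupError(Exception):
    """Raised when an instance ID cannot be determined unambiguously (hypothetical)."""


def get_os_instance_id(response_body: Optional[str], name: str) -> str:
    """Return the ID of the one server named `name` in a server-list response body."""
    # Make sure there is something to parse first (the analogue of checking
    # that $output is defined before matching a regex against it in Perl).
    if not response_body:
        raise InstanceLookupError("empty or missing response")
    servers: List[Dict[str, Any]] = json.loads(response_body).get("servers", [])
    # Match names exactly on the client side instead of trusting the first entry.
    matches = [s for s in servers if s.get("name") == name]
    if len(matches) != 1:
        # Zero matches and duplicate names are both errors; never guess.
        raise InstanceLookupError(
            f"expected 1 instance named {name!r}, found {len(matches)}"
        )
    return matches[0]["id"]


def openstack_ids(imagedetails: str, flavordetails: str) -> Dict[str, str]:
    """Pull the image and flavor IDs out of the stored JSON detail columns.

    Assumes (hypothetically) that the columns hold {"image": {"id": ...}} and
    {"flavor": {"id": ...}} respectively.
    """
    image = json.loads(imagedetails)["image"]
    flavor = json.loads(flavordetails)["flavor"]
    return {"image_id": image["id"], "flavor_id": flavor["id"]}
```

For example, a response listing two servers both named `vm1` raises `InstanceLookupError` instead of silently returning the first ID, which is exactly the failure mode Cameron describes for instances stuck in the error or deleting state.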
