I think a variant of option 2 is ideal, similar to the VMware module: warn and log if the old instance does not get deleted, and only fail the VM node if it still responds to SSH.
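[A rough sketch of the behaviour Aaron describes, loosely modeled on the VMware module. The subroutine and helper names here are illustrative, not committed code; notify() and %ERRORS come from VCL::utils, and is_ssh_responding() is the OS-module check the VMware module relies on.]

    sub _cleanup_old_instance {
        my ($self) = @_;

        # Try to delete the old instance; treat a failure as a warning only
        if (!$self->_terminate_os_instance()) {
            notify($ERRORS{'WARNING'}, 0, "old instance could not be deleted, continuing with load");
        }

        # Fail the VM node only if the old VM is still alive on the network
        if ($self->os->is_ssh_responding()) {
            notify($ERRORS{'CRITICAL'}, 0, "old VM still responds to SSH, failing load");
            return 0;
        }

        return 1;
    }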
Also, we could extend the healthcheck module to do routine cleanup.

Just my thoughts,
Aaron

On Tue, Jul 22, 2014 at 3:37 PM, Cameron Mann <[email protected]> wrote:

> Agreed, this goes back to more general VCL behaviour, so it would be good to get others' input.

> Cameron

> On Tue, Jul 22, 2014 at 1:22 PM, YOUNG OH <[email protected]> wrote:

>> That's a great observation. I think options 1 and 2 can provide fast loading times because there is no load failure, but to avoid quota issues an admin would have to periodically check all the instances for duplicate instance names and defunct instances. Option 3 is safe, but loads are slow if deleting an instance fails. I also agree that option 2 would be a good choice because it provides lower load times for end users, and we cannot exactly estimate the deletion time anyway. But I hope to hear others' thoughts.

>> Best regards,
>> Young

>> On Tue, Jul 22, 2014 at 2:46 PM, Cameron Mann <[email protected]> wrote:

>> > Looks good, though I do wonder if it's necessary to fail the entire load process just because the old instance doesn't get deleted. I think there are three possibilities:

>> > 1. Don't check for successful deletion; there won't be any conflicts because we're using openstackComputerMap. This would give the fastest load times, but the only way to find out that something went wrong would be to look at the list of instances and see if there are any duplicate names. Could cause issues with quotas if running near capacity, since there could be extra instances lying around.

>> > 2. Check for successful deletion, but only log the error; don't fail the load. Slower load times, but the load won't fail and the error will be logged. Could also cause quota issues if running near capacity, since there could be extra instances lying around.

>> > 3. What the module does now: check for successful deletion and fail if it doesn't happen. The least end-user friendly, since users might encounter failures, but the safest option. Won't cause quota issues on its own, though an admin could still change the computer back to available without deleting the defunct instance.

>> > Instance deletion time is also not very consistent in my experience; I've seen anything from seconds to over a minute, and I imagine it could go higher on OpenStack systems that see heavier usage. If we stick with option 3, I'd recommend bumping the timeout by another minute or two just to be safe. I think that's less necessary for option 2 since it doesn't fail on timeout.

>> > I took a look at what some of the other provisioning modules do:

>> > - VMware logs a warning if it fails to delete the old VM, but only fails if the VM is still responding to SSH
>> > - Libvirt fails if deletion fails
>> > - VirtualBox doesn't check for successful deletion, though it will fail if it can't find the old VM to delete

>> > I think options 2 or 3 would be most consistent with existing behaviour. I'd lean towards option 2, since end users won't see any extra failures and we can keep a lower timeout, which will mean lower load times even if a deletion takes a long time.

>> > What are your thoughts?

>> > Cameron
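[For concreteness, option 2 could look something like the sketch below: poll until the instance disappears, with a generous timeout, and log a warning instead of failing the load. The _get_os_instance_status() helper is hypothetical; it would issue a GET /servers/<id> request and return undef once the instance is gone.]

    sub _wait_for_instance_deletion {
        my ($self, $instance_id) = @_;
        my $timeout_seconds = 120;    # generous, per the deletion times Cameron observed

        my $end_time = time() + $timeout_seconds;
        while (time() < $end_time) {
            # An undefined status means GET /servers/<id> returned 404: instance is gone
            my $status = $self->_get_os_instance_status($instance_id);
            return 1 if !defined($status);
            sleep 5;
        }

        # Option 2 behaviour: record the problem but let the load continue
        notify($ERRORS{'WARNING'}, 0, "instance $instance_id still exists after $timeout_seconds seconds");
        return 0;
    }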
>> > On Tue, Jul 22, 2014 at 9:05 AM, YOUNG OH <[email protected]> wrote:

>> > > Cameron,

>> > > Yes, you are definitely right. I noticed that using the hostname to find the OpenStack instance id does not work properly and can also cause the problem you described. I've gone back to using the openstackComputerMap table in _get_os_instance_id when the instance is pingable, and I've also added a loop in _terminate_os_instance to check whether the instance is completely deleted or not. Please take a look at it again and let me know if you have any concerns. Thank you.

>> > > Best regards,
>> > > Young

>> > > On Mon, Jul 21, 2014 at 4:18 PM, Cameron Mann <[email protected]> wrote:

>> > > > Sounds like good progress to me. One comment though: it looks like _terminate_os_instance does the DELETE request, checks the response for success and then sleeps for 30 seconds while the instance deletes. However, I don't believe a successful response to the DELETE request guarantees the instance will actually be deleted. I've run into situations where an instance gets stuck in the error or deleting states but the command line client reports no errors when trying to delete it. This could result in a situation where multiple instances with the same name exist, which could cause _get_os_instance_id to return the wrong ID, since it filters the instances based on name and selects the first in the list.

>> > > > I think either returning to using openstackComputerMap or looping with a timeout until the instance is actually deleted would be better choices. The former would allow the new instance to be created even if the deletion of the old one fails. The latter would put the computer in VCL into an error state, which would make it more obvious something has gone wrong, though at the cost of potentially failing a user's reservation. As an added precaution, it might also be worth having _get_os_instance_id fail if there's more than one instance in the response.

>> > > > Cameron
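[Cameron's added precaution might look roughly like this. A sketch only: _get_os_instances() is a hypothetical helper returning the decoded server list from the API, while get_computer_short_name() is the standard VCL DataStructure call.]

    sub _get_os_instance_id {
        my ($self) = @_;
        my $computer_name = $self->data->get_computer_short_name();

        # Filter the server list by name, as the module does today
        my @matches = grep { $_->{name} eq $computer_name }
                      @{$self->_get_os_instances()};

        # Refuse to guess if duplicates exist, rather than taking the first match
        if (scalar(@matches) > 1) {
            notify($ERRORS{'WARNING'}, 0, scalar(@matches) . " instances are named $computer_name, refusing to pick one");
            return;
        }

        return @matches ? $matches[0]{id} : undef;
    }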
>> > > > On Fri, Jul 18, 2014 at 9:25 AM, YOUNG OH <[email protected]> wrote:

>> > > > > Cameron,

>> > > > > I hope you had a great time, and welcome back to work :-). And yes, the OpenStack module using the OpenStack APIs directly can address the concerns we've discussed, and it is more flexible for adopting new versions of the OpenStack APIs if necessary. In the updated openstack module I've changed two main things. First, I've used the hostname in the computer table (unique within the same VCL database) to create an instance and to get the instance id to terminate, rather than using the openstackComputerMap table. This avoids an additional table and the associated database transactions. Second, I've changed the openstackImageMap table to the openstackimagerevision table, which maps the imagerevision id to the OpenStack image id. This table consists of three fields (imagerevisionid, imagedetails, flavordetails); the imagedetails and flavordetails fields contain the detailed image and flavor information in JSON format. Thus, when VCL creates an instance, it retrieves these details and parses them to find the corresponding OpenStack image id and flavor id. In addition, I've implemented the get_image_size() subroutine, because the image size information was not available in OpenStack Essex but is supported now. This is a short summary of the changes, so if you have any concerns or questions about the updates, please let me know. Thank you.

>> > > > > Best regards,
>> > > > > Young-Hyun

>> > > > > On Thu, Jul 17, 2014 at 11:53 AM, Cameron Mann <[email protected]> wrote:

>> > > > > > Sorry for the silence from my end; I realized I forgot to mention I was going to be on vacation for the last week and a half. Anyways, it looks like Young's updates have addressed the main concerns we were having with regards to the command line client. Given the progress he's made, we think going ahead with his module makes the most sense.

>> > > > > > Cameron

>> > > > > > On Wed, Jul 16, 2014 at 9:27 AM, YOUNG OH <[email protected]> wrote:

>> > > > > > > Andy,

>> > > > > > > Thank you for your comments. I've tried to apply what you addressed and committed my module again. This module finds all OpenStack information using the OpenStack APIs and the database. Thank you.

>> > > > > > > Best regards,
>> > > > > > > Young-Hyun

>> > > > > > > On Wed, Jul 9, 2014 at 10:24 AM, Andy Kurth <[email protected]> wrote:

>> > > > > > > > Thanks Young. Looks good! If I understand correctly, you are avoiding the need to use the CLI or CPAN module by interacting directly with OpenStack via the REST API?

>> > > > > > > > It looks like the only commands you're running on the management node are "nova" and "qemu-img" in _get_flavor_type. Would it be possible to accomplish this via the API? I haven't traced through how your code works too deeply, but was wondering if the following could be used:
>> > > > > > > > http://docs.openstack.org/api/openstack-compute/2/content/Flavors-d1e4180.html

>> > > > > > > > It would be wonderful if you can eliminate the need for these to be executed. This would mean a pure API solution, with nothing special needing to be installed on the management node.

>> > > > > > > > If you do need to call these commands, don't use qx and backticks to run them on the management node. Please change this to use:
>> > > > > > > > my ($exit_status, $output) = $self->mn_os->execute($command);

>> > > > > > > > Also, always, always, always make sure $output and anything else you try to parse with a regex are defined first. This will avoid some nasty "Use of uninitialized value in pattern match" errors which could potentially lead to the entire process dying.
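[Putting Andy's last two suggestions together, a call might look like the sketch below. The qemu-img command line and the virtual-size pattern are illustrative; the execute() call is the one Andy quotes, which in the VCL modules returns the exit status and an array reference of output lines.]

    # Hypothetical example: get image info on the management node
    my $command = "qemu-img info $image_file_path";
    my ($exit_status, $output) = $self->mn_os->execute($command);

    # Always confirm $output is defined before matching a pattern against it
    if (!defined($output)) {
        notify($ERRORS{'WARNING'}, 0, "failed to execute command on management node: $command");
        return;
    }
    my ($size_line) = grep(/virtual size/, @$output);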
>> > > > > > > > The indentation looks great! :) There are a few places where the curly bracket style could be modified. Just about all of the existing code places the opening bracket on the same line as the while/for statement, such as:
>> > > > > > > > while ($loop > 0) {
>> > > > > > > > -instead of-
>> > > > > > > > while ($loop > 0)
>> > > > > > > > {

>> > > > > > > > Please add a pod "=head2 subroutine_name ... =cut" heading for every subroutine. This is helpful for others to read and understand your code. The pod syntax can be a bit finicky; you can tell if it is formatted properly by running "pod2text openstack.pm".

>> > > > > > > > Lastly (mainly as a reminder), we will need to incorporate all of the database changes into vcl.sql, plus whatever method we use for the next release to replace update-vcl.sql. I made a reminder comment here:
>> > > > > > > > https://issues.apache.org/jira/browse/VCL-764

>> > > > > > > > Regards,
>> > > > > > > > Andy
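[For reference, the pod heading style Andy describes looks roughly like this; the subroutine name and description are just an example. The blank lines around =head2 and =cut matter, which is what pod2text will catch.]

    =head2 _terminate_os_instance

     Parameters  : none
     Returns     : boolean
     Description : Sends a DELETE request for the reservation's OpenStack
                   instance and waits until it no longer appears in the
                   server list.

    =cut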
-- 
Aaron Peeler
Program Manager
Virtual Computing Lab
NC State University

All electronic mail messages in connection with State business which are sent to or received by this account are subject to the NC Public Records Law and may be disclosed to third parties.