Oops, please use this link for the code instead. https://gist.github.com/bwhaley/eee6a0f61636862515aa
On Monday, November 24, 2014 9:44:21 AM UTC-8, Ben Whaley wrote:
>
> Hi James,
>
> Thanks for your reply.
>
> Interesting point about the HealthCheckGracePeriod option. I wasn't aware
> of its role here. I am indeed using it; in fact, according to the docs it
> is a required option for ELB health checks. I had it set to 180, and I
> just tried it with lower values of 10 and 1 second. In both cases the
> behavior is the same: the autoscale group considers the instances healthy
> (because of the grace period, even at the lower value), and as a result
> Ansible moves on before the instances are InService in the ELB. Even with
> the HealthCheckGracePeriod at the lowest possible value of 1 second, a
> race exists between the module's health check and the ELB grace period.
>
> I've worked around this for now with a script that does the following:
> - Find the instances in the ASG
> - Check the ELB to determine whether they are healthy
> - Exit 1 if not, 0 if so
>
> Then I use an Ansible task with an "until" loop to check the return code.
> The script is here:
>
> https://gist.github.com/anonymous/05e99828848ee565ed33
>
> Happy to work this into an Ansible module if you think this is useful. Or
> did I misunderstand the point about the health check grace period?
>
> Thanks,
> Ben
>
> On Monday, November 24, 2014 7:25:58 AM UTC-8, James Martin wrote:
>>
>> Ben,
>>
>> Thanks for the question. Considering this:
>> http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-add-elb-healthcheck.html
>>
>> "Auto Scaling marks an instance unhealthy if the calls to the Amazon EC2
>> action DescribeInstanceStatus return any state other than running, the
>> system status shows impaired, or the calls to Elastic Load Balancing
>> action DescribeInstanceHealth returns OutOfService in the instance state
>> field."
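A minimal sketch of the workaround described above (the gist itself is not reproduced here). The decision logic is a pure function; the boto calls that would feed it are shown in `check_elb_health`, assuming boto 2.x, with `region`, `asg_name`, and `elb_name` as placeholders:

```python
import sys


def all_in_service(states):
    """True only when there is at least one instance and every reported
    ELB state is 'InService'."""
    return bool(states) and all(s == 'InService' for s in states)


def check_elb_health(region, asg_name, elb_name):
    """Exit 0 if every instance in the ASG is InService in the ELB, else 1."""
    import boto.ec2.autoscale
    import boto.ec2.elb

    asg_conn = boto.ec2.autoscale.connect_to_region(region)
    elb_conn = boto.ec2.elb.connect_to_region(region)
    group = asg_conn.get_all_groups(names=[asg_name])[0]
    instance_ids = [i.instance_id for i in group.instances]
    health = elb_conn.describe_instance_health(elb_name, instances=instance_ids)
    sys.exit(0 if all_in_service([h.state for h in health]) else 1)
```

An Ansible task could then run this script with an "until" loop that retries while the return code is nonzero.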
>>
>> For determining the instance health status, we fetch an ASG object in
>> boto and check the health_status attribute for each instance in the
>> ASG, which is equal to either "healthy" or "unhealthy". Are you using
>> an instance grace period option for the ELB? See HealthCheckGracePeriod
>> in
>> http://docs.aws.amazon.com/AutoScaling/latest/APIReference/API_CreateAutoScalingGroup.html.
>> This option is configurable with the health_check_period setting found
>> in the ec2_asg module. By default it is 500, and this would prematurely
>> report a healthy instance, as it means any instance would be marked
>> healthy for 500 seconds.
>>
>> - James
>>
>> On Saturday, November 22, 2014 5:39:28 PM UTC-5, Ben Whaley wrote:
>>>
>>> Hi all,
>>>
>>> Sorry for resurrecting an old thread, but I wanted to mention my
>>> experience thus far using ec2_asg & ec2_lc for code deploys.
>>>
>>> I'm more or less following the methods described in this helpful repo:
>>>
>>> https://github.com/ansible/immutablish-deploys
>>>
>>> I believe the dual_asg role is accepted as the more reliable method
>>> for deployments. If a deployment uses two ASGs, it's possible to just
>>> delete the new ASG and everything goes back to normal. This is the
>>> "Netflix" manner of releasing updates.
>>>
>>> The thing I'm finding, though, is that instances become "viable" well
>>> before they're actually InService in the ELB. From the ec2_asg code
>>> and by running Ansible in verbose mode, it's clear that Ansible
>>> considers an instance viable once AWS indicates that instances are
>>> Healthy and InService. Checking via the AWS CLI tool, I can see that
>>> the ASG shows instances as Healthy and InService, but the ELB shows
>>> OutOfService.
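The disagreement between the two health views can be illustrated with a sketch (this is not ec2_asg's actual code, and the field names are illustrative): during the HealthCheckGracePeriod the ASG reports an instance as Healthy/InService while DescribeInstanceHealth on the ELB still returns OutOfService.

```python
def viable_per_asg(instances):
    # Roughly what a boto ASG object reports per instance:
    # a health_status string plus a lifecycle state.
    return all(i['health_status'].lower() == 'healthy' and
               i['lifecycle_state'] == 'InService'
               for i in instances)


def viable_per_elb(elb_states):
    # What DescribeInstanceHealth reports for the same instances.
    return all(s == 'InService' for s in elb_states)

# During the grace period the first check can pass while the second
# fails, which is exactly the race described in this thread.
```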
>>>
>>> The AWS docs are clear about the behavior of autoscale instances with
>>> health check type ELB: "For each call, if the Elastic Load Balancing
>>> action returns any state other than InService, the instance is marked
>>> as unhealthy." But this is not actually the case.
>>>
>>> Has anyone else encountered this? Any suggested workarounds or fixes?
>>>
>>> Thanks,
>>> Ben
>>>
>>> On Thursday, September 11, 2014 12:54:25 PM UTC-7, Scott Anderson wrote:
>>>>
>>>> On Sep 11, 2014, at 3:26 PM, James Martin <[email protected]> wrote:
>>>>
>>>>> I think we're probably going to move to a system that uses a tier of
>>>>> proxies and two ELBs. That way we can update the idle ELB, change
>>>>> out the AMIs, and bring the updated ELB up behind an alternate
>>>>> domain for the blue-green testing. Then when everything checks out,
>>>>> switch the proxies to the updated ELB and take down the remaining,
>>>>> now idle ELB.
>>>>>
>>>> Not following this exactly -- what's your tier of proxies? You have a
>>>> group of proxies (haproxy, nginx) behind a load balancer that point
>>>> to your application?
>>>>
>>>> Yes, nginx or some other HA-ish thing. If it's nginx, then you can
>>>> maintain a brochure site even if something horrible happens to the
>>>> application.
>>>>
>>>>> Amazon would suggest using Route53 to point to the new ELB, but
>>>>> there's too great a chance of faulty DNS caching breaking a switch
>>>>> to a new ELB. Plus there's a 60s TTL to start with regardless, even
>>>>> in the absence of caching.
>>>>>
>>>> Quite right. There are some interesting things you can do with tools
>>>> you could run on the hosts that would redirect traffic from blue
>>>> hosts to the green LB, socat being one. After you notice no more
>>>> traffic coming to blue, you can terminate it.
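The "no more traffic coming to blue" check above might be sketched as follows; the per-interval request counts are assumed to come from somewhere like access logs or CloudWatch, and everything here is illustrative rather than part of any actual tooling:

```python
def blue_is_drained(request_counts, quiet_intervals=5):
    # True once the most recent `quiet_intervals` samples saw zero
    # requests on the blue side; only then would it be safe to
    # terminate the blue instances.
    if len(request_counts) < quiet_intervals:
        return False
    return all(c == 0 for c in request_counts[-quiet_intervals:])
```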
>>>>
>>>> That's an interesting idea, but it fails if people are behind a
>>>> caching DNS and they visit after you've terminated the blue traffic
>>>> but before their caching DNS lets go of the record.
>>>>
>>>> You're right, I did miss that. By checking the AMI, you're only
>>>> updating the instance if the AMI changes. If you are checking the
>>>> launch config, you are updating the instances if any component of the
>>>> launch config has changed -- AMI, instance type, address type, etc.
>>>>
>>>> That's true, but if I'm changing instance types I'll generally just
>>>> cycle_all. Because of the connection draining and the parallelism of
>>>> instance creation, it's just as quick to do all of them as only the
>>>> ones that need changing. That said, it's an obvious optimization for
>>>> sure.
>>>>
>>>>> Using the ASG to do the provisioning might be preferable if it's
>>>>> reliable. At first I went that route, but I was having problems with
>>>>> the ASG's provisioning being non-deterministic. Manually creating
>>>>> the instances seems to ensure that things happen in a particular
>>>>> order and with predictable speed. As mentioned, the manual method
>>>>> definitely works every time, although I need to add some more
>>>>> timeout and error checking (like what happens if I ask for 3 new
>>>>> instances and only get 2).
>>>>>
>>>> I didn't have any issues with the ASG doing the provisioning, but I
>>>> would say nothing is predictable with AWS :).
>>>>
>>>> Very true. Over the past few months I've had several working
>>>> processes just fail with no warning. The most recent is AWS sometimes
>>>> refusing to return the current list of AMIs. Prior to that it was the
>>>> Available status on an AMI not really meaning available. Now I check
>>>> the list of returned AMIs in a loop until the one I'm looking for
>>>> shows up, Available status notwithstanding. Very frustrating.
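The AMI-polling loop mentioned above might look something like this sketch, where `list_ami_ids` is a hypothetical stand-in for a call such as boto's `get_all_images` returning the currently visible AMI IDs:

```python
import time


def wait_for_ami(list_ami_ids, target_ami_id, attempts=30, delay=10):
    # Poll until the AMI shows up in the returned list, since the
    # 'Available' status alone has proven unreliable; give up after
    # `attempts` tries rather than looping forever.
    for attempt in range(attempts):
        if target_ami_id in list_ami_ids():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```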
>>>> Things could be worse, however: the API could be run by Facebook...
>>>>
>>>>> I have a separate task that cleans up the old AMIs and LCs,
>>>>> incidentally. I keep the most recent around as a backup for quick
>>>>> rollbacks.
>>>>>
>>>> That's cool, care to share?
>>>>
>>>> I think I've posted it before, but here's the important bit. After
>>>> deleting everything but the oldest backup AMI (determined by naming
>>>> convention or tags), delete any LC that doesn't have an associated
>>>> AMI:
>>>>
>>>> def delete_launch_configs(asg_connection, ec2_connection, module):
>>>>     changed = False
>>>>
>>>>     launch_configs = asg_connection.get_all_launch_configurations()
>>>>
>>>>     for config in launch_configs:
>>>>         image_id = config.image_id
>>>>         images = ec2_connection.get_all_images(image_ids=[image_id])
>>>>
>>>>         if not images:
>>>>             config.delete()
>>>>             changed = True
>>>>
>>>>     module.exit_json(changed=changed)
>>>>
>>>> -scott

--
You received this message because you are subscribed to the Google Groups
"Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/ansible-project/cf2571f0-00c1-4c69-8bc6-0edad6d57e71%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
