Hi James,

Thanks for your reply.
Interesting point about the HealthCheckGracePeriod option. I wasn't aware of its role here. I am indeed using it; in fact, according to the docs it is a required option for ELB health checks. I had it set to 180, and I just tried it with lower values of 10 and 1 second. In both cases the behavior is the same: the autoscale group considers the instances healthy (because of the grace period, even at the lower value), and as a result Ansible moves on before the instances are InService in the ELB. Even with the HealthCheckGracePeriod at the lowest possible value of 1 second, a race exists between the module's health check and the ELB grace period.

I've worked around this for now with a script that does the following:

- Find the instances in the ASG
- Check the ELB to determine whether they are healthy
- Exit 1 if not, 0 if so

Then I use an Ansible task with an "until" loop to check the return code. The script is here:

https://gist.github.com/anonymous/05e99828848ee565ed33

Happy to work this into an Ansible module if you think it would be useful. Or did I misunderstand the point about the health check grace period?

Thanks,
Ben

On Monday, November 24, 2014 7:25:58 AM UTC-8, James Martin wrote:
>
> Ben,
>
> Thanks for the question. Considering this:
> http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-add-elb-healthcheck.html,
>
> "Auto Scaling marks an instance unhealthy if the calls to the Amazon EC2
> action DescribeInstanceStatus return any state other than running, the
> system status shows impaired, or the calls to Elastic Load Balancing action
> DescribeInstanceHealth returns OutOfService in the instance state field."
>
> For determining the instance health status, we are fetching an ASG object
> in boto and checking the health_status attribute for each instance in the
> ASG, which is equal to either "healthy" or "unhealthy". Are you using an
> instance grace period option for the ELB?
> http://docs.aws.amazon.com/AutoScaling/latest/APIReference/API_CreateAutoScalingGroup.html,
>
> see HealthCheckGracePeriod. This option is configurable with the
> health_check_period setting found in the ec2_asg module. By default it is
> 500, which would prematurely report the status of a healthy instance, as it
> means any instance is marked healthy for 500 seconds.
>
> - James
>
> On Saturday, November 22, 2014 5:39:28 PM UTC-5, Ben Whaley wrote:
>>
>> Hi all,
>>
>> Sorry for resurrecting an old thread, but I wanted to mention my experience
>> thus far using ec2_asg & ec2_lc for code deploys.
>>
>> I'm more or less following the methods described in this helpful repo:
>>
>> https://github.com/ansible/immutablish-deploys
>>
>> I believe the dual_asg role is accepted as the more reliable method for
>> deployments. If a deployment uses two ASGs, it's possible to just delete
>> the new ASG and everything goes back to normal. This is the "Netflix"
>> manner of releasing updates.
>>
>> The thing I'm finding, though, is that instances become "viable" well
>> before they're actually InService in the ELB. From the ec2_asg code and by
>> running Ansible in verbose mode, it's clear that Ansible considers an
>> instance viable once AWS indicates that instances are Healthy and
>> InService. Checking via the AWS CLI tool, I can see that the ASG shows
>> instances as Healthy and InService, but the ELB shows OutOfService.
>>
>> The AWS docs are clear about the behavior of autoscale instances with
>> health check type ELB: "For each call, if the Elastic Load Balancing action
>> returns any state other than InService, the instance is marked as
>> unhealthy." But this is not actually the case.
>>
>> Has anyone else encountered this? Any suggested workarounds or fixes?
>>
>> Thanks,
>> Ben
>>
>> On Thursday, September 11, 2014 12:54:25 PM UTC-7, Scott Anderson wrote:
>>>
>>> On Sep 11, 2014, at 3:26 PM, James Martin <[email protected]> wrote:
>>>
>>>> I think we’re probably going to move to a system that uses a tier of
>>>> proxies and two ELBs. That way we can update the idle ELB, change out the
>>>> AMIs, and bring the updated ELB up behind an alternate domain for the
>>>> blue-green testing. Then when everything checks out, switch the proxies to
>>>> the updated ELB and take down the remaining, now idle ELB.
>>>
>>> Not following this exactly -- what's your tier of proxies? You have a
>>> group of proxies (haproxy, nginx) behind a load balancer that points to
>>> your application?
>>>
>>> Yes, nginx or some other HA-ish thing. If it’s nginx then you can
>>> maintain a brochure site even if something horrible happens to the
>>> application.
>>>
>>>> Amazon would suggest using Route53 to point to the new ELB, but there’s
>>>> too great a chance of faulty DNS caching breaking a switch to a new ELB.
>>>> Plus there’s a 60s TTL to start with regardless, even in the absence of
>>>> caching.
>>>
>>> Quite right. There are some interesting things you can do with tools
>>> you could run on the hosts that would redirect traffic from blue hosts to
>>> the green LB, socat being one. After you notice no more traffic coming to
>>> blue, you can terminate it.
>>>
>>> That’s an interesting idea, but it fails if people are behind a caching
>>> DNS and they visit after you’ve terminated the blue traffic but before
>>> their caching DNS lets go of the record.
>>>
>>> You're right, I did miss that. By checking the AMI, you're only
>>> updating the instance if the AMI changes. If you are checking the launch
>>> config, you are updating the instances if any component of the launch
>>> config has changed -- AMI, instance type, address type, etc.
>>>
>>> That’s true, but if I’m changing instance types I’ll generally just
>>> cycle_all. Because of the connection draining and parallelism of the
>>> instance creation, it’s just as quick to do all of them instead of the
>>> ones that need changing. That said, it’s an obvious optimization for sure.
>>>
>>>> Using the ASG to do the provisioning might be preferable if it’s
>>>> reliable. At first I went that route, but I was having problems with the
>>>> ASG’s provisioning being non-deterministic. Manually creating the
>>>> instances seems to ensure that things happen in a particular order and
>>>> with predictable speed. As mentioned, the manual method definitely works
>>>> every time, although I need to add some more timeout and error checking
>>>> (like what happens if I ask for 3 new instances and only get 2).
>>>
>>> I didn't have any issues with the ASG doing the provisioning, but I
>>> would say nothing is predictable with AWS :).
>>>
>>> Very true. Over the past few months I’ve had several working processes
>>> just fail with no warning. The most recent is AWS sometimes refusing to
>>> return the current list of AMIs. Prior to that it was the Available status
>>> on an AMI not really meaning available. Now I check the list of returned
>>> AMIs in a loop until the one I’m looking for shows up, Available status
>>> notwithstanding. Very frustrating. Things could be worse, however: the API
>>> could be run by Facebook...
>>>
>>>> I have a separate task that cleans up the old AMIs and LCs,
>>>> incidentally. I keep the most recent around as a backup for quick
>>>> rollbacks.
>>>
>>> That's cool, care to share?
>>>
>>> I think I’ve posted it before, but here’s the important bit.
>>> After deleting everything but the oldest backup AMI (determined by naming
>>> convention or tags), delete any LC that doesn’t have an associated AMI:
>>>
>>> from boto.exception import EC2ResponseError
>>>
>>> def delete_launch_configs(asg_connection, ec2_connection, module):
>>>     changed = False
>>>
>>>     launch_configs = asg_connection.get_all_launch_configurations()
>>>
>>>     for config in launch_configs:
>>>         try:
>>>             images = ec2_connection.get_all_images(image_ids=[config.image_id])
>>>         except EC2ResponseError:
>>>             # boto raises InvalidAMIID.NotFound for deregistered AMIs
>>>             # rather than returning an empty list
>>>             images = []
>>>
>>>         if not images:
>>>             config.delete()
>>>             changed = True
>>>
>>>     module.exit_json(changed=changed)
>>>
>>> -scott

--
You received this message because you are subscribed to the Google Groups "Ansible Project" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit https://groups.google.com/d/msgid/ansible-project/bb5ec0f5-3c6a-4b0f-8950-ac05a3450641%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
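[Editor's note] Ben's gist is only linked at the top of the thread, not inlined. A minimal sketch of the kind of wait script he describes (find the ASG's instances, ask the ELB for their health, exit non-zero until every one is InService) might look like the following. The region, the command-line arguments, and the specific boto calls are illustrative assumptions, not the gist's actual contents:

```python
# Hypothetical reconstruction of the ASG/ELB health-check script Ben
# describes; region, argument handling, and names are assumptions.
import sys


def all_in_service(states):
    """Return True only if every ELB instance state string is "InService".

    An empty list counts as not-in-service, so a freshly created ASG
    with no registered instances keeps the caller waiting.
    """
    return bool(states) and all(s == "InService" for s in states)


def check_elb_health(asg_name, elb_name):
    """Exit code 0 if all instances in the ASG are InService in the ELB, else 1."""
    # Imported here so the pure decision logic above is usable without boto.
    import boto.ec2.autoscale
    import boto.ec2.elb

    asg_conn = boto.ec2.autoscale.connect_to_region("us-east-1")  # region assumed
    elb_conn = boto.ec2.elb.connect_to_region("us-east-1")

    # Step 1: find the instances currently attached to the ASG
    group = asg_conn.get_all_groups(names=[asg_name])[0]
    instance_ids = [i.instance_id for i in group.instances]

    # Step 2: ask the ELB (DescribeInstanceHealth) about just those instances
    health = elb_conn.describe_instance_health(elb_name, instances=instance_ids)

    # Step 3: exit 0 only when everything is InService
    return 0 if all_in_service([h.state for h in health]) else 1


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(check_elb_health(sys.argv[1], sys.argv[2]))
```

An Ansible task can then retry the script with an "until" loop (checking the registered result's return code with retries and a delay), which is the pattern Ben mentions using around his gist.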
