I spent some time reworking the algorithm that does the rolling replacement. It is much smarter now, and it shouldn't cause unnecessary scaling events. I've also merged the functionality of #589. Would you mind giving it a whirl?
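For readers following along, here is a rough sketch of what a batched rolling replacement loop looks like. This is an illustrative outline only, not the actual code from the PR; the function and callback names are hypothetical stand-ins for the real AWS calls.

```python
def rolling_replace(instance_ids, batch_size, terminate_batch, wait_for_healthy):
    """Terminate instances one batch at a time, letting the ASG launch
    replacements and waiting for them to pass health checks before
    touching the next batch, so capacity is never disturbed by more
    than one batch's worth of instances at a time."""
    batches = [instance_ids[i:i + batch_size]
               for i in range(0, len(instance_ids), batch_size)]
    for batch in batches:
        terminate_batch(batch)        # the ASG replaces what we terminate
        wait_for_healthy(len(batch))  # block until replacements are healthy
    return len(batches)

# Example: four instances replaced two at a time -> two batches.
n_batches = rolling_replace(
    ["i-1", "i-2", "i-3", "i-4"], 2,
    terminate_batch=lambda batch: None,  # stand-in for the real terminate call
    wait_for_healthy=lambda n: None,     # stand-in for the real health poll
)
print(n_batches)  # 2
```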
https://github.com/ansible/ansible-modules-core/pull/1030

On Wednesday, March 25, 2015 at 1:42:19 PM UTC-4, [email protected] wrote:
>
> Thanks James.
>
> All the instances terminated were marked *Unhealthy* by terminate_batch().
>
> I am using the changes from this PR:
> https://github.com/ansible/ansible-modules-core/pull/589, combined with
> the fixes in PR 601. Rationale: I need `lc_check=no` to cause all
> instances to be replaced. As the module is currently written, lc_check
> only takes effect if an instance has a different Launch Config than the
> one assigned to the ASG. Upon further consideration, I should add a new
> option instead of overloading the meaning of lc_check.
>
> On Wednesday, March 25, 2015 at 8:02:12 AM UTC-7, James Martin wrote:
>>
>> Looking forward to the GitHub issue -- make sure you take a look at the
>> Auto Scaling group and the ELB in the AWS console and see if either
>> gives a reason why the instances were terminated. I've seen cases where
>> instances did not come online fast enough, so the ELB marked them
>> unhealthy and the ASG terminated them.
>>
>> Thanks,
>>
>> James
>>
>> On Wednesday, March 25, 2015 at 9:36:17 AM UTC-4, [email protected]
>> wrote:
>>>
>>> For Ansible 1.9-develop, pull request 601
>>> <https://github.com/ansible/ansible-modules-core/pull/601> had the fix
>>> for Issue 383, which affects our production ASG about every two weeks
>>> or so. We use the ec2_asg module to refresh our ASG instances 3 times
>>> a day.
>>>
>>> I was eager to test. In doing so, I noticed that the
>>> replace_all_instances and replace_instances options cause an extra set
>>> of scaling events. Has anyone else who uses either replace_ option
>>> seen this happen? See below for the screenshot which demonstrates the
>>> behavior.
>>>
>>> We have one instance in each of two Availability Zones.
>>> So we use a batch size of two (actually a formula based upon the
>>> length of the availability_zones list of the ASG).
>>>
>>> Interesting... I just tested with batch_size: 1. The extra set of
>>> scaling events was 1, i.e. one extra instance launched and one extra
>>> instance terminated.
>>>
>>> The batch_size logic is broken. I am going to open an Issue in
>>> *ansible-modules-core*, but welcome others to note their experience
>>> here. I'll update this topic with a link to the Issue, too.
>>>
>>>     - name: Retrieve Auto Scaling Group properties
>>>       local_action:
>>>         module: ec2_asg
>>>         name: "{{ asg_name }}"
>>>         state: present
>>>         health_check_type: ELB
>>>       register: result_asg
>>>
>>>     - name: Auto Scaling Group properties
>>>       debug: var=result_asg
>>>
>>>     - name: Replace current instances with fresh instances
>>>       local_action:
>>>         module: ec2_asg
>>>         name: "{{ asg_name }}"
>>>         state: present
>>>         min_size: "{{ result_asg.min_size }}"
>>>         max_size: "{{ result_asg.max_size }}"
>>>         desired_capacity: "{{ result_asg.desired_capacity }}"
>>>         health_check_type: "{{ result_asg.health_check_type }}"
>>>         lc_check: no
>>>         replace_all_instances: yes
>>>         replace_batch_size: "{{ result_asg.availability_zones | length }}"
>>>
>>> 1. and 2. are expected; a. - d. are extra scaling events.
>>>
>>> <https://lh3.googleusercontent.com/-1bvOCHDYhjU/VRK3Bkz1O-I/AAAAAAAAACs/RCDKklA7Hkc/s1600/EC2_Management_Console.jpg>
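To make the two behaviors discussed in the thread concrete -- the batch size derived from the availability_zones list, and the replacement selection the poster expects from lc_check -- here is a hypothetical sketch in Python. The data shapes and names are made up for illustration; this is not the module's code.

```python
def replacement_plan(instances, asg_launch_config, availability_zones, lc_check):
    """Sketch of the selection behavior described above: with lc_check
    enabled, only instances whose launch config differs from the ASG's
    are replaced; with lc_check disabled, every instance is replaced."""
    # Batch-size formula from the quoted playbook: one slot per AZ.
    batch_size = len(availability_zones)
    if lc_check:
        to_replace = [i["id"] for i in instances
                      if i["launch_config"] != asg_launch_config]
    else:
        to_replace = [i["id"] for i in instances]
    return batch_size, to_replace

# Example data (invented): one stale instance, one current instance.
instances = [
    {"id": "i-1", "launch_config": "lc-old"},
    {"id": "i-2", "launch_config": "lc-new"},
]
azs = ["us-east-1a", "us-east-1b"]

print(replacement_plan(instances, "lc-new", azs, lc_check=True))
# -> (2, ['i-1'])
print(replacement_plan(instances, "lc-new", azs, lc_check=False))
# -> (2, ['i-1', 'i-2'])
```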
