Re: Testing Leader Election reconfiguration

Cory Johns Tue, 15 Mar 2016 10:55:21 -0700

Tom,

It's also important to note that sentry.wait() waits for *all* units in the
deployment to settle for at least 30 seconds, so it might be possible that
another unit that wasn't included in the status gist you provided is
churning and causing it to time out.  That's particularly possible if
you're reusing the deployer instance and all 34+ of those machines (going
by the machine numbers in your gist) are still extant; with that many
machines, even the periodic update-status hooks could be overlapping enough
to prevent the 30 second idle window from registering.
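
To make that concrete, here is a toy model of the idle-window check. It is
only a sketch of the idea, not amulet's actual implementation: the function
name, the treatment of each hook as an instantaneous event, and the evenly
staggered 5-minute update-status schedule are all assumptions for
illustration.

```python
def wait_returns_at(activity_times, idle_threshold=30, timeout=300):
    """Toy model: when would an idle-window wait return?

    activity_times are seconds (from the start of the wait) at which any
    unit in the deployment was seen busy. Returns the second at which the
    whole deployment has been quiet for idle_threshold seconds, or None if
    that never happens before the timeout.
    """
    quiet_since = 0
    for t in sorted(activity_times):
        if t >= timeout:
            break  # activity after the deadline cannot matter
        if t - quiet_since >= idle_threshold:
            # A long enough quiet gap opened up before this activity.
            return quiet_since + idle_threshold
        quiet_since = t
    done = quiet_since + idle_threshold
    return done if done <= timeout else None


# Activity every 10s that stops at 4m30s: the wait just makes it.
settles_at_270 = list(range(0, 271, 10))
print(wait_returns_at(settles_at_270))  # 300

# Assumption: 34 units, each firing update-status once per 5 minutes,
# evenly staggered, so some hook fires roughly every 9 seconds.
staggered = [i * 300 / 34 for i in range(34)]
print(wait_returns_at(staggered))  # None, i.e. the wait times out
```

With fewer live machines the gaps between hooks grow, which is why cleaning
up old units makes the 30 second window much easier to hit.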


I'd recommend using the wait_for_messages [1] alternative, which relies on
the charm to report its status explicitly and thus doesn't need to use
heuristics like the 30 second idle window.  It could also make your test
case code a bit cleaner.
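
For what it's worth, here is a rough sketch of how helpers for such a test
might look. The helper names, the 600 second timeout, and the regular
expression are hypothetical (the pattern assumes the charm's status message
ends in ": <ip>", as the split(':', 1) in your test suggests), so please
check the docs at [1] for exactly how a regex value is matched against the
units of a service.

```python
import re

# Hypothetical pattern: assumes the status message looks like "<text>: <ip>".
LEADER_MESSAGE = re.compile(r'.*:\s*\S+$')


def wait_until_reported(deployment, service='pdi', timeout=600):
    """Block until the service's status messages match LEADER_MESSAGE.

    `deployment` is expected to be an amulet Deployment; the wait relies on
    what the charm reports rather than on an idle window.
    """
    deployment.sentry.wait_for_messages({service: LEADER_MESSAGE},
                                        timeout=timeout)


def leader_ip(deployment, service='pdi', unit_index=0):
    """Return the text after the first ':' in the unit's status message.

    The info dict is looked up again on every call, so the value reflects
    the current status rather than a reference taken earlier in the test.
    """
    info = deployment.sentry[service][unit_index].info
    message = info['workload-status'].get('message')
    return message.split(':', 1)[-1].strip()
```

In the test you would then call wait_until_reported(self.d) after
add_unit() or remove_unit(), and compare leader_ip(self.d) before and
after.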

And, of course, reusing units when possible and cleaning up between test
cases can help, as well.

[1]:
https://pythonhosted.org/amulet/amulet.html#amulet.sentry.Talisman.wait_for_messages

On Tue, Mar 15, 2016 at 1:02 PM, Tim Van Steenburgh <
tim.van.steenbu...@canonical.com> wrote:

>
>
> On Tue, Mar 15, 2016 at 12:30 PM, Tom Barber <t...@analytical-labs.com>
> wrote:
>
>> Hi Tim,
>>
>> Why would I need to increase the timeout when the status says all the
>> units are operational?
>>
>
> The default wait time is 300s, with an "idle threshold" of 30s. That
> means it waits for everything to be idle for 30s before returning from the
> wait, so with the default timeout, if the env doesn't settle within
> 4m30s, it'll time out. This may not be what's happening in your
> case, but it's worth trying a longer timeout value to make sure.
>
>
>> The status dump came out of bundletester, which said that it failed on the
>> first wait(). I assume the status dump arrived at the same time?
>> Bugs are allowed: the test was hacked up from a previous one and doesn't
>> do anything yet. I'm trying to make sure the logic works first.
>>
>> Tom
>>
>> --------------
>>
>> Director Meteorite.bi - Saiku Analytics Founder
>> Tel: +44(0)5603641316
>>
>> (Thanks to the Saiku community we reached our Kickstart
>> <http://kickstarter.com/projects/2117053714/saiku-reporting-interactive-report-designer/>
>> goal, but you can always help by sponsoring the project
>> <http://www.meteorite.bi/products/saiku/sponsorship>)
>>
>> On 15 March 2016 at 16:27, Tim Van Steenburgh <
>> tim.van.steenbu...@canonical.com> wrote:
>>
>>> Hey Tom,
>>>
>>> 1. You can increase the wait time until it doesn't time out:
>>> self.d.sentry.wait(timeout=1200)
>>> 2. At what point in this sequence of commands was the status dump
>>> captured?
>>> 3. There is a bug here. You take a reference to the pdi/0 info dict on
>>> line 1. It's the same object you use to get message2 and message3 later.
>>> So, you'll get the same message that you got on line 1. You need `message3
>>> = self.d.sentry['pdi'][0].info['workload-status'].get('message')`
>>> instead.
>>>
>>> Hope this helps.
>>>
>>> On Tue, Mar 15, 2016 at 11:41 AM, Tom Barber <t...@analytical-labs.com>
>>> wrote:
>>>
>>>> Okay back here again, so my nice leader election function looks like:
>>>>
>>>>    def test_leader_election_failover(self):
>>>>         unit = self.d.sentry['pdi'][0].info
>>>>         message = unit['workload-status'].get('message')
>>>>         ip = message.split(':', 1)[-1]
>>>>         self.d.add_unit('pdi', 2)
>>>>         self.d.sentry.wait()
>>>>         message2 = unit['workload-status'].get('message')
>>>>         ip2 = message2.split(':', 1)[-1]
>>>>         self.assertEqual(ip, ip2)
>>>>         self.d.remove_unit('pdi/0')
>>>>         self.d.sentry.wait()
>>>>         message3 = unit['workload-status'].get('message')
>>>>         ip3 = message3.split(':', 1)[-1]
>>>>
>>>>         self.assertNotEqual(ip3, ip2)
>>>>
>>>> I know there's no logic in there, but I need to make sure the stuff
>>>> actually functions.
>>>>
>>>> So Tim says wait() should work, but when I tested this last night,
>>>> I got a timeout error on the wait right after add_unit.
>>>>
>>>> https://gist.github.com/buggtb/c271dd79d782af57dea6
>>>>
>>>> Yet in the status dump you can see all 3 units sat there seemingly
>>>> happy.
>>>>
>>>> Tom
>>>>
>>>> On 9 March 2016 at 18:31, Tom Barber <t...@analytical-labs.com> wrote:
>>>>
>>>>> Oh really?
>>>>>
>>>>> /me strokes his invisible beard.
>>>>>
>>>>>
>>>>> Okay I'll go back and try again.
>>>>>
>>>>> Tom
>>>>>
>>>>> On 9 March 2016 at 16:56, Tim Van Steenburgh <
>>>>> tim.van.steenbu...@canonical.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 9, 2016 at 6:31 AM, Tom Barber <t...@analytical-labs.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Stuart.
>>>>>>>
>>>>>>> I do put a note in my charm message indicating the leader IP address
>>>>>>> so that users know which to connect to.
>>>>>>>
>>>>>>> So with juju wait, would I destroy a unit then execute juju wait? At
>>>>>>> which point it will hang until the leader election stuff is over and all
>>>>>>> becomes stable again?
>>>>>>>
>>>>>>>
>>>>>> Since you're already using amulet, there's no need to use the juju-wait
>>>>>> plugin since d.sentry.wait() does the same thing. So yes, you would do
>>>>>> d.remove_unit(...) and then call d.sentry.wait().
>>>>>>
>>>>>>
>>>>>>> Also, will this work if I push it upstream to the charmers and the
>>>>>>> automated tests up there?
>>>>>>>
>>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> Tom
>>>>>>>
>>>>>>> On 9 March 2016 at 11:00, Stuart Bishop <stuart.bis...@canonical.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> On 9 March 2016 at 20:31, Tom Barber <t...@analytical-labs.com>
>>>>>>>> wrote:
>>>>>>>> > Morning all
>>>>>>>> >
>>>>>>>> > I'm trying to test for charm reconfiguration if the leader goes
>>>>>>>> > AWOL.
>>>>>>>>
>>>>>>>> I put the role of the unit in its workload status, so it is easy for
>>>>>>>> operators to see which unit is master. And this also makes it easy
>>>>>>>> for tests to tell.
>>>>>>>>
>>>>>>>>
>>>>>>>> > Adam suggested that I watch the status waiting for the next
>>>>>>>> > leader election hook the wait on that and then check my service
>>>>>>>> > configs.
>>>>>>>>
>>>>>>>> You are best off waiting for all the hooks to complete and a steady
>>>>>>>> state, not just leader elected (since things will still be in flux
>>>>>>>> when that hook fires, such as the leader-settings-changed hooks it
>>>>>>>> will probably trigger and the relation changes those hooks will
>>>>>>>> likely trigger). Use the juju-wait plugin, and maybe add support to
>>>>>>>> https://bugs.launchpad.net/juju-core/+bug/1488777 to get this into
>>>>>>>> core.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Stuart Bishop <stuart.bis...@canonical.com>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Juju mailing list
>>>>>>> Juju@lists.ubuntu.com
>>>>>>> Modify settings or unsubscribe at:
>>>>>>> https://lists.ubuntu.com/mailman/listinfo/juju
>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>

