(sent too soon)

Summary (mongotop totals, in ms):
before:
1043 local.oplog.rs
 868 juju.txns
 470 juju.leases

after:
 802 local.oplog.rs
 625 juju.txns
 267 juju.leases

So there is a fairly noticeable decrease in load on the system: juju.leases
dropped from 470ms to 267ms (roughly a 43% reduction), and juju.txns and the
oplog are down to about 70-75% of their previous totals. Again, not super
scientific, since I didn't measure over a long enough window or account for
variation, but at a glance it looks pretty good.
As far as load around the global clock:
                      ns    total     read    write
        juju.globalclock      5ms      3ms      1ms
        juju.globalclock     10ms      8ms      1ms
        etc.

So the global clock traffic is noticeable, but not a real issue.
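
For anyone wanting to reproduce the comparison: the totals above are just the
per-namespace sums of the "total" column across the mongotop samples. A rough
way to do that aggregation (assuming mongotop's default ns/total/read/write
columns and its --rowcount option; connection/auth flags omitted) is:

mongotop --rowcount 12 5 | awk '
  $1 ~ /\./ && $2 ~ /ms/ { gsub("ms", "", $2); sum[$1] += $2 }  # data rows only, strip "ms"
  END { for (ns in sum) printf "%5d %s\n", sum[ns], ns }
' | sort -rn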

Hopefully we'll see similar improvements in live systems. The main thing is
to make sure the upgrade from 2.2 to 2.3 is smooth, since the bad lease
documents caused a pretty major crash of my test system.

John
=:->

On Wed, Nov 1, 2017 at 6:43 PM, John A Meinel <john.mei...@canonical.com>
wrote:

> So I wanted to know if Andrew's changes in 2.3 are going to have a
> noticeable effect on Leadership at scale. So I went and set up a test with
> HA controllers running a model of 10 machines, each with 3 containers, and
> then distributed ~500 applications, each with 3 units, across everything.
> I started at commit 2e50e5cf4c3 which is just before Andrew's Lease patch
> landed.
>
> juju bootstrap aws/eu-west-2 --bootstrap-constraints instance-type=m4.xlarge --config vpc-id=XXXX
> juju enable-ha -n3
> # Wait for things to stabilize
> juju deploy -B cs:~jameinel/ubuntu-lite -n10 --constraints instance-type=m4.xlarge
> # wait
>
> # set up the containers
> for i in `seq 0 9`; do
>   juju deploy -n3 -B cs:~jameinel/ubuntu-leader ul${i} --to lxd:${i},lxd:${i},lxd:${i}
> done
>
> # scale up. I did this more in batches of a few at a time, but slowly grew
> # all the way up
> for j in `seq 1 49`; do
>   echo $j
>   for i in `seq 0 9`; do
>     juju deploy -B -n3 cs:~jameinel/ubuntu-leader ul${i}${j} --to ${i}/lxd/0,${i}/lxd/1,${i}/lxd/2 &
>   done
>   time wait
> done
>
> I let it go for a while until "juju status" was happy that everything was
> up and running. Note that this was 1500 units, 500 applications in a single
> model.
> time juju status was around 4-10s.
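>
> A quick way to double-check those counts from the status JSON (this uses jq,
> which wasn't part of the test itself, just a convenience):
>
> juju status --format=json | jq '.applications | length'                   # ~500 applications
> juju status --format=json | jq '[.applications[].units | length] | add'   # ~1500 units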
>
> I was running 'mongotop' and watching 'top' while it was running.
>
> I then upgraded to the latest juju dev (c49dd0d88a).
> Now, the controller immediately started thrashing, with bad lease
> documents in the database, and eventually got to the point that it ran out
> of open file descriptors. Theoretically upgrading 2.2 => 2.3 won't have the
> same problem because the actual upgrade step should run.
> However, once I did "db.leases.remove({})" it started to recover. I did end
> up having to restart mongo and jujud to clear the leaked file handles, but
> after that it fully recovered.
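>
> For reference, the cleanup amounted to roughly the following on the
> controller machine (the mongo connection flags, credentials, and service
> names here are illustrative, not an exact transcript):
>
> # drop the bad lease documents; leadership gets re-claimed afterwards
> mongo --ssl --sslAllowInvalidCertificates localhost:37017/juju \
>     --authenticationDatabase admin -u <agent-user> -p <agent-password> \
>     --eval 'db.leases.remove({})'
> # bounce mongo and the machine agent to clear the leaked file handles
> sudo systemctl restart juju-db
> sudo systemctl restart jujud-machine-0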
>
> At this point, I waited again for everything to look happy, and watched
> mongotop and top again.
>
> These aren't super careful results; ideally I'd run each configuration for
> an hour or so and look at load over that whole window, and really I should
> have set up prometheus monitoring. But as a quick check, these are the top
> values from mongotop before:
>
>                       ns    total     read    write
>           local.oplog.rs    181ms    181ms      0ms
>                juju.txns    120ms     10ms    110ms
>              juju.leases     80ms     34ms     46ms
>            juju.txns.log     24ms      4ms     19ms
>
>                       ns    total     read    write
>           local.oplog.rs    208ms    208ms      0ms
>                juju.txns    140ms     12ms    128ms
>              juju.leases     98ms     42ms     56ms
>              juju.charms     43ms     43ms      0ms
>
>                       ns    total     read    write
>           local.oplog.rs    220ms    220ms      0ms
>                juju.txns    161ms     14ms    146ms
>              juju.leases    115ms     52ms     63ms
> presence.presence.beings     69ms     68ms      0ms
>
>                       ns    total     read    write
>           local.oplog.rs    213ms    213ms      0ms
>                juju.txns    164ms     15ms    149ms
>              juju.leases     82ms     35ms     47ms
> presence.presence.beings     79ms     78ms      0ms
>
>                       ns    total     read    write
>           local.oplog.rs    221ms    221ms      0ms
>                juju.txns    168ms     13ms    154ms
>              juju.leases     95ms     40ms     55ms
>            juju.statuses     33ms     16ms     17ms
>
> totals:
> 1043 local.oplog.rs
>  868 juju.txns
>  470 juju.leases
>
> and after
>
>                       ns    total    read    write
>           local.oplog.rs     95ms    95ms      0ms
>                juju.txns     68ms     6ms     61ms
>              juju.leases     33ms    13ms     19ms
>            juju.txns.log     13ms     3ms     10ms
>
>                       ns    total     read    write
>           local.oplog.rs    200ms    200ms      0ms
>                juju.txns    160ms     10ms    150ms
>              juju.leases     78ms     35ms     42ms
>            juju.txns.log     29ms      4ms     24ms
>
>                       ns    total     read    write
>           local.oplog.rs    151ms    151ms      0ms
>                juju.txns    103ms      6ms     97ms
>              juju.leases     45ms     20ms     25ms
>            juju.txns.log     21ms      6ms     15ms
>
>                       ns    total     read    write
>           local.oplog.rs    138ms    138ms      0ms
>                juju.txns     98ms      6ms     91ms
>              juju.leases     30ms     13ms     16ms
>            juju.txns.log     18ms      3ms     14ms
>
>                       ns    total     read    write
>           local.oplog.rs    218ms    218ms      0ms
>                juju.txns    196ms     14ms    182ms
>              juju.leases     81ms     36ms     44ms
>            juju.txns.log     34ms      5ms     29ms
>
>
>
>