Just to round out this thread, this issue has now been dealt with (thanks Tim!). The server now translates these MongoDB i/o timeout errors into a more useful error that indicates the client should retry the request. The Juju command line client transparently intercepts these errors and retries.
Here's the relevant pull request: https://github.com/juju/juju/pull/5927

On 26 July 2016 at 14:42, Reed O'Brien <[email protected]> wrote:

> On Mon, Jul 25, 2016 at 5:38 PM, Menno Smits <[email protected]>
> wrote:
>
>> Regarding https://bugs.launchpad.net/juju-core/+bug/1597601 ...
>>
>> When "juju enable-ha" is used, new controller machines are started, each
>> running a mongod instance which is connected to Juju's replicaset. As each
>> new node joins the replicaset, a MongoDB leader election is triggered, which
>> causes all mongod instances in the replicaset to drop their connections
>> (this is by design). The workers in Juju's machine agents handle this
>> correctly by aborting and restarting with fresh connections to MongoDB.
>>
>> The problem is that if an API request comes in at just the right time, it
>> will be actioned just as the MongoDB connection goes down, resulting in an
>> i/o timeout error being reported back to the client.
>>
>> This isn't a new problem, but it's one that Juju's users regularly run
>> into. A workaround is to wait for the new controller machines to come up
>> after enable-ha is issued before doing anything else.
>>
>> IMHO it would be best if Juju could hide all this from the client as much
>> as possible, but I'm really not sure if that's feasible or what the best
>> approach should be.
>>
>> The challenge is that unless we do some major rearchitecting, the API
>> server needs to be restarted when the MongoDB connections drop. There's no
>> way that the client's connection can stay up, making it difficult to
>> hide this detail from the client.
>>
>
> It seems that mgo could handle this as a failover. Or that we could see
> that the replica set is starting and wait until it reports being up, then
> refresh the mgo session. I don't understand why the API server itself has
> to restart, though I am sure there are good reasons.
>
>>
>> The most practical solution I can think of is that we introduce a new
>> error type over the API which means "please retry the request". Errors such
>> as an i/o timeout from the MongoDB layer could be converted into this
>> error. Clients would obviously have to handle this error specially.
>>
>
> Barring handling it via the mgo session, this seems obvious and practical.
>
>
> ~ro
>
> --
> Reed O'Brien
> ✉ [email protected]
> ✆ 415-562-6797
-- 
Juju-dev mailing list
[email protected]
Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
