Regarding https://bugs.launchpad.net/juju-core/+bug/1597601 ...
When "juju enable-ha" is used, new controller machines are started, each running a mongod instance which is connected to Juju's replicaset. As each new node joins the replicaset, a MongoDB leader election is triggered which causes all mongod instances in the replicaset to drop their connections (this is by design). The workers in Juju's machine agents handle this correctly by aborting and restarting with fresh connections to MongoDB.

The problem is that if an API request comes in at just the right time, it will be actioned just as the MongoDB connection goes down, resulting in the i/o timeout error being reported back to the client. This isn't a new problem, but it's one that Juju's users regularly run into. A workaround is to wait for the new controller machines to come up after enable-ha is issued before doing anything else.

IMHO it would be best if Juju could hide all this from the client as much as possible, but I'm really not sure if that's feasible or what the best approach would be. The challenge is that unless we do some major rearchitecting, the API server needs to be restarted when the MongoDB connections drop. There's no way for the client's connection to stay up, making it difficult to hide this detail from the client.

The most practical solution I can think of is that we introduce a new error type over the API which means "please retry the request". Errors such as an i/o timeout from the MongoDB layer could be converted into this error. Clients would obviously have to handle this error specially.

Does anyone have another idea?

- Menno
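For illustration, a minimal sketch of the client side of the "please retry" idea. Everything here is hypothetical (the error value, the `callAPI` helper, the retry policy); it just shows the shape of what clients would need to do if the server converted MongoDB i/o timeouts into a dedicated retryable error:

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// ErrRetryable is a hypothetical sentinel error meaning "please
// retry the request". In the proposal, the API server would map
// errors like MongoDB i/o timeouts (seen during replicaset
// elections) onto this error before returning them to the client.
var ErrRetryable = errors.New("please retry the request")

// callAPI runs a request, retrying with a fixed backoff whenever
// the server reports the retryable error. Any other error (or
// success) is returned to the caller immediately.
func callAPI(do func() error, attempts int, backoff time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = do()
		if !errors.Is(err, ErrRetryable) {
			return err
		}
		time.Sleep(backoff)
	}
	return err
}

// demo simulates a request that fails twice during a MongoDB
// election window and then succeeds. It returns how many attempts
// were made and the final error.
func demo() (int, error) {
	calls := 0
	err := callAPI(func() error {
		calls++
		if calls < 3 {
			return ErrRetryable // connection dropped mid-request
		}
		return nil
	}, 5, time.Millisecond)
	return calls, err
}

func main() {
	calls, err := demo()
	fmt.Println(calls, err)
}
```

The key design point is that the retry loop lives in one place in the client API layer, so individual commands don't each need to special-case the election window.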
-- Juju-dev mailing list [email protected] Modify settings or unsubscribe at: https://lists.ubuntu.com/mailman/listinfo/juju-dev
