Hi Troy,

On Tue, May 17, 2016 at 5:56 PM, Troy Harvey <[email protected]> wrote:
> Google Container Engine and Kubernetes are both amazing. Our team loves
> using this tech. However, today was the second time that upgrades to
> Container Engine brought our production site down.

I'm not sure that I understand. Are you referring to the master being upgraded or your nodes being upgraded?

> The root cause had to do with our Docker version, but that detail doesn't
> really matter.

What is your Docker version? Is it different from the Docker version that is installed by default?

> In order to continue to run our production site on Google Container Engine we
> need to know when upgrades are happening, and we need to be able to test
> our applications in an upgraded sandbox environment before those changes
> are rolled out to the production Container Engine.

It sounds like you would like to know when your application may experience a disruption. Assuming that upgrades were non-disruptive, would you still care about when they happened?

There is also a difference between when your control plane (managed by Google) gets upgraded and when your nodes are upgraded. These are different events and should have very different effects on your application:

1. When the control plane is upgraded, your application should not be disrupted. A possible exception is the release of a new minor version of Kubernetes (e.g. upgrading from 1.1 to 1.2), and we tend to be very conservative before rolling out minor upgrades to users' control planes. Between when the new version becomes available and when existing clusters are upgraded, you can proactively upgrade your control plane, which gives you a generous window to exercise the upgrade on a test cluster before it is applied automatically to your production cluster. And if you experience problems during the upgrade on your test cluster, we would really like to know about them, as it is our goal for these upgrades to be entirely non-disruptive.

2. When your nodes are upgraded, it can cause a disruption to your application. This is the primary reason that node upgrades are not done automatically today. It means that you are free to apply the upgrade at a time that is convenient for you, and you can apply it to test clusters prior to production clusters. In addition, using our recently released NodePools <https://cloud.google.com/container-engine/reference/rest/v1/projects.zones.clusters.nodePools> feature, you can add a new set of worker nodes to your cluster at the new version, shift your application over, and then upgrade (or remove) the previous set of worker nodes. This strategy can make your node upgrades significantly less disruptive (depending on your application) than just upgrading your existing nodes in place.

> Am I missing the place where Google notifies its Container Engine users
> of the upgrades like the one done today?

No, you aren't missing anything. We do not advertise upgrades prior to their execution.

> I see where the Group post appeared several hours after the upgrades were
> completed <https://groups.google.com/forum/#!forum/gke-release-notes>,
> but that's obviously too little too late.

We have experimented with announcing upgrades at different times during the process. If we announce too early, customers are confused about why their cluster is not yet at the announced version. If we announce after the fact, customers like yourself ask for earlier notification.

> I also see where the Release Notes are published so I thought maybe there
> was a consistent release schedule, but those dates are all over the place.

We attempt to stick to ~weekly releases, but as you've noticed, the dates sometimes shift a bit. And as I mentioned, we've experimented with publishing the release notes at different points during the process, which also makes the dates look much more erratic.
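
As a rough illustration of the node pool approach from item 2 above (add a new set of nodes, shift the application over, then remove the old set), here is a minimal Python sketch that only builds the command sequence and does not run anything. The function name `node_pool_migration_commands` and all cluster, zone, pool, and node names are hypothetical placeholders. The `gcloud container node-pools` and `kubectl cordon` / `kubectl drain` invocations are assumptions based on the current command-line tools, so please check them against the versions you have installed before relying on them.

```python
import shlex
from typing import List


def node_pool_migration_commands(
    cluster: str,
    zone: str,
    old_pool: str,
    new_pool: str,
    num_nodes: int,
    old_nodes: List[str],
) -> List[List[str]]:
    """Return the ordered commands for swapping node pools (nothing is executed)."""
    scope = [f"--cluster={cluster}", f"--zone={zone}"]

    # 1. Add a new set of worker nodes (assumed to come up at the newer version).
    commands = [
        ["gcloud", "container", "node-pools", "create", new_pool,
         *scope, f"--num-nodes={num_nodes}"],
    ]

    # 2. Mark every old node unschedulable so new pods land on the new pool.
    commands += [["kubectl", "cordon", node] for node in old_nodes]

    # 3. Evict the pods from the old nodes, one node at a time.
    commands += [
        ["kubectl", "drain", node, "--ignore-daemonsets"] for node in old_nodes
    ]

    # 4. Remove the previous set of worker nodes once they are empty.
    commands.append(
        ["gcloud", "container", "node-pools", "delete", old_pool, *scope]
    )
    return commands


if __name__ == "__main__":
    # Placeholder names only; substitute your own cluster details.
    plan = node_pool_migration_commands(
        cluster="test-cluster",
        zone="us-central1-b",
        old_pool="pool-old",
        new_pool="pool-new",
        num_nodes=3,
        old_nodes=["node-a", "node-b"],
    )
    for command in plan:
        print(shlex.join(command))
```

Printing the plan first, and running it against a test cluster before production, keeps the swap reviewable. How disruptive the drain step is still depends on your application, for example on whether each service has more than one replica.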

> Our team is starting to talk about self-hosting now because of this, which
> would be annoying for us to have to spend the energy supporting a container
> engine instead of focusing on features for our customers.

We certainly aim to provide a hosted solution that meets your needs. I would like to know more about what caused your application downtime and what you think we can do to avoid it in the future.

>
> Thank you in advance for your suggestions and answers!
>
> Troy
>
> --
> You received this message because you are subscribed to the Google Groups
> "Containers at Google" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/google-containers.
> For more options, visit https://groups.google.com/d/optout.
