Hi Dan,

I recently went through a similar exercise to the one you describe, moving our prototype code onto AWS.
First, some background: I spent a stint building a control plane for autoscaling VMs on OpenStack (and am generally long in the tooth), but this is my first attempt at a web app, and therefore at Django too. I also grew up on VAXes, so the notion of an always-up cluster is deeply rooted. Technical comments follow inline...

On Wed, 1 May 2019 at 21:35, <[email protected]> wrote:

> My organization is moving into the AWS cloud, and with some other projects
> using MongoDB, ElasticSearch and a web application framework that is not
> Django, we've had no problem.
>
> I'm our "Systems/Applications Architect", and some years ago I helped
> choose Django over some other solutions. I stand by that as correct for
> us, but our Cloud guys want to know how to do Blue/Green deployments, and
> the more I look at it the less happy I am.
>
> Here's the problem:
>
> - Django's ORM has long shielded developers from simple SQL problems
>   of the "SELECT * FROM fubar ..." and "INSERT INTO fubar VALUES (...)"
>   sort.
> - However, if an existing "Blue" deployment knows about a column, it
>   will try to retrieve it:
>   - fubar = Fubar.objects.get(name='Samwise')
> - If a new "Green" deployment is coming up, and we want to wait until
>   Selenium testing has passed, we have the problem of migrations.
>
> I really don't see any simple way around a new database cluster/instance
> when we bring up a new cluster, with something like this:
>
> - Mark the live database as "in maintenance mode". The application
>   now will not write to the database, but we can also make that user's
>   access read-only to preserve this.
> - Take a snapshot.
> - Restore the snapshot to build the new database instance/cluster.
> - Mark the new database as "live", e.g. clear "maintenance mode". If
>   the webapp user is read-only, they are restored to full read/write
>   permissions.
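The "Blue knows about a column" failure mode is easy to reproduce with nothing but the standard library. A toy sketch (sqlite3 standing in for Postgres; the table and column names are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fubar (name TEXT, nickname TEXT)")
conn.execute("INSERT INTO fubar VALUES ('Samwise', 'Sam')")

# The running "Blue" code names every column it was built against --
# this is roughly what Fubar.objects.get(name='Samwise') expands to.
blue_query = "SELECT name, nickname FROM fubar WHERE name = ?"
print(conn.execute(blue_query, ("Samwise",)).fetchone())  # ('Samwise', 'Sam')

# A "Green" migration drops the column (spelled portably, the way older
# SQLite migrations had to) while Blue is still serving traffic...
conn.executescript("""
    ALTER TABLE fubar RENAME TO fubar_old;
    CREATE TABLE fubar (name TEXT);
    INSERT INTO fubar SELECT name FROM fubar_old;
    DROP TABLE fubar_old;
""")

# ...and Blue's unchanged query now fails outright.
try:
    conn.execute(blue_query, ("Samwise",))
except sqlite3.OperationalError as exc:
    print("Blue breaks:", exc)  # no such column: nickname
```

The same breakage happens in the other direction too: Blue's `INSERT`s name every column it knows, so a column Green added as `NOT NULL` without a default will also reject Blue's writes.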
> - Run migrations in production.
> - Bring up new auto-scaling group.

We are not yet doing auto-scaling, but otherwise your description applies very well to us. Right now, we have a pair of VMs: a "logic" VM hosting Django, and a "db" VM hosting Postgres (long term, we may move to Aurora for the database, but we are not there right now). The logic VM is based on an Ubuntu base image, plus a load of extra stuff:

- Django, our code and all the Python dependencies
- A whole host of non-Python dependencies, starting with RabbitMQ (needed for Celery), nginx, etc.
- A whole lot of configuration for the above (starting with production keys, passwords and the like)

The net result is that not only does it take 10-15 minutes for AWS to spin up a new db VM from a snapshot, it also takes several minutes to spin up, install, and configure the logic VM.

So, we have a piece of code that can do a "live-to-<scenario>" upgrade:

- Where scenario is "live-to-live" or "live-to-test".
- The logic is the same in both, except for a couple of small pieces only in the live-to-live case, where we:
  - Pause the live system (db and Celery queues) before snapshotting it for the new spin-up
  - Create an archive of the database
  - Switch the Elastic IP on a successful sanity-test pass
- We also have a small piece of run-time code in our project/settings.py that, on a live system, enables HTTPS and so on.

Before we do the "live-to-live" upgrade, we always do a "live-to-test" upgrade. This ensures we have run all migrations and pre-release sanities on virtually current data; only then do we perform a *separate* live-to-live. While this works, it creates a window during which the service must be down. There is also a finite window during which all those third-party dependencies on apt and pip/PyPI expose the "live-to-live" to potential failure.

So in the "long term", I would prefer to attempt something like the following:

- Use a cluster of N logic VMs.
- Use an LB at the front end.
- Enforce a development process that ensures that (roughly speaking) all database changes result in a new column, and where the old column cannot be removed until a later update cycle. All migrations populate the new column.
- Spin up an N+1th VM with the new logic and, once sanity testing has passed, switch the N+1th machine on in the LB and remove one of the original N.
- Loop.
- Delete the old column.

Of course, the $64k question is how to keep the old logic and the new logic in sync across the two columns. For that, I can only wave my arms at present and say that the old column cannot really be there in its bare form; instead there will be some kind of a view that makes it look like it is, possibly with some old-school stored procedure/trigger logic in support. Of course, I would love it if there were some magic tooling developed by the Django and database gurus before I have to tackle this. Then again, I don't believe in magic. And nor do I believe we'll have an army of devs to fake the magic.

I'd love to be shown a better way (e.g. a complete second cluster, with a rolling migration of data from old to new until the old is killed?), else I'll be on the hook for making the above work!

Thanks, Shaheed

> Of course, some things that Django does really help:
>
> - The database migration is first tested by the test process, which
>   runs migrations.
> - The unit tests have succeeded before we try to do this migration.
>
> Does anyone have experience/cloud operations design with doing Blue/Green
> deployments with Django?
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at https://groups.google.com/group/django-users.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-users/ae5310c6-b69f-43af-a838-5dce7bd6a712%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
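P.S. To make the view-plus-trigger arm-waving above slightly more concrete, here is a toy sketch, again with the stdlib sqlite3 module standing in for Postgres (there it would be a view with INSTEAD OF triggers; every name below, fubar_t, full_name, is invented). The real table carries only the new column; a view with the old table name and old column shape, plus write-through triggers, keeps the untouched Blue code working while Green talks to the new column directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Green's schema: only the *new* column lives in the real table.
    CREATE TABLE fubar_t (id INTEGER PRIMARY KEY, full_name TEXT);

    -- A view with the old table name and old column shape keeps Blue working.
    CREATE VIEW fubar (id, name) AS SELECT id, full_name FROM fubar_t;

    -- Write-through triggers: Blue's INSERTs/UPDATEs land in the new column.
    CREATE TRIGGER fubar_ins INSTEAD OF INSERT ON fubar
    BEGIN
        INSERT INTO fubar_t (full_name) VALUES (NEW.name);
    END;

    CREATE TRIGGER fubar_upd INSTEAD OF UPDATE OF name ON fubar
    BEGIN
        UPDATE fubar_t SET full_name = NEW.name WHERE id = NEW.id;
    END;
""")

# Untouched Blue code keeps using the old table/column names...
conn.execute("INSERT INTO fubar (name) VALUES ('Samwise')")
conn.execute("UPDATE fubar SET name = 'Sam Gamgee' WHERE id = 1")

# ...while Green reads the new column, and the two always agree.
print(conn.execute("SELECT name FROM fubar").fetchone())         # ('Sam Gamgee',)
print(conn.execute("SELECT full_name FROM fubar_t").fetchone())  # ('Sam Gamgee',)
```

Note this only covers the simple rename/add case where old and new values are interchangeable; if the new column is a transformation of the old one, the trigger bodies would have to encode the conversion in both directions, which is exactly where the arm-waving starts again.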

