Re: Rethinking migrations

Shai Berger Sat, 05 Nov 2016 10:41:05 -0700

Hi,

On Saturday 05 November 2016 17:53:49 Patryk Zawadzki wrote:
> 
> I'm typing this from the comfort of Django: Under the Hood sprints so
> please excuse poor grammar and the somewhat chaotic explanations that
> follow. I'm very tired and English is not my mother tongue. This is not a
> DEP but merely a stream of consciousness I'd love to get some feedback on.
>


I am dealing with some similar issues, but I've reached very different 
conclusions. In much the same spirit, this is not very orderly.


> Here are some of the problems we face when dealing with migrations:
> 
> 1. Dependency resolution that turns the migration dependency graph into an
> ordered list happens every time you try to create or execute a migration.
> If you have several hundred migrations it becomes quite slow. I'm talking
> multiple minutes kind of slow. As you can imagine working with multiple
> branches or perfecting your migrations quickly becomes a tedious task.
> 

I've known this to happen, indeed.

> 2. Dependency resolution is only stable as long as the migration set is
> frozen. Sometimes introducing a new migration is enough to break existing
> migrations by causing them to execute in a slightly different order. We
> often have to backtrack and edit existing migrations and enforce a strict
> resolution order by introducing arbitrary dependencies.
> 

So, you say you really have implicit dependencies between migrations -- 
dependencies in substance, which aren't recorded as dependencies. This seems 
to indicate that you have a lot of manually-written migrations (data 
migrations?), since the automatically-written ones do include relevant 
dependencies. This seems odd -- it sounds like you're doing something out of 
the ordinary.

This would also explain some of your bad experience with squashing -- indeed, 
if you have many data migrations, squashing can become much less effective.
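
To illustrate how an undeclared dependency can break when the graph changes,
here is a minimal sketch. It uses a simplified depth-first linearization, not
Django's actual implementation, and every migration name in it is
hypothetical. The point is only that a dependency which exists in substance
but is not recorded can be satisfied by luck, and stops being satisfied once
an unrelated migration is added.

```python
def linearize(deps):
    """Order migrations by a depth-first walk from the leaf nodes.

    deps maps each migration name to the set of names it depends on.
    Assumes the graph has no cycles.
    """
    depended_on = {parent for parents in deps.values() for parent in parents}
    leaves = sorted(name for name in deps if name not in depended_on)
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for parent in sorted(deps[name]):
            visit(parent)
        order.append(name)  # post-order: declared parents always come first

    for leaf in leaves:
        visit(leaf)
    return order


# Hypothetical project: shop.0002 reads a table created by billing.0001,
# but does not declare that dependency.
before = {
    "billing.0001_initial": set(),
    "shop.0001_initial": set(),
    "shop.0002_backfill": {"shop.0001_initial"},
}

# A new migration in another app that depends on shop.0002.
after = dict(before)
after["accounts.0001_initial"] = {"shop.0002_backfill"}

# The fix: record the dependency that exists in substance.
fixed = dict(after)
fixed["shop.0002_backfill"] = {"shop.0001_initial", "billing.0001_initial"}

for label, graph in (("before", before), ("after", after), ("fixed", fixed)):
    print(label, linearize(graph))
```

In "before", billing.0001 happens to run ahead of the backfill. In "after" it
runs behind it, even though no existing migration was edited. In "fixed" the
order holds regardless of what is added later.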

> 3. Removing an app from a project is a nightmare. You can't migrate to zero
> state unless the app is still there. There is no way to add "revert all
> migrations for app X" to the migration graph, it's something you need to
> run manually. There is no clean way to remove an app that was ever
> referenced in a relation. We were forced to do all kinds of hacks to get
> around this. Sometimes it's necessary to create an empty eggshell app with
> the same name and copy all migrations there then add necessary data
> migrations and finally migrations that remove all the models, indices,
> procedures etc. Sometimes people just leave a dead application in
> INSTALLED_APPS to not have to deal with this.

Clear out (maybe even remove) models.py and type "makemigrations", and you get 
a migration that deletes everything. The answer to getting rid of the 
historical migrations is squashing, but of course you first need squashing to 
work properly.

> 
> 4. Squashing migrations is wonky at best. If you create a model in one
> migration, alter one of its fields in another and then finally drop the
> model sometime later, the squashed migration will have Django try to
> execute the alter first and complain about the table not being there. Also
> the only reason we need to squash migrations is to prevent problem 1 above
> from becoming exponentially worse. If migrations were only as slow as the
> underlying SQL commands, we'd likely never squash them.
> 

If that's so, it's a bug you should report; it's also an issue you can work 
around by editing the migration to remove the redundant operation. There are 
issues with squashing, to be sure, but I don't think this is one of the 
serious ones.

> 5. There's no simple way to roll back all the migrations introduced after a
> particular point in time which is very useful when working with multiple
> feature branches. In my current project dropping the database means having
> to reimport over 200 MB of data snapshots. Switching branches requires me
> to look at branch diffs to determine which migrations to revert.
> 

Yes, this is a real issue, with one modification -- I'd much rather have a good 
way to migrate to a point-in-version-history than to a point-in-time.

This is more than a development issue -- I've encountered a use case for 
something like this in production. Suppose I want to export an object 
represented by a model (or a set of models) by serializing it and saving the 
serialized version, and then import it back after the app has progressed. 
For generic support, I'd need a way to migrate a database back to the point 
where the object was exported, import the object, and then roll the database 
forward to the "present".

> 6. Conflict detection and resolution (migrate --merge) is a make-believe
> solution. It just trains people to execute the command without
> investigating whether their migration history still makes sense.
> 

It could be smarter, assuming it understood the content of migrations. We 
could probably improve it to a point where, for most cases, it would either 
know to merge automatically or know that there really is a conflict. This would 
probably not help you if you have a lot of RunPython operations in your 
migrations.
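
As a rough sketch of what "understanding the content" could mean: summarize
each branch's migration by the (model, field) pairs it touches, and treat
anything containing arbitrary code as opaque. The MigrationSummary class and
classify_merge function below are hypothetical and do not exist in Django;
this only illustrates the three outcomes described above.

```python
from dataclasses import dataclass, field


@dataclass
class MigrationSummary:
    """Hypothetical summary of what one branch's migration changes."""
    name: str
    touched: set = field(default_factory=set)  # (model, field) pairs altered
    opaque: bool = False  # True if it runs arbitrary code (e.g. RunPython)


def classify_merge(left, right):
    """Decide how two conflicting branch heads could be handled."""
    if left.opaque or right.opaque:
        # The effects of arbitrary code cannot be inferred automatically.
        return "needs review"
    if left.touched & right.touched:
        # Both branches change the same field: a real conflict.
        return "conflict"
    # Disjoint changes can be applied in either order.
    return "auto-merge"


a = MigrationSummary("0005_add_price", {("Product", "price")})
b = MigrationSummary("0005_add_email", {("Customer", "email")})
c = MigrationSummary("0005_alter_price", {("Product", "price")})
d = MigrationSummary("0005_backfill", opaque=True)

print(classify_merge(a, b))
print(classify_merge(a, c))
print(classify_merge(a, d))
```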

> 
> Some of these I need to dig deeper into and probably file proper tickets.
> For example I have an idea on how to fix 4 but it would make 1 even slower.
> 
> I took some time to get a good long look at what other ORMs are doing. The
> graph-based dependency solving approach is rather uncommon. Most systems
> treat migrations as part of the project rather than the packages it uses.
> 
> 
> Possible solution (or "how I'd build it today if there was no existing code
> in Django core"):
> 
> a. Make migrations part of the project and not individual apps. This takes
> care of problem 3 above.
> 

So, there'd be no reason to link a migration to a specific app; quite the 
contrary, it would become much more logical to have one migration include 
operations for many apps. That could make the process of making an app 
reusable while developing it in a project quite painful.

> b. Prefix individual migration files with a UTC timestamp
> (20161105151023_add_foo) to provide a strict sorting order. This removes
> the depsolving requirement and takes care of 1 and 2. By eliminating those
> it makes 4 kind of obsolete as squashing migrations would become pointless.
> 
4: No, on large databases, squashing migrations is not pointless.

1&2: Strict order has its issues: Currently, if I find a problem with the last 
migration of app A, I roll it back, fix it, and roll forward. With strict 
order, I would have to roll back the project, not the app.
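
A small sketch of the difference, with a made-up applied history. The
per-app variant assumes that no other app depends on the migration being
fixed; all names and timestamps are hypothetical.

```python
# Hypothetical applied history, oldest first: (timestamp, app, migration).
APPLIED = [
    ("20161101090000", "app_a", "0001_initial"),
    ("20161102090000", "app_b", "0001_initial"),
    ("20161103090000", "app_a", "0002_add_field"),
    ("20161104090000", "app_b", "0002_add_index"),
    ("20161105090000", "app_c", "0001_initial"),
]


def position(applied, app, name):
    """Index of the given migration in the applied history."""
    for index, (_, entry_app, entry_name) in enumerate(applied):
        if (entry_app, entry_name) == (app, name):
            return index
    raise LookupError(f"{app}.{name} is not applied")


def unapply_per_app(applied, app, name):
    """Per-app rollback: only this app's migrations from `name` onward.

    Assumes no other app depends on the migrations being unapplied.
    """
    start = position(applied, app, name)
    return [f"{a}.{n}" for _, a, n in reversed(applied[start:]) if a == app]


def unapply_strict(applied, app, name):
    """Strict global order: everything from `name` onward, newest first."""
    start = position(applied, app, name)
    return [f"{a}.{n}" for _, a, n in reversed(applied[start:])]


print(unapply_per_app(APPLIED, "app_a", "0002_add_field"))
print(unapply_strict(APPLIED, "app_a", "0002_add_field"))
```

Fixing app_a's last migration touches one migration in the first case and
three, across three apps, in the second.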

> c. Have reusable apps provide migration templates that Django then copies
> to my project when "makemigrations" is run.
> 

I'd like to see some more details about how this works; they would need to 
include the development process of reusable apps.

> d. Maintain a separate directory for each database connection.
> 
Seems wrong as a blanket idea -- really depends on how the databases are used. 
I wouldn't want to find myself maintaining copies of migrations which are 
supposed to run on more than one database.

> e. Execute all migrations in alphabetical order (which means by timestamp
> first). When an unapplied migration is followed by an applied one, ask
> whether to attempt to just apply it or if the user wants to first unapply
> migrations that came after it. To me this would work better than 6.
> 
This sounds like a good way to cause data loss.

> f. Migrating to a timestamp solves 5.
> 

Not really. Not with a team, since the timestamps will indicate not the real 
logical order, but the order of development. You'd need empty "tag" migrations 
to set points you want to migrate to...
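
To make that concrete, here is a sketch with two hypothetical feature
branches whose migrations were written on the same day. Suppose feature_y was
merged first, so the main branch once contained only its migration. No
timestamp cutoff reproduces that state, because a migration from feature_x
sorts earlier. The file names follow the proposed timestamp prefix; everything
else is made up for illustration.

```python
# Hypothetical migrations: (file name, branch that introduced it).
MIGRATIONS = [
    ("20161103100000_x_add_table", "feature_x"),
    ("20161103120000_y_add_column", "feature_y"),
    ("20161103140000_x_backfill", "feature_x"),
]


def applied_at(migrations, timestamp):
    """Names that 'migrate to timestamp' would leave applied."""
    return sorted(name for name, _ in migrations if name[:14] <= timestamp)


def reachable_states(migrations):
    """Every state a timestamp cutoff can produce: prefixes of sorted order."""
    names = sorted(name for name, _ in migrations)
    return [names[:count] for count in range(len(names) + 1)]


# State of the main branch after feature_y was merged but before feature_x.
main_after_y = ["20161103120000_y_add_column"]

print(applied_at(MIGRATIONS, "20161103130000"))
print(main_after_y in reachable_states(MIGRATIONS))
```

The cutoff at 13:00 leaves half of feature_x applied, a state that never
existed on any branch, while the state that did exist on main is not
reachable by timestamp at all.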

My 2 cents,
        Shai.
