Hi Barry,

TL;DR: I think this is a bug and can lead to inconsistencies in project
setups other than yours.

Let's look at the last question first, regarding duplicate entries in the
django_migrations table: Yes, I consider this a bug, at least in how it's
currently used.
Let's say you have migration foo.0001_initial and apply it. You then have (foo, 
0001_initial) in the django_migrations table. You now create migration 
foo.0002_something and also add a squashed migration 
foo.0001_initial_squashed_0002_something. When you now run migrate, Django will 
apply foo.0002_something and your database will have (foo, 0001_initial), (foo, 
0002_something) as well as (foo, 0001_initial_squashed_0002_something).
So far so good. That's all as expected. If you now remove foo.0001_initial and
foo.0002_something from your filesystem and remove the replaces section in
foo.0001_initial_squashed_0002_something, it is as if Django never knew about
foo.0001_initial or foo.0002_something. You can add new migrations and
everything works the way it should. However, if you were to add, e.g., a
foo.0002_something again, Django would treat it as already applied because its
row is still in the django_migrations table, despite it being somewhere later
in your migration graph.
At this point, I don't think this is the intended behavior. I'm therefore
inclined to say that applying a squashed migration should "unrecord" all
migrations it replaces. I've not yet thought much about the "fallout"
(backwards compatibility, rollback of migrations, ...), but at least with
regard to migrating forwards, this seems to be the right behavior.


Regarding your second point around "replaces" and merging migrations: I think
this will lead to inconsistencies in your migration order, thus potentially
causing trouble down the line. I have yet to think of a concrete example. For
now I don't see us changing the behavior, but I would definitely *not* rely on
it.
I suspect that two data migrations could easily conflict or result in 
inconsistent data if applied in the wrong order. For example, one data 
migration adding new records to a table, and another one ensuring that all 
values in a column are in upper case. If you apply both migrations in that 
order (insert and then ensure uppercase) you can be certain that all values 
will be uppercase. If you, however, first ensure uppercase and then insert 
additional values, you need to make sure that the data in the second migration 
is properly formatted.
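
A rough sketch of that ordering problem, again in plain Python rather than
real migrations. The two functions and the sample values are made up for
illustration. In a real project they would be RunPython callables operating
on a model, and a list stands in for the column here.

```python
# Two hypothetical data migrations acting on a plain list that stands in
# for a table column. In a real project these would be RunPython callables.

def insert_records(column):
    """Data migration A: add new rows (values are made up for illustration)."""
    return column + ["delta", "echo"]


def ensure_uppercase(column):
    """Data migration B: normalize every existing value to upper case."""
    return [value.upper() for value in column]


existing = ["Alpha", "bravo"]

# Order 1: insert first, then normalize. Everything ends up upper case.
insert_then_upper = ensure_uppercase(insert_records(existing))

# Order 2: normalize first, then insert. The new rows slip through as-is.
upper_then_insert = insert_records(ensure_uppercase(existing))

print(insert_then_upper)  # ['ALPHA', 'BRAVO', 'DELTA', 'ECHO']
print(upper_then_insert)  # ['ALPHA', 'BRAVO', 'delta', 'echo']
```

The same two steps give different data depending on the order, which is why
I would not rely on an ordering that the migration graph doesn't guarantee.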

Cheers,

Markus


On Tue, Aug 6, 2019, at 7:55 AM, Johnson, Barry wrote:
> 
> [ TL;DR: A migration may use a “replaces” list pointing to migrations 
> that don’t actually exist. This undocumented technique cleanly solves a 
> recurring difficult migration problem. We seek consensus on whether 
> this should become a documented feature or that it is an unexpected 
> side effect subject to change in the future. ]
> 
> 
> 
> We have found an undocumented behavior in the migration system that 
> gracefully solves the troublesome problem of merging migrations created 
> in parallel development branches. If this behavior should survive, 
> we’ll enter a documentation ticket – but if it’s considered a bug, 
> we’ll need to stay away from it and fall back to the more difficult 
> manual editing approaches we’ve used in the past.
> 
> 
> The Use Case
> 
> ------------------
> 
> We’re rapidly developing a large multi-tenant application (hundreds of 
> ORM models, thousands of migrations and hundreds of thousands of lines 
> of code so far, with quite a bit of work remaining) punctuated by 
> periodic production releases. We create a source code branch from our 
> mainline development trunk for each production release, just in case we 
> must rapidly issue patches to those production releases. On rare 
> occasions, we’ve had to make a schema change (such as adding a new 
> field) as a patch to a production release, and make a parallel schema 
> change in the mainline development trunk. 
> 
> 
> Of course, this normally causes a migration failure when migrating a 
> production tenant from the patch release up to a later version of the 
> mainline release – since the mainline release has a subsequent 
> migration that adds the same field. We’ve solved this in the past by 
> manually rearranging the dependency order of the mainline trunk 
> migrations (moving the replacement step before other new migrations for 
> this later release), and fiddling with the contents of the 
> django_migrations table to make it look like that mainline step has 
> already been run before running the migrations. We’re unhappy with that 
> approach – it’s both time consuming and error prone.
> 
> 
> This problem is similar to, but not identical to, that of squashing 
> migrations.
> 
> 
> (And yes, we do periodically squash our migrations. We have about 600 
> migration steps at the moment, left over from more than 2,000 
> originally created. We’ve got another round of squashing coming up soon 
> that should take us to less than 100 migrations – but we have more than 
> a dozen developers adding more migrations every week.)
> 
> 
> The Discovery
> 
> -------------------
> 
> Through trial and error, we found that our mainline migration step may 
> declare itself as a replacement for the patch step (using the 
> “replaces” attribute) – even if the patch migration itself doesn’t 
> exist in the list of mainline migrations. 
> 
> 
> And if we do this, the migration engine simply works as hoped and our 
> problem vanishes. It’s absolutely wonderful; simple to implement and 
> effective. We love it. New tenants run only the replacement step; 
> tenants migrating from the patch release to the trunk release merely 
> record the replacement step as having been completed without actually 
> executing it; development tenants that never saw the original patch 
> step simply record both the patch step and the replacement as having 
> been completed. It’s great.
> 
> 
> The Worry
> 
> --------------
> 
> This approach seems undocumented in three different ways:
> 
> 
> * The replacement migration is pointing at an original migration that 
> doesn’t exist in the trunk’s migration files. (We created it in the 
> patch branch and we know the migration name from that branch, but we 
> never added the patch migration to the mainline trunk.) The current 
> documentation[1] describes keeping both the original and the 
> replacement in place until all databases have migrated past the 
> replacement step (and then deleting the original and removing the 
> “replaces” attribute from the replacement). The documentation implies, 
> but does not explicitly state, that the original step should exist in 
> the list. Our testing shows that the original need not exist (and we 
> like it this way!).
> 
> * If we go ahead and add a copy of the patch release’s migration step 
> to the mainline trunk, we introduce a “multiple leaf nodes” graph, 
> since none of the mainline migrations depend upon this “side patch”. 
> However, apparently because there is a declared replacement for this 
> patch step, the migration engine doesn’t raise the “multiple leaf 
> nodes” exception. This seems to be an oversight unless the replacement 
> step is somehow acting as a merge (as if it had a dependency on the 
> patch step) … but we like the way it’s working now, if it were to 
> become necessary to include the original step in the mainline migration 
> list.
> 
> * We have found that we can have multiple replacement steps all 
> claiming to replace the same original step number. (This conveniently 
> handled a case where multiple migrations were originally created in the 
> trunk, then backported as a single migration into a patch to an earlier 
> production release.) But this results in the patch migration’s app and
> name being inserted into the django_migrations table more than once. These
> duplicate entries haven’t appeared to cause a problem, but they were 
> unexpected. It seems that the app and migration name ought to be 
> “unique together” but aren’t – perhaps for performance reasons, since 
> the contents of this table are normally managed solely by the 
> migrations system.
> 
> 
> The Question
> 
> -------------------
> 
> Would the core team consider the ability to “replace” a non-existent 
> migration step to be a feature or a bug? We prefer to think of this as 
> a desirable feature, since it solves what seems to be a non-uncommon 
> use case. We haven’t seen any other documented approaches to solving 
> the problem of migrations created in parallel branches – most published 
> advice boils down to either “don’t do it”, “roll back your migrations 
> then apply the new ones”, or “good luck on manually repairing things.”
> 
> If this IS considered a bug, we certainly could add the original 
> migration from the patch release, but then we’ve added a migration “to 
> the side” of the original dependency tree introducing another leaf 
> node. We’d hate for *that* to be considered a problem in the future, 
> because the replacement step doesn’t look like it should act as a merge 
> node (it doesn’t depend upon the original, just replaces it).
> 
> 
> The third point, the insertion of duplicate records into 
> django_migrations, does smell like a defect.
> 
> If people like this “feature” and believe it should be supported, we’d 
> be happy to create a documentation PR.
> 
> 
> Barry Johnson
> 
> Epicor Software Corporation
> 
> 
> [1]: 
> https://docs.djangoproject.com/en/2.2/topics/migrations/#migration-squashing
> 
> 
> -- 
> You received this message because you are subscribed to the Google 
> Groups "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send 
> an email to django-developers+unsubscr...@googlegroups.com.
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/django-developers/29136C68-DA75-431E-8C77-169097346AD1%40epicor.com
>  
> <https://groups.google.com/d/msgid/django-developers/29136C68-DA75-431E-8C77-169097346AD1%40epicor.com?utm_medium=email&utm_source=footer>.

