#26064: Move migration operation reduction logic to their respective class.
------------------------------------------------+------------------------
               Reporter:  charettes             |          Owner:  nobody
                   Type:  Cleanup/optimization  |         Status:  new
              Component:  Migrations            |        Version:  master
               Severity:  Normal                |       Keywords:
           Triage Stage:  Unreviewed            |      Has patch:  1
    Needs documentation:  0                     |    Needs tests:  0
Patch needs improvement:  0                     |  Easy pickings:  0
                  UI/UX:  0                     |
------------------------------------------------+------------------------
 In order to avoid redundant operations during `makemigrations` and to
 support migration squashing
 
[https://github.com/django/django/blob/8715205c5c0e49b21b5bbea35d904713ee188a71/django/db/migrations/optimizer.py
 a reduction algorithm] combines and elides a linear collection of
 operations.

 In the case of `makemigrations` the algorithm is run on each generated
 `Migration.operations` individually and in the case of `squashmigration
 <app_label>` it is the ordered combination of all `Migration.operations`
 of the specified application.

 The actual reduction algorithm
 
[https://github.com/django/django/blob/8715205c5c0e49b21b5bbea35d904713ee188a71/django/db/migrations/optimizer.py#L34-L77
 hard codes all possible optimizations] which makes it harder to unit test
 and extend (see #24109).

 From my understanding these reduction methods were originally hardcoded in
 the `Optimizer` class for performance reason. However, based on
 benchmarking using a [https://gist.github.com/MarkusH/a95d9407a96fb8e63783
 migration generation script provided by Markus], it looks like this might
 be a case of premature optimization as I couldn't get a noticeable
 slowdown on both `makemigrations` (with over 1000 operations generated in
 a single migration) and `squashmigrations` (with over 100 migrations with
 around 10 operations each).

 Therefore I suggest to move operation reduction to their respective class.

 Note that the `Operation.reduce` method could even be documented in the
 future to expose a public API for third party application shipping with
 custom `Operation` providers.

 If it was made public this method could also be used to solve
 [https://groups.google.com/d/msg/django-
 developers/Ln1-IqysEwE/iL20WPT3EgAJ the data migration squashing problem
 described by Shai on the mailing list] by allowing specific reduction to
 be performed.

 For example, given the following operation:

 {{{#!python
 class CreateInitialCountries(migrations.RunPython):
     def __init__(model_name, name_field):
         self.model_name = model_name
         self.name_field = name_field
         super().__init__(self.forwards)

     def forwards(self, apps, schema_editor):
         country_model = apps.get(self.model_name)
         for name in self.get_initial_countries_from_csv():
             country_model.objects.create(**{self.name_field: name})

     def reduce(self, operation, in_between):
         if (isinstance(operation, RenameModel) and
                 self.model_name == operation.old_model_name):
             return [
                 operation, CreateInitialCountries(
                     operation.new_model_name, self.name_field
                 )
             ]
         elif (isinstance(operation, RenameField) and
               self.model_name == other.model_name and
               self.field_name == other.old_name):
             return [
                 operation, CreateInitialCountries(
                     self.model_name, other.new_name
                 )
             ]
         elif (isinstance(operation, DeleteModel) and
               self.model_name == operation.model_name):
             return [operation]
 }}}

 These list of operations would me reduced/optimized as follow:

 {{{#!python
 operations = [
     CreateModel(
         'Country',
         fields=[
             ('id', models.AutoField(primary_key=True)),
             ('name', models.CharField(max_length=50)),
         ]
     ),
     CreateInitialCountries('Country', 'name'),
     RenameField('Country', 'name', 'renamed_name'),
     RenameModel('Country', 'RenamedCountry'),
 ]
 assert optimize(operations) == [
     CreateModel(
         'RenamedCountry',
         fields=[
             ('id', models.AutoField(primary_key=True)),
             ('renamed', models.CharField(max_length=50)),
         ]
     ),
     CreateInitialCountries('RenamedCountry', 'renamed'),
 ]

 operations = [
     CreateModel(
         'Country',
         fields=[
             ('id', models.AutoField(primary_key=True)),
             ('name', models.CharField(max_length=50)),
         ]
     ),
     CreateInitialCountries('Country', 'name'),
     DeleteModel('Country'),
 ]
 assert optimize(operations) == []
 }}}

--
Ticket URL: <https://code.djangoproject.com/ticket/26064>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/052.12583cf299bd2044fcce9449e11e9fcf%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to