Thanks Raphael, that's a pretty good write up!

You're essentially speaking about 2 things here, in my opinion:

1. Adding a new feature for interactive squash
2. Improving the MigrationOptimizer

I certainly see a point for 2. Not sure how much for 1. Anyway, your 
reasoning for 2 sounds great! I'd be more than happy if you want to get 
this into Django! Can you create a respective ticket and drop your 
explanation in there, please :)

Cheers,

/Markus

On Wednesday, February 15, 2017 at 1:22:11 PM UTC+1, rap...@makeleaps.com 
wrote:
>
> Hello everyone,
>
>    I spent a couple hours last night trying to improve the migration 
> squasher optimizer (migrations were taking almost 15 minutes in CI). I came 
> up with a couple ideas for anyone interested in improvements:
>  
>  1- Having an interactive mode for squashing would be interested. 
> Currently, when squashing migrations, I do the following: 
>
>    - Generate an initial squash
>    - Edit it (namely, move around operations to get more optimizations to 
>    work)
>    - remove the "replaces" tag, then rerun migration squashing to 
>    "re-optimize"
>    - repeat until I get something I like, then add the original 
>    "replaces" tag
>
>    It would be cool if instead, the process were (with a flag):
>
>    - Generate an initial squash, but have the process wait for 
>    confirmation to "commit" this squash as final (though writing out the file)
>    - Edit the file, and tell the process to try re-optimizing with the 
>    same file (getting around the "no-squash of squashes" rule)
>    - Potentially, allow us to also step back
>
>  For example, the "squashmigrations" command output could look like:
>
> generated 0001_squashed_mig.py
>> optmize migration[yN]? 
>> <user inputs y>
>> regenerated 0001_squashed_mig.py
>> ( 20 operations -> 10 operations)
>> optimize migration[Ynr]? 
>> <user inputs y>
>> regenerated 0001_squashed_mig.py
>> ( No change in operation count)
>> optimize migration[Ynr]? 
>> <user inputs r>
>> rolled back to previous version
>> optimize migration[Ynr]? 
>> <user inputs n>
>> Saved migration
>>
>
>
> A simpler version of this command would simply be to add an 
> "optimizemigration" command that just reads in a single migration and 
> optimizes the operations, without touching any of the squashiness. 
>
>
>  2- The reducer might be a bit too pessimistic
>
>  Currently, the optimizer lets "reduce" operations (that take 2 operations 
> and return 0,1, or 2 operations, or None if nothing can be change) do 
> whatever they want. Because of that, if you have [A, B, C ,D] and B depends 
> on A, you can't reduce A and C because the reduction might remove A.
>
>  In reality there are two kinds of reduction operations that we could be 
> taking into account:
>
>    - reducing "left". if you have [A, B, C, D] (for example, A is a 
> CreateModel , C is an AddField for the same model),  and you can reduce A 
> and C into just A' (A with C), giving [A', B, D], then it doesn't matter 
> that B depends on A. 
>
> The thing that matters is if C depends on B (for example, C adds a foreign 
> key to a model created in B). This is actually already encoded in the 
> CreateMode + AddField reduction, but is perhaps a more general case.
>
> In a sense, reducing A and C "to the left" means that we're bringing A and 
> C closer together only by moving C. This is a major part of the potential 
> reductions that the current optimizer is missing.
>
>    - reducing right. If you have [A, B, C, D] (for example, A is a 
> CreateModel, C is a RemoveModel for the same model), and you can reduce A 
> and C into just C' (C with A), giving [B, C', D], then it does matter that 
> B depends on A. C can't depend on B (assuming causality holds in our 
> universe)
>
> This is the current mechanism, essentially. If B depends on A, then you 
> can't move A past B. 
>
>  Removing both operations is a special case of reducing right (You can 
> make C' into a no-op).
>
> I had monkeypatched a special case of  reducing left (taking CreateModel, 
> AddField of different models and swapping them . For example 
> [CreateModel(A), CreateModel(B), AddField(A.foo)] -> [CreateModel(A), 
> AddField(A.foo), CreateModel(B)]) and got decent results, but I think 
> making the optimization code express these two concepts separately would 
> catch even more of the optimizations I saw that the optimizer didn't.
>
>
> I hope some of this is useful 
>  
>   Raphael
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/bcc73cef-e6f8-4aaf-b5a9-4592895b3b2e%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to