#29568: Avoid trying to UPDATE a parent model when its child just got INSERT -------------------------------------+------------------------------------- Reporter: François Dupayrat | Owner: nobody Type: | Status: new Cleanup/optimization | Component: Database layer | Version: master (models, ORM) | Severity: Normal | Resolution: Keywords: | Triage Stage: Accepted Has patch: 0 | Needs documentation: 0 Needs tests: 0 | Patch needs improvement: 0 Easy pickings: 0 | UI/UX: 0 -------------------------------------+-------------------------------------
Old description: > Hello, > > While sorting through queries, I noticed something unusual: let's say you > have non-abstract models inheritance: > {{{ > class Parent(models.Model): > parent_field = models.BooleanField() > > class DirectChild(Parent): > direct_child_field = models.BooleanField() > > class IndirectChild(DirectChild): > indirect_child_field = models.BooleanField() > }}} > > When trying to create IndirectChild, I would expect 3 queries to be done: > * INSERT Parent > * INSERT DirectChild > * INSERT IndirectChild > > Instead, since they all share the same PK, 5 queries are done: > * INSERT Parent > * UPDATE DirectChild (since it has a PK) > * INSERT DirectChild (because previous UPDATE failed) > * UPDATE IndirectChild (since it has a PK) > * INSERT IndirectChild (because previous UPDATE failed) > > I found a fix that worked for me by modifying _save_parents and save_base > in django/db/models/base.py: _save_parents return if something was > inserted (default to fAlse if there is no parent), and force_insert if > the parent was inserted in both methods (because if there parent was > inserted, the child can't possibly exist). > > Being a beginner, I'm really wary of breaking something. Could someone if > it's something wanted by Django and check if there is obvious mistake? > Here is the modified method (without irrelevant code, materialized as > [...]): > {{{ > def _save_parents([...]): > [...] > inserted = False > for parent, field in meta.parents.items(): > [...] > parent_inserted = self._save_parents(cls=parent, using=using, > update_fields=update_fields) > updated = self._save_table(cls=parent, using=using, > update_fields=update_fields, force_insert=parent_inserted) > if not updated: > inserted = True > [...] > return inserted > > def save_base([...]): > [...] > with transaction.atomic(using=using, savepoint=False): > parent_inserted = False > if not raw: > parent_inserted = self._save_parents(cls, using, > update_fields) > updated = self._save_table(raw, cls, force_insert or > parent_inserted, force_update, using, update_fields) > [...] > }}} New description: Hello, While sorting through queries, I noticed something unusual: let's say you have non-abstract models inheritance: {{{ class Parent(models.Model): parent_field = models.BooleanField() class DirectChild(Parent): direct_child_field = models.BooleanField() class IndirectChild(DirectChild): indirect_child_field = models.BooleanField() }}} When trying to create IndirectChild, I would expect 3 queries to be done: * INSERT Parent * INSERT DirectChild * INSERT IndirectChild Instead, since they all share the same PK, when trying to save a IndirectChild not yet persisted to DB, 5 queries are done: * INSERT Parent * UPDATE DirectChild (since it has a PK) * INSERT DirectChild (because previous UPDATE failed) * UPDATE IndirectChild (since it has a PK) * INSERT IndirectChild (because previous UPDATE failed) Note: when trying to use IndirectChild.objects.create, only 4 queries are made, since QuerySet.create use force_insert=True (thanks to Tim Graham) I found a fix that worked for me by modifying _save_parents and save_base in django/db/models/base.py: _save_parents return if something was inserted (default to fAlse if there is no parent), and force_insert if the parent was inserted in both methods (because if there parent was inserted, the child can't possibly exist). Being a beginner, I'm really wary of breaking something. Could someone if it's something wanted by Django and check if there is obvious mistake? Here is the modified method (without irrelevant code, materialized as [...]): {{{ def _save_parents([...]): [...] inserted = False for parent, field in meta.parents.items(): [...] parent_inserted = self._save_parents(cls=parent, using=using, update_fields=update_fields) updated = self._save_table(cls=parent, using=using, update_fields=update_fields, force_insert=parent_inserted) if not updated: inserted = True [...] return inserted def save_base([...]): [...] with transaction.atomic(using=using, savepoint=False): parent_inserted = False if not raw: parent_inserted = self._save_parents(cls, using, update_fields) updated = self._save_table(raw, cls, force_insert or parent_inserted, force_update, using, update_fields) [...] }}} -- Comment (by François Dupayrat): Replying to [comment:3 Claude Paroz]: > I guess a first experiment would be to make the change and run the test suite to see the outcome. Thanks, I'll get that done. Replying to [comment:4 Tim Graham]: > With`IndirectChild.objects.create(parent_field=True, direct_child_field=True, indirect_child_field=True)`, I see four queries at 4d48ddd8f93800a80330ec1dee7b7d4afe6ea95a: > {{{ > 1. INSERT INTO "t29568_parent" ("parent_field") VALUES (true) RETURNING "t29568_parent"."id" > 2. UPDATE "t29568_directchild" SET "direct_child_field" = true WHERE "t29568_directchild"."parent_ptr_id" = 1 > 3. INSERT INTO "t29568_directchild" ("parent_ptr_id", "direct_child_field") VALUES (1, true) > 4. INSERT INTO "t29568_indirectchild" ("directchild_ptr_id", "indirect_child_field") VALUES (1, true) > }}} > Accepting for investigation. Indeed, I didn't test with IndirectChild.objects.create, but with save() on a IndirectChild not yet saved to DB, since it's done that way by a library (django-tastypie), and outside my control. I edited the description to reflect that. -- Ticket URL: <https://code.djangoproject.com/ticket/29568#comment:5> Django <https://code.djangoproject.com/> The Web framework for perfectionists with deadlines. -- You received this message because you are subscribed to the Google Groups "Django updates" group. To unsubscribe from this group and stop receiving emails from it, send an email to django-updates+unsubscr...@googlegroups.com. To post to this group, send email to django-updates@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/django-updates/074.2c48aa743133dcc4fc03f9ce8988fb2e%40djangoproject.com. For more options, visit https://groups.google.com/d/optout.