#29568: Avoid trying to UPDATE a parent model when its child just got INSERT
-------------------------------------+-------------------------------------
     Reporter:  François Dupayrat    |                    Owner:  nobody
         Type:                       |                   Status:  new
  Cleanup/optimization               |
    Component:  Database layer       |                  Version:  master
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:                       |             Triage Stage:  Accepted
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------

Old description:

> Hello,
>
> While sorting through queries, I noticed something unusual: let's say you
> have non-abstract models inheritance:
> {{{
> class Parent(models.Model):
>   parent_field = models.BooleanField()
>
> class DirectChild(Parent):
>   direct_child_field = models.BooleanField()
>
> class IndirectChild(DirectChild):
>   indirect_child_field = models.BooleanField()
> }}}
>
> When trying to create IndirectChild, I would expect 3 queries to be done:
> * INSERT Parent
> * INSERT DirectChild
> * INSERT IndirectChild
>
> Instead, since they all share the same PK, 5 queries are done:
> * INSERT Parent
> * UPDATE DirectChild (since it has a PK)
> * INSERT DirectChild (because previous UPDATE failed)
> * UPDATE IndirectChild (since it has a PK)
> * INSERT IndirectChild (because previous UPDATE failed)
>
> I found a fix that worked for me by modifying _save_parents and save_base
> in django/db/models/base.py: _save_parents return if something was
> inserted (default to fAlse if there is no parent), and force_insert if
> the parent was inserted in both methods (because if there parent was
> inserted, the child can't possibly exist).
>
> Being a beginner, I'm really wary of breaking something. Could someone if
> it's something wanted by Django and check if there is obvious mistake?
> Here is the modified method (without irrelevant code, materialized as
> [...]):
> {{{
>     def _save_parents([...]):
>         [...]
>         inserted = False
>         for parent, field in meta.parents.items():
>             [...]
>             parent_inserted = self._save_parents(cls=parent, using=using,
> update_fields=update_fields)
>             updated = self._save_table(cls=parent, using=using,
> update_fields=update_fields, force_insert=parent_inserted)
>             if not updated:
>                 inserted = True
>             [...]
>         return inserted
>
>     def save_base([...]):
>         [...]
>         with transaction.atomic(using=using, savepoint=False):
>             parent_inserted = False
>             if not raw:
>                 parent_inserted = self._save_parents(cls, using,
> update_fields)
>             updated = self._save_table(raw, cls, force_insert or
> parent_inserted, force_update, using, update_fields)
>         [...]
> }}}

New description:

 Hello,

 While sorting through queries, I noticed something unusual: let's say you
 have non-abstract models inheritance:
 {{{
 class Parent(models.Model):
   parent_field = models.BooleanField()

 class DirectChild(Parent):
   direct_child_field = models.BooleanField()

 class IndirectChild(DirectChild):
   indirect_child_field = models.BooleanField()
 }}}

 When trying to create IndirectChild, I would expect 3 queries to be done:
 * INSERT Parent
 * INSERT DirectChild
 * INSERT IndirectChild

 Instead, since they all share the same PK, when trying to save a
 IndirectChild not yet persisted to DB, 5 queries are done:
 * INSERT Parent
 * UPDATE DirectChild (since it has a PK)
 * INSERT DirectChild (because previous UPDATE failed)
 * UPDATE IndirectChild (since it has a PK)
 * INSERT IndirectChild (because previous UPDATE failed)

 Note: when trying to use IndirectChild.objects.create, only 4 queries are
 made, since QuerySet.create use force_insert=True (thanks to Tim Graham)

 I found a fix that worked for me by modifying _save_parents and save_base
 in django/db/models/base.py: _save_parents return if something was
 inserted (default to fAlse if there is no parent), and force_insert if the
 parent was inserted in both methods (because if there parent was inserted,
 the child can't possibly exist).

 Being a beginner, I'm really wary of breaking something. Could someone if
 it's something wanted by Django and check if there is obvious mistake?
 Here is the modified method (without irrelevant code, materialized as
 [...]):
 {{{
     def _save_parents([...]):
         [...]
         inserted = False
         for parent, field in meta.parents.items():
             [...]
             parent_inserted = self._save_parents(cls=parent, using=using,
 update_fields=update_fields)
             updated = self._save_table(cls=parent, using=using,
 update_fields=update_fields, force_insert=parent_inserted)
             if not updated:
                 inserted = True
             [...]
         return inserted

     def save_base([...]):
         [...]
         with transaction.atomic(using=using, savepoint=False):
             parent_inserted = False
             if not raw:
                 parent_inserted = self._save_parents(cls, using,
 update_fields)
             updated = self._save_table(raw, cls, force_insert or
 parent_inserted, force_update, using, update_fields)
         [...]
 }}}

--

Comment (by François Dupayrat):

 Replying to [comment:3 Claude Paroz]:
 > I guess a first experiment would be to make the change and run the test
 suite to see the outcome.
 Thanks, I'll get that done.

 Replying to [comment:4 Tim Graham]:
 > With`IndirectChild.objects.create(parent_field=True,
 direct_child_field=True, indirect_child_field=True)`, I see four queries
 at 4d48ddd8f93800a80330ec1dee7b7d4afe6ea95a:
 > {{{
 > 1. INSERT INTO "t29568_parent" ("parent_field") VALUES (true) RETURNING
 "t29568_parent"."id"
 > 2. UPDATE "t29568_directchild" SET "direct_child_field" = true WHERE
 "t29568_directchild"."parent_ptr_id" = 1
 > 3. INSERT INTO "t29568_directchild" ("parent_ptr_id",
 "direct_child_field") VALUES (1, true)
 > 4. INSERT INTO "t29568_indirectchild" ("directchild_ptr_id",
 "indirect_child_field") VALUES (1, true)
 > }}}
 > Accepting for investigation.
 Indeed, I didn't test with IndirectChild.objects.create, but with save()
 on a IndirectChild not yet saved to DB, since it's done that way by a
 library (django-tastypie), and outside my control.

 I edited the description to reflect that.

-- 
Ticket URL: <https://code.djangoproject.com/ticket/29568#comment:5>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/074.2c48aa743133dcc4fc03f9ce8988fb2e%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to