#34211: Performance Regression After Upgrade to Django 3.2
-------------------------------------+-------------------------------------
Reporter: polarmt | Owner: nobody
Type: | Status: new
Cleanup/optimization |
Component: Database layer | Version: 3.2
(models, ORM) |
Severity: Normal | Resolution:
Keywords: | Triage Stage:
| Unreviewed
Has patch: 0 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Description changed by polarmt:
Old description:
> == Overview
> We recently upgraded from Django 2.2 to Django 3.2. After the upgrade,
> several of our APIs slowed down significantly. For instance, our of our
> APIs increased by 50% from ~0.28 seconds to ~0.41 seconds.
>
> == Diagnosis
> To diagnose the issue, I ran Flamegraphs on the API with Django 2.2 and
> Django 3.2. The results were nearly identical except for one part:
>
> {{{
> #!div style="font-size: 80%"
> Code highlighting:
> {{{#!python
> class ForeignKeyDeferredAttribute(DeferredAttribute):
> def __set__(self, instance, value):
> if instance.__dict__.get(self.field.attname) != value and
> self.field.is_cached(instance):
> self.field.delete_cached_value(instance)
> instance.__dict__[self.field.attname] = value
> }}}
> }}}
>
> [https://github.com/django/django/blob/stable/3.2.x/django/db/models/fields/related_descriptors.py#L75
> django/db/models/fields/related_descriptors.py in Django 3.2]
>
> This code might seem harmless, but it did not scale well when considering
> how it is called. From my understanding, this scales in relation to the #
> of rows returned by the query times the # of foreign key fields in the
> table.
>
> The function below creates an object for every row:
>
> {{{
> #!div style="font-size: 80%"
> Code highlighting:
> {{{#!python
> class ModelIterable(BaseIterable):
> """Iterable that yields a model instance for each row."""
> def __iter__(self):
> queryset = self.queryset
> db = queryset.db
> compiler = queryset.query.get_compiler(using=db)
> # Execute the query. This will also fill compiler.select,
> klass_info,
> # and annotations.
> ...
> for row in compiler.results_iter(results):
> obj = model_cls.from_db(db, init_list,
> row[model_fields_start:model_fields_end])
> for rel_populator in related_populators:
> rel_populator.populate(row, obj)
> ...
> }}}
> }}}
>
> Every object created will go through the following logic:
>
> {{{
> #!div style="font-size: 80%"
> Code highlighting:
> {{{#!python
> class Model(metaclass=ModelBase):
>
> def __init__(self, *args, **kwargs):
> # Alias some things as locals to avoid repeat global lookups
> cls = self.__class__
> opts = self._meta
> _setattr = setattr
> _DEFERRED = DEFERRED
> ...
>
> if not kwargs:
> fields_iter = iter(opts.concrete_fields)
> # The ordering of the zip calls matter - zip throws
> StopIteration
> # when an iter throws it. So if the first iter throws it, the
> second
> # is *not* consumed. We rely on this, so don't change the
> order
> # without changing the logic.
> for val, field in zip(args, fields_iter):
> if val is _DEFERRED:
> continue
> _setattr(self, field.attname, val)
> ...
> }}}
> }}}
>
> If the field is a foreign key, then it will call the overrided `__set__`.
> For one API call, we had a `filter` query on a table with three foreign
> keys that returns 11k rows. This small change did not scale well with a
> query.
>
> == Verification
> We ran the same API call by disabling this overriding:
>
> {{{
> #!div style="font-size: 80%"
> Code highlighting:
> {{{#!python
> class ForeignKeyDeferredAttribute(DeferredAttribute):
> def set_fake(self, instance, value):
> if instance.__dict__.get(self.field.attname) != value and
> self.field.is_cached(instance):
> self.field.delete_cached_value(instance)
> instance.__dict__[self.field.attname] = value
> }}}
> }}}
>
> The latency of that API improved from ~0.41 seconds to ~0.31 seconds.
>
> == Thoughts
> This change does not seem scalable. Even with a moderately sized query
> and three foreign-key relationships, the query increased the API latency
> by 50%. We understand the motivation behind the change:
> [https://code.djangoproject.com/ticket/28147#no1 Ticket #28147]. We were
> wondering if there is a more performant fix to that problem.
New description:
== Overview
We recently upgraded from Django 2.2 to Django 3.2. After the upgrade,
several of our APIs slowed down significantly. For instance, our of our
APIs increased by 50% from ~0.28 seconds to ~0.41 seconds.
== Diagnosis
To diagnose the issue, I created Flamegraphs on the API with Django 2.2
and Django 3.2. The results were nearly identical except for one part:
{{{
#!div style="font-size: 80%"
Code highlighting:
{{{#!python
class ForeignKeyDeferredAttribute(DeferredAttribute):
def __set__(self, instance, value):
if instance.__dict__.get(self.field.attname) != value and
self.field.is_cached(instance):
self.field.delete_cached_value(instance)
instance.__dict__[self.field.attname] = value
}}}
}}}
[https://github.com/django/django/blob/stable/3.2.x/django/db/models/fields/related_descriptors.py#L75
django/db/models/fields/related_descriptors.py in Django 3.2]
This code might seem harmless, but it did not scale well when considering
how it is called. From my understanding, this scales in relation to the #
of rows returned by the query times the # of foreign key fields in the
table.
The function below creates an object for every row:
{{{
#!div style="font-size: 80%"
Code highlighting:
{{{#!python
class ModelIterable(BaseIterable):
"""Iterable that yields a model instance for each row."""
def __iter__(self):
queryset = self.queryset
db = queryset.db
compiler = queryset.query.get_compiler(using=db)
# Execute the query. This will also fill compiler.select,
klass_info,
# and annotations.
...
for row in compiler.results_iter(results):
obj = model_cls.from_db(db, init_list,
row[model_fields_start:model_fields_end])
for rel_populator in related_populators:
rel_populator.populate(row, obj)
...
}}}
}}}
Every object created will go through the following logic:
{{{
#!div style="font-size: 80%"
Code highlighting:
{{{#!python
class Model(metaclass=ModelBase):
def __init__(self, *args, **kwargs):
# Alias some things as locals to avoid repeat global lookups
cls = self.__class__
opts = self._meta
_setattr = setattr
_DEFERRED = DEFERRED
...
if not kwargs:
fields_iter = iter(opts.concrete_fields)
# The ordering of the zip calls matter - zip throws
StopIteration
# when an iter throws it. So if the first iter throws it, the
second
# is *not* consumed. We rely on this, so don't change the
order
# without changing the logic.
for val, field in zip(args, fields_iter):
if val is _DEFERRED:
continue
_setattr(self, field.attname, val)
...
}}}
}}}
If the field is a foreign key, then it will call the overrided `__set__`.
For one API call, we had a `filter` query on a table with three foreign
keys that returns 11k rows. This small change did not scale well with a
query.
== Verification
We ran the same API call by disabling this overriding:
{{{
#!div style="font-size: 80%"
Code highlighting:
{{{#!python
class ForeignKeyDeferredAttribute(DeferredAttribute):
def set_fake(self, instance, value):
if instance.__dict__.get(self.field.attname) != value and
self.field.is_cached(instance):
self.field.delete_cached_value(instance)
instance.__dict__[self.field.attname] = value
}}}
}}}
The latency of that API improved from ~0.41 seconds to ~0.31 seconds.
== Thoughts
This change does not seem scalable. Even with a moderately sized query and
three foreign-key relationships, the query increased the API latency by
50%. We understand the motivation behind the change:
[https://code.djangoproject.com/ticket/28147#no1 Ticket #28147]. We were
wondering if there is a more performant fix to that problem.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/34211#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/010701850cc21f38-759840aa-8364-47d8-b2c2-8ca2e988e5f9-000000%40eu-central-1.amazonses.com.