Hi,

print(qs.query) and share.

On Wednesday, January 13, 2021 at 11:25:06 PM UTC+5:30 [email protected] 
wrote:

> Hi all,
>
> I wanted to cross post my question / problem in regards to Django's ORM 
> `annotate` performance. Not sure if I should post it here on or Django 
> developers mailing list, but I wanted to start here.
>
>
> https://stackoverflow.com/questions/65506731/django-orm-annotate-performance
>
> ---
>
> I'm using Django and Django REST Framework at work and we've been having 
> some performance issues with couple endpoints lately. We started by making 
> sure that the SQL part is optimized, no unnecessary N+1 queries, indexes 
> where possible, etc.
>
> Looking at the database part itself, it seems to be very fast (3 SQL 
> queries total, under a second), even with larger datasets, but the API 
> endpoint still took >5 seconds to return. I started profiling the Python 
> code using couple different tools and the majority of time is always spent 
> inside the `annotate` and `set_group_by` functions in Django.
>
> [image: le0oG.png]
>
> I tried Googling about `annotate` and performance, looking at Django docs, 
> but there's no mention of it being a 'costly' operation, especially when 
> used with the `F` function.
>
> The `annotate` part of the code looks something like this:
>
>     qs = qs.annotate(
>         foo_name=models.F("foo__core__name"),
>         foo_birth_date=models.F("foo__core__birth_date"),
>         bar_name=models.F("bar__core__name"),
>         spam_id=models.F("baz__spam_id"),
>         spam_name=models.F("baz__spam__core__name"),
>         spam_start_date=models.F("baz__spam__core__start_date"),
>         eggs_id=models.F("baz__spam__core___eggs_id"),
>         eggs_name=models.F("baz__spam__eggs__core___name"),
>     )
>
>     qs = (
>         qs.order_by("foo_id", "eggs_id", "-spam_start_date", "bar_name")
>         .values(
>             "foo_name",
>             "foo_birth_date",
>             "bar_name",
>             "spam_id",
>             "spam_name",
>             "eggs_id",
>             "eggs_name",
>         )
>         .distinct()
>     )
>
> The query is quite big, spans multiple relationships, so I was sure that 
> the problem is database related, but it doesn't seem to be. All the 
> `select_related` and `prefetch_related` are there, indexes too.
>
> I tried rewriting the code without `annotate` at all, but it didn't seem 
> to help. I started wondering wether the time spent in `annotate` is really 
> a red herring and it's only how the profiler sees it, but all profilers I 
> tried showed the same thing.
>
> While I feel like I know Django quite well and had success optimising API 
> endpoints before, I'm not sure what 'thread' to pull in this case. I tried 
> looking at Django internals, especially around `annotate` and 
> `set_group_by` but couldn't pin point the time spent there. My last ditch 
> effort will be trying to rewrite those couple endpoints with raw SQL, but 
> I'd very much like to avoid that.
>
> All help will be much appreciated : )
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-users/34d7939c-f7de-4793-bff3-cf46bb5fe68en%40googlegroups.com.

Reply via email to