>
> Adam/Gordon, I'm interested in hearing how these changes led you to
> discovering stale prefetches?


We removed all the manual prefetches in admin get_queryset() methods, added
the auto_prefetch_related call, and regenerated the performance records
from django-perf-rec tests that hit the admin classes on the list view,
detail view, etc. Some queries disappeared and on inspection it was obvious
they were for features since removed e.g. a related field no longer shown
on the list view.

On 15 August 2017 at 23:05, Josh Smeaton <josh.smea...@gmail.com> wrote:

> I'm in favour of *some* automated way of handling these prefetches,
> whether it's opt-in or opt-out there should be some mechanism for
> protection. Preferably with loud logging that directs users to the source
> of the automated hand-holding so they have an opportunity to disable or
> fine tune the queries. Not all Django devs are ORM/Database/Python experts
> - some are frontend devs just trying to get by. I know that this kind of
> proposed behaviour would have both saved our site from massive performance
> issues, but also likely guided these same devs to the source of potential
> issues.
>
> Agreed that prefetch_related is strictly better than not, and is
> acceptable in the absence of select_related.
>
> I think keeping the code back and focussing on the idea first is a good
> one. I think I'd like to know whether people thought that opt-in or opt-out
> behaviour would be best?
>
> For me - there are a few cases where automatically prefetching would
> *maybe* be the wrong thing to do. In the vast majority of cases it'd be
> better than the default of having nothing.
>
> A few concerns:
>
> - Be careful of `.iterator()` queries (that can't use prefetch)
> - Could we warn of reverse M2M/ForeignKey at least?
>
> Adam/Gordon, I'm interested in hearing how these changes led you to
> discovering stale prefetches?
>
> On Wednesday, 16 August 2017 07:30:10 UTC+10, Adam Johnson wrote:
>>
>> I'm biased here, Gordon was my boss for nearly three years, 2014-2016.
>>
>> I'm in favour of adding the auto-prefetching, I've seen it work. It was
>> created around midway through last year and applied to our application's
>> Admin interface, which was covered in tests with *django-perf-rec*.
>> After adding the automatic prefetch, we not only identified some existing
>> N+1 query problems that hadn't been picked up (they were there in the
>> performance record files, just there were so many queries humans had missed
>> reading some), we also found a number of stale prefetch queries that could
>> be removed because the data they fetched wasn't being used. Additionally
>> not all of the admin interface was perf-rec-tested/optimized so some pages
>> were "just faster" with no extra effort.
>>
>> I think there's a case for adding it to core - releasing as a third party
>> package would make it difficult to use since it requires changes - mostly
>> small - to QuerySet, ForeignKey, and the descriptor for ForeignKey. Users
>> of such a package would have to use subclasses of all of these, the
>> trickiest being ForeignKey since it triggers migrations... We had the
>> luxury in our codebase of already having these subclasses for other
>> customizations, so it was easier to roll out.
>>
>> On 15 August 2017 at 20:35, Gordon Wrigley <gordon....@gmail.com> wrote:
>>
>>> The warnings you propose would certainly be an improvement on the status
>>> quo.
>>> However for that to be a complete solution Django would also need to
>>> detect places where there are redundant prefetch_relateds.
>>>
>>> Additionally tools like the Admin and DRF would need to provide adequate
>>> hooks for inserting these calls.
>>> For example ModelAdmin.get_queryset is not really granular enough as
>>> it's used by both the list and detail views which might touch quite
>>> different sets of fields. (Although in practice what you generally do is
>>> optimize the list view as that's the one that tends to explode)
>>>
>>> That aside I sincerely believe that the proposed approach is superior to
>>> the current default behavior in the majority of cases and further more
>>> doesn't fail as badly as the current behavior when it's not appropriate. I
>>> expect that if implemented as an option then in time that belief would
>>> prove itself.
>>>
>>> On Tue, Aug 15, 2017 at 8:17 PM, Tom Forbes <t...@tomforb.es> wrote:
>>>
>>>> Exploding query counts are definitely a pain point in Django, anything
>>>> to improve that is definitely worth considering. They have been a problem
>>>> in all Django projects I have seen.
>>>>
>>>> However I think the correct solution is for developers to correctly add
>>>> select/prefetch calls. There is no general solution for automatically
>>>> applying them that works for enough cases, and i think adding such a method
>>>> to querysets would be used incorrectly and too often.
>>>>
>>>> Perhaps a better solution would be for Django to detect these O(n)
>>>> query cases and display intelligent warnings, with suggestions as to the
>>>> correct select/prefetch calls to add. When debug mode is enabled we could
>>>> detect repeated foreign key referencing from the same source.
>>>>
>>>> On 15 Aug 2017 19:44, "Gordon Wrigley" <gordon....@gmail.com> wrote:
>>>>
>>>> Sorry maybe I wasn't clear enough about the proposed mechanism.
>>>>
>>>> Currently when you dereference a foreign key field on an object (so
>>>> 'choice.question' in the examples above) if it doesn't have the value
>>>> cached from an earlier access, prefetch_related or select_related then
>>>> Django will automatically perform a db query to fetch it. After that the
>>>> value will then be cached on the object for any future dereferences.
>>>>
>>>> This automatic fetching is the source the N+1 query problems and in my
>>>> experience most gross performance problems in Django apps.
>>>>
>>>> The proposal essentially is to add a new queryset function that says
>>>> for the group of objects fetched by this queryset, whenever one of these
>>>> automatic foreign key queries happens on one of them instead of fetching
>>>> the foreign key for just that one use the prefetch mechanism to fetch it
>>>> for all of them.
>>>> The presumption being that the vast majority of the time when you
>>>> access a field on one object from a queryset result, probably you are going
>>>> to access the same field on many of the others as well.
>>>>
>>>> The implementation I've used in production does nest across foreign
>>>> keys so something (admittedly contrived) like:
>>>> for choice in Choice.objects.all():
>>>>     print(choice.question.author)
>>>> Will produce 3 queries, one for all choices, one for the questions of
>>>> those choices and one for the authors of those questions.
>>>>
>>>> It's worth noting that because these are foreign keys in their "to one"
>>>> direction each of those queryset results will be at most the same size (in
>>>> rows) as the proceeding one and often (due to nulls and duplicates) 
>>>> smaller.
>>>>
>>>> I do not propose touching reverse foreign key or many2many fields as
>>>> the generated queries could request substantially more rows from the DB
>>>> than the original query and it's not at all clear how this mechanism would
>>>> sanely interact with filtering etc. So this is purely about the forward
>>>> direction of foreign keys.
>>>>
>>>> I hope that clarifies my thinking some.
>>>>
>>>> Regards
>>>> G
>>>>
>>>> On Tue, Aug 15, 2017 at 7:02 PM, Marc Tamlyn <marc....@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Gordon,
>>>>>
>>>>> Thanks for the suggestion.
>>>>>
>>>>> I'm not a fan of adding a layer that tries to be this clever. How
>>>>> would possible prefetches be identified? What happens when an initial loop
>>>>> in a view requires one prefetch, but a subsequent loop in a template
>>>>> requires some other prefetch? What about nested loops resulting in nested
>>>>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>>>>> multiple ways. Personally, I would argue that correctly setting up and
>>>>> maintaining appropriate prefetches and selects is a necessary part of
>>>>> working with an ORM.
>>>>>
>>>>> Do you know of any other ORMs which attempt similar magical
>>>>> optimisations? How do they go about identifying the cases where it is
>>>>> necessary?
>>>>>
>>>>> On 15 August 2017 at 10:44, Gordon Wrigley <gordon....@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>>>>> automatically doing prefetch_related where needed without the user having
>>>>>> to request it.
>>>>>>
>>>>>> For context consider these three snippets using the Question & Choice
>>>>>> models from the tutorial
>>>>>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models>
>>>>>>  when
>>>>>> there are 100 questions each with 5 choices for a total of 500 choices.
>>>>>>
>>>>>> Default
>>>>>> for choice in Choice.objects.all():
>>>>>>     print(choice.question.question_text, ':', choice.choice_text)
>>>>>> 501 db queries, fetches 500 choice rows and 500 question rows from
>>>>>> the DB
>>>>>>
>>>>>> Prefetch_related
>>>>>> for choice in Choice.objects.prefetch_related('question'):
>>>>>>     print(choice.question.question_text, ':', choice.choice_text)
>>>>>> 2 db queries, fetches 500 choice rows and 100 question rows from the
>>>>>> DB
>>>>>>
>>>>>> Select_related
>>>>>> for choice in Choice.objects.select_related('question'):
>>>>>>     print(choice.question.question_text, ':', choice.choice_text)
>>>>>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>>>>>
>>>>>> I've included select_related for completeness, I'm not going to
>>>>>> propose changing anything about it's use. There are places where it is 
>>>>>> the
>>>>>> best choice and in those places it will still be up to the user to 
>>>>>> request
>>>>>> it. I will note that anywhere select_related is optimal prefetch_related 
>>>>>> is
>>>>>> still better than the default and leave it at that.
>>>>>>
>>>>>> The 'Default' example above is a classic example of the N+1 query
>>>>>> problem, a problem that is widespread in Django apps.
>>>>>> This pattern of queries is what new users produce because they don't
>>>>>> know enough about the database and / or ORM to do otherwise.
>>>>>> Experieced users will also often produce this because it's not always
>>>>>> obvious what fields will and won't be used and subsequently what should 
>>>>>> be
>>>>>> prefetched.
>>>>>> Additionally that list will change over time. A small change to a
>>>>>> template to display an extra field can result in a denial of service on
>>>>>> your DB due to a missing prefetch.
>>>>>> Identifying missing prefetches is fiddly, time consuming and error
>>>>>> prone. Tools like django-perf-rec
>>>>>> <https://github.com/YPlan/django-perf-rec> (which I was involved in
>>>>>> creating) and nplusone <https://github.com/jmcarp/nplusone> exist in
>>>>>> part to flag missing prefetches introduced by changed code.
>>>>>> Finally libraries like Django Rest Framework and the Admin will also
>>>>>> produce queries like this because it's very difficult for them to know 
>>>>>> what
>>>>>> needs prefetching without being explicitly told by an experienced user.
>>>>>>
>>>>>> As hinted at the top I'd like to propose changing Django so the
>>>>>> default code behaves like the prefetch_related code.
>>>>>> Longer term I think this should be the default behaviour but
>>>>>> obviously it needs to be proved first so for now I'd suggest a new 
>>>>>> queryset
>>>>>> function that enables this behaviour.
>>>>>>
>>>>>> I have a proof of concept of this mechanism that I've used
>>>>>> successfully in production. I'm not posting it yet because I'd like to
>>>>>> focus on desired behavior rather than implementation details. But in
>>>>>> summary, what it does is when accessing a missing field on a model, 
>>>>>> rather
>>>>>> than fetching it just for that instance, it runs a prefetch_related query
>>>>>> to fetch it for all peer instances that were fetched in the same 
>>>>>> queryset.
>>>>>> So in the example above it prefetches all Questions in one query.
>>>>>>
>>>>>> This might seem like a risky thing to do but I'd argue that it really
>>>>>> isn't.
>>>>>> The only time this isn't superior to the default case is when you are
>>>>>> post filtering the queryset results in Python.
>>>>>> Even in that case it's only inferior if you started with a large
>>>>>> number of results, filtered basically all of them and the code is
>>>>>> structured so that the filtered ones aren't garbage collected.
>>>>>> To cover this rare case the automatic prefetching can easily be
>>>>>> disabled on a per queryset or per object basis. Leaving us with a rare
>>>>>> downside that can easily be manually resolved in exchange for a 
>>>>>> significant
>>>>>> general improvement.
>>>>>>
>>>>>> In practice this thing is almost magical to work with. Unless you
>>>>>> already have extensive and tightly maintained prefetches everywhere you 
>>>>>> get
>>>>>> an immediate boost to virtually everything that touches the database, 
>>>>>> often
>>>>>> knocking orders of magnitude off page load times.
>>>>>>
>>>>>> If an agreement can be reached on pursuing this then I'm happy to put
>>>>>> in the work to productize the proof of concept.
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Django developers (Contributions to Django itself)" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to django-develop...@googlegroups.com.
>>>>>> To post to this group, send email to django-d...@googlegroups.com.
>>>>>> Visit this group at https://groups.google.com/group/django-developers
>>>>>> .
>>>>>> To view this discussion on the web visit
>>>>>> https://groups.google.com/d/msgid/django-developers/d402bf30
>>>>>> -a5af-4072-8b50-85e921f7f9af%40googlegroups.com
>>>>>> <https://groups.google.com/d/msgid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>>>> .
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "Django developers (Contributions to Django itself)" group.
>>>>> To unsubscribe from this topic, visit https://groups.google.com/d/to
>>>>> pic/django-developers/EplZGj-ejvg/unsubscribe.
>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>> django-develop...@googlegroups.com.
>>>>>
>>>>> To post to this group, send email to django-d...@googlegroups.com.
>>>>> Visit this group at https://groups.google.com/group/django-developers.
>>>>> To view this discussion on the web visit
>>>>> https://groups.google.com/d/msgid/django-developers/CAMwjO1G
>>>>> aha-K7KkefJkiS3LRdXvaPPwBeuKmhQv6bJFx3dty3w%40mail.gmail.com
>>>>> <https://groups.google.com/d/msgid/django-developers/CAMwjO1Gaha-K7KkefJkiS3LRdXvaPPwBeuKmhQv6bJFx3dty3w%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>>> .
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Django developers (Contributions to Django itself)" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to django-develop...@googlegroups.com.
>>>> To post to this group, send email to django-d...@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/django-developers.
>>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>>> gid/django-developers/CAD-wiX3Xa%3DN-D95RPGo8%3D3kN0zunuAOw-
>>>> SpYUa4g_zsk63bARQ%40mail.gmail.com
>>>> <https://groups.google.com/d/msgid/django-developers/CAD-wiX3Xa%3DN-D95RPGo8%3D3kN0zunuAOw-SpYUa4g_zsk63bARQ%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>>
>>>>
>>>> --
>>>> You received this message because you are subscribed to a topic in the
>>>> Google Groups "Django developers (Contributions to Django itself)" group.
>>>> To unsubscribe from this topic, visit https://groups.google.com/d/to
>>>> pic/django-developers/EplZGj-ejvg/unsubscribe.
>>>> To unsubscribe from this group and all its topics, send an email to
>>>> django-develop...@googlegroups.com.
>>>> To post to this group, send email to django-d...@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/django-developers.
>>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>>> gid/django-developers/CAFNZOJO5LEf_i%2BqG2KFUOrbTXG-yanubzjF
>>>> vC1mqU-B0GGG9ng%40mail.gmail.com
>>>> <https://groups.google.com/d/msgid/django-developers/CAFNZOJO5LEf_i%2BqG2KFUOrbTXG-yanubzjFvC1mqU-B0GGG9ng%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Django developers (Contributions to Django itself)" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to django-develop...@googlegroups.com.
>>> To post to this group, send email to django-d...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/django-developers.
>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>> gid/django-developers/CAD-wiX22Fn_qyvEcnLHEsPoKyvxGsrLXiGXvP
>>> %3Dz5%2BoX9W-NnNg%40mail.gmail.com
>>> <https://groups.google.com/d/msgid/django-developers/CAD-wiX22Fn_qyvEcnLHEsPoKyvxGsrLXiGXvP%3Dz5%2BoX9W-NnNg%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
>>
>>
>> --
>> Adam
>>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/django-developers/3fbababd-0324-4b14-a40e-
> 83f72c4f945c%40googlegroups.com
> <https://groups.google.com/d/msgid/django-developers/3fbababd-0324-4b14-a40e-83f72c4f945c%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>



-- 
Adam

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAMyDDM1WG58bRJ%3Dt5hBTZzwRqa_ABXfoMO_TP3DyyUomz8b7cA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to