As an opt in feature, it does sound quite interesting. If over the course
of a few releases, it proves to be reliable and we don't get tons of
support requests of it either not working, or causing the opposite problem,
we can consider moving it to an opt out feature. This makes it easy for
people to "turn on the magic", but keeps the default safer. It would be
very easy to write a one line monkeypatch to make it default behaviour.

Django is conservative when it comes to changes to default behaviour, and
doubly so when the ORM is considered. This will need to have been used
widely in the wild before it would be considered safe as a default
behaviour for all projects. This whole process would only take a couple of
years, which isn't much time in Django release land! I don't mean to
undermine the belief of the people who think it's definitely an improvement
in most cases, I just want us to be cautious in how we explore that
hypothesis.

It's also likely to require a significant patch, and need careful analysis
before merging. I'd suggest it's a good candidate for a DEP to discuss the
motivation for the project, and to find a shepherd from the core team
(Josh? Adam?) to help it land.

On 16 August 2017 at 13:48, Luke Plant <l.plant...@cantab.net> wrote:

> Hi Josh,
>
> On 16/08/17 02:26, Josh Smeaton wrote:
>
> I believe we should be optimising for the **common** use case, not
> expecting everyone to be experts with the ORM.
>
> > If I can come up with a single example where it would significantly
> decrease performance (either memory usage or speed) compared to the default
> (and I'm sure I can), then I would be strongly opposed to it ever being
> default behaviour.
>
> The status quo is already one where thousands of users and sites are doing
> the non-optimal thing because we're choosing to be conservative and have
> users opt-in to the optimal behaviour.
>
>
> I wouldn't say it is "optimal behaviour". It is behaviour that is an
> optimization for some use cases - common ones, I'd agree - but not all. In
> fact it is not even the best optimization - it performs less well in many
> cases than a manual `select_related`.
>
> We don't know how many sites would be affected by the opposite default,
> because we aren't putting people in that situation. Generating potentially
> large queries (i.e. ones that return lots of results) is going to make
> someone's life a lot harder, and can even crash processes (out of memory
> errors), or cause massive slowdown due to insufficient memory and swapping.
> I have had these kind of issues in production systems, and sometimes the
> answer is to prefetch *less* - which is why things like `iterator()` exist,
> because sometimes you *don't* want to load it all into memory upfront.
>
> A massive complaint against Django is how easy it is for users to build in
> 1+N behaviour. Django is supposed to abstract the database away (so much so
> that we avoid SQL related terms in our queryset methods), yet one of the
> more fundamental concepts such as joins we expect users to know about and
> optimise for.
>
>
> This is far from the only place where we expect users to be conscious of
> what has to go on at the database level. For example, we expect users to
> use `.count()` and `.exists()` where appropriate, and not use them where
> not appropriate - see https://docs.djangoproject.com/en/1.11/topics/db/
> optimization/#don-t-overuse-count-and-exists
>
> This is an example of doing it right. We could have 'intelligently' made
> `__len__()` do `count()`, but this introduces queries that the user did not
> ask for, and that's hard for developers to predict. That whole section of
> the docs on DB optimisation depends on the possibility of understanding
> things at a DB level, and understanding how QuerySets behave, and that they
> only do the queries that you ask them to do. The more magic we build in,
> the harder we make life for people trying to optimize database access.
>
> If we have an `auto_prefetch` method that has to be called explicitly,
> then we are allowing users who know less about databases to get something
> that works OK for many situations. But having it on by default makes
> optimization harder and adds unwelcome surprises.
>
>
> I'd be more in favour of throwing an error on 
> non-select-related-foreign-key-access
> than what we're currently doing which is a query for each access.
>
>
> I have some sympathy with this, but I doubt it is somewhere we can go from
> the current ORM and Django ecosystem as a default - but see below.
>
>
> The only options users currently have of monitoring poor behaviour is:
>
> 1. Add logging to django.db.models
> 2. Add django-debug-toolbar
> 3. Investigate page slow downs
>
>
> Here's a bunch of ways that previously tuned queries can "go bad":
>
> 1. A models `__str__` method is updated to include a related field
> 2. A template uses a previously unused related field
> 3. A report uses a previously unused related field
> 4. A ModelAdmin adds a previously unused related field
>
>
> I completely agree that visibility of this problem is a major issue, and
> would really welcome work on improving this, especially in DEBUG mode. One
> option might be a method that replaces lazy loads with exceptions. This
> would force you to add the required `prefetch_related` or `select_related`
> calls. You could have:
>
> .lazy_load(None)   # exception on accessing any non-loaded FK objects
>
> .lazy_load('my_fk1', 'my_fk2')  # exceptions on accessing any unloaded FK
> objects apart from the named ones
>
> .lazy_load('__any__')   # cancel the above, restore default behaviour
>
> This (or something like it) would be a different way to tackle the problem
> - just throwing it out. You could have a Meta option to do
> `.lazy_load(None)` by default for a Model.  I've no idea how practical it
> would be to wire this all up so that is works correctly, and with nested
> objects etc.
>
> This would be a bit of a departure from the current ORM, especially if we
> wanted to migrate to it being the default behaviour. If we are thinking
> long term, we might have to move to something like this if the future is
> async. Lazy-loading ORMs are not very friendly to async code, but an ORM
> where you must specify a set of queries to execute up-front might be more
> of a possibility.
>
>
> I think a better question to ask is:
>
> - How many people have had their day/site ruined because we think
> auto-prefetching is too magical?
> - How many people would have their day/site ruined because we think
> auto-prefetching is the better default?
>
>
> Answer - we could guess, but we don't know, and we still have problems
> both way, of different frequencies and size. In this case, I think
> "explicit is better than implicit" is the wisdom to fall back on. We should
> also fall back to the experience we've had with this in the past. We did
> have more magical performance fixes in the past, such as the `depth`
> argument to `select_related`, removed in 1.7. You now have to explicitly
> name the things you want to be selected, instead of the 'easy' option which
> did the wrong thing in some cases.
>
>
>
> If we were introducing a new ORM, I think the above answer would be
> obvious given what we know of Django use today.
>
> What I'd propose:
>
> 1. (optional) A global setting to disable autoprefetching
>
>
> This would almost surely interact very badly with 3rd party libraries and
> apps who will all want you to set it differently.
>
> 2. An opt out per queryset
> 3. (optional) An opt out per Meta?
> 4. Logging any autoprefetches - perhaps as a warning?
>
>
> Logging of possible improvements or auto-prefetches could be really
> helpful. We should be careful to namespace the logging so that it can be
> silenced easily. The biggest issue with logs/warnings is that the unhelpful
> reports drown out the helpful ones, and then the whole feature becomes
> useless or worse.
>
> Regards,
>
> Luke
>
>
> More experienced Django users that do not want this behaviour are going to
> know about a global setting and can opt in to the old behaviour rather
> easily. Newer users that do not know about select/prefetch_related or these
> settings will fall into the new behaviour by default.
>
> It's unreasonable to expect every user of django learn the ins and outs of
> all queryset methods. I'm probably considered a django orm expert, and I
> still sometimes write queries that are non-optimal or *become* non-optimal
> after changes in unrelated areas. At an absolute minimum we should be
> screaming and shouting when this happens. But we can also fix the issue
> while complaining, and help guide users into correct behaviour.
>
>
> On Wednesday, 16 August 2017 08:41:31 UTC+10, Anthony King wrote:
>>
>> Automatically prefetching is something I feel should be avoided.
>>
>> A common gripe I have with ORMs is they hide what's actually happening
>> with the database, resulting in beginners-going-on-intermediates
>> building libraries/systems that don't scale well.
>>
>> We have several views in a dashboard, where a relation may be accessed
>> once or twice while iterating over a large python filtered queryset.
>> Prefetching this relation based on the original queryset has the
>> potential to add around 5 seconds to the response time (probably more, that
>> table has doubled in size since I last measured it).
>>
>> I feel it would be better to optimise for your usecase, as apposed to try
>> to prevent uncalled-for behaviour.
>>
>>
>>
>> On Aug 15, 2017 23:15, "Luke Plant" <l.pla...@cantab.net> wrote:
>>
>>> I agree with Marc here that the proposed optimizations are 'magical'. I
>>> think when it comes to optimizations like these you simply cannot know in
>>> advance whether doing extra queries is going to a be an optimization or a
>>> pessimization. If I can come up with a single example where it would
>>> significantly decrease performance (either memory usage or speed) compared
>>> to the default (and I'm sure I can), then I would be strongly opposed to it
>>> ever being default behaviour.
>>>
>>> Concerning implementing it as an additional  QuerySet method like
>>> `auto_prefetch()` - I'm not sure what I think, I feel like it could get
>>> icky (i.e. increase our technical debt), due to the way it couples things
>>> together. I can't imagine ever wanting to use it, though, I would always
>>> prefer the manual option.
>>>
>>> Luke
>>>
>>>
>>>
>>> On 15/08/17 21:02, Marc Tamlyn wrote:
>>>
>>> Hi Gordon,
>>>
>>> Thanks for the suggestion.
>>>
>>> I'm not a fan of adding a layer that tries to be this clever. How would
>>> possible prefetches be identified? What happens when an initial loop in a
>>> view requires one prefetch, but a subsequent loop in a template requires
>>> some other prefetch? What about nested loops resulting in nested
>>> prefetches? Code like this is almost guaranteed to break unexpectedly in
>>> multiple ways. Personally, I would argue that correctly setting up and
>>> maintaining appropriate prefetches and selects is a necessary part of
>>> working with an ORM.
>>>
>>> Do you know of any other ORMs which attempt similar magical
>>> optimisations? How do they go about identifying the cases where it is
>>> necessary?
>>>
>>> On 15 August 2017 at 10:44, Gordon Wrigley <gordon....@gmail.com> wrote:
>>>
>>>> I'd like to discuss automatic prefetching in querysets. Specifically
>>>> automatically doing prefetch_related where needed without the user having
>>>> to request it.
>>>>
>>>> For context consider these three snippets using the Question & Choice
>>>> models from the tutorial
>>>> <https://docs.djangoproject.com/en/1.11/intro/tutorial02/#creating-models> 
>>>> when
>>>> there are 100 questions each with 5 choices for a total of 500 choices.
>>>>
>>>> Default
>>>> for choice in Choice.objects.all():
>>>>     print(choice.question.question_text, ':', choice.choice_text)
>>>> 501 db queries, fetches 500 choice rows and 500 question rows from the
>>>> DB
>>>>
>>>> Prefetch_related
>>>> for choice in Choice.objects.prefetch_related('question'):
>>>>     print(choice.question.question_text, ':', choice.choice_text)
>>>> 2 db queries, fetches 500 choice rows and 100 question rows from the DB
>>>>
>>>> Select_related
>>>> for choice in Choice.objects.select_related('question'):
>>>>     print(choice.question.question_text, ':', choice.choice_text)
>>>> 1 db query, fetches 500 choice rows and 500 question rows from the DB
>>>>
>>>> I've included select_related for completeness, I'm not going to propose
>>>> changing anything about it's use. There are places where it is the best
>>>> choice and in those places it will still be up to the user to request it. I
>>>> will note that anywhere select_related is optimal prefetch_related is still
>>>> better than the default and leave it at that.
>>>>
>>>> The 'Default' example above is a classic example of the N+1 query
>>>> problem, a problem that is widespread in Django apps.
>>>> This pattern of queries is what new users produce because they don't
>>>> know enough about the database and / or ORM to do otherwise.
>>>> Experieced users will also often produce this because it's not always
>>>> obvious what fields will and won't be used and subsequently what should be
>>>> prefetched.
>>>> Additionally that list will change over time. A small change to a
>>>> template to display an extra field can result in a denial of service on
>>>> your DB due to a missing prefetch.
>>>> Identifying missing prefetches is fiddly, time consuming and error
>>>> prone. Tools like django-perf-rec
>>>> <https://github.com/YPlan/django-perf-rec> (which I was involved in
>>>> creating) and nplusone <https://github.com/jmcarp/nplusone> exist in
>>>> part to flag missing prefetches introduced by changed code.
>>>> Finally libraries like Django Rest Framework and the Admin will also
>>>> produce queries like this because it's very difficult for them to know what
>>>> needs prefetching without being explicitly told by an experienced user.
>>>>
>>>> As hinted at the top I'd like to propose changing Django so the default
>>>> code behaves like the prefetch_related code.
>>>> Longer term I think this should be the default behaviour but obviously
>>>> it needs to be proved first so for now I'd suggest a new queryset function
>>>> that enables this behaviour.
>>>>
>>>> I have a proof of concept of this mechanism that I've used successfully
>>>> in production. I'm not posting it yet because I'd like to focus on desired
>>>> behavior rather than implementation details. But in summary, what it does
>>>> is when accessing a missing field on a model, rather than fetching it just
>>>> for that instance, it runs a prefetch_related query to fetch it for all
>>>> peer instances that were fetched in the same queryset. So in the example
>>>> above it prefetches all Questions in one query.
>>>>
>>>> This might seem like a risky thing to do but I'd argue that it really
>>>> isn't.
>>>> The only time this isn't superior to the default case is when you are
>>>> post filtering the queryset results in Python.
>>>> Even in that case it's only inferior if you started with a large number
>>>> of results, filtered basically all of them and the code is structured so
>>>> that the filtered ones aren't garbage collected.
>>>> To cover this rare case the automatic prefetching can easily be
>>>> disabled on a per queryset or per object basis. Leaving us with a rare
>>>> downside that can easily be manually resolved in exchange for a significant
>>>> general improvement.
>>>>
>>>> In practice this thing is almost magical to work with. Unless you
>>>> already have extensive and tightly maintained prefetches everywhere you get
>>>> an immediate boost to virtually everything that touches the database, often
>>>> knocking orders of magnitude off page load times.
>>>>
>>>> If an agreement can be reached on pursuing this then I'm happy to put
>>>> in the work to productize the proof of concept.
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Django developers (Contributions to Django itself)" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to django-develop...@googlegroups.com.
>>>> To post to this group, send email to django-d...@googlegroups.com.
>>>> Visit this group at https://groups.google.com/group/django-developers.
>>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>>> gid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%
>>>> 40googlegroups.com
>>>> <https://groups.google.com/d/msgid/django-developers/d402bf30-a5af-4072-8b50-85e921f7f9af%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Django developers (Contributions to Django itself)" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to django-develop...@googlegroups.com.
>>> To post to this group, send email to django-d...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/django-developers.
>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>> gid/django-developers/CAMwjO1Gaha-K7KkefJkiS3LRdXvaPPwBeuKmh
>>> Qv6bJFx3dty3w%40mail.gmail.com
>>> <https://groups.google.com/d/msgid/django-developers/CAMwjO1Gaha-K7KkefJkiS3LRdXvaPPwBeuKmhQv6bJFx3dty3w%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Django developers (Contributions to Django itself)" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to django-develop...@googlegroups.com.
>>> To post to this group, send email to django-d...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/django-developers.
>>> To view this discussion on the web visit https://groups.google.com/d/ms
>>> gid/django-developers/a5780df6-ce60-05ae-88e3-997e6bc88f5c%40cantab.net
>>> <https://groups.google.com/d/msgid/django-developers/a5780df6-ce60-05ae-88e3-997e6bc88f5c%40cantab.net?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/django-developers/2f0b5932-1a38-4eaf-84aa-
> 13960a303141%40googlegroups.com
> <https://groups.google.com/d/msgid/django-developers/2f0b5932-1a38-4eaf-84aa-13960a303141%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
> For more options, visit https://groups.google.com/d/optout.
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-developers+unsubscr...@googlegroups.com.
> To post to this group, send email to django-developers@googlegroups.com.
> Visit this group at https://groups.google.com/group/django-developers.
> To view this discussion on the web visit https://groups.google.com/d/
> msgid/django-developers/9d359e4d-59d6-ff81-8ca6-db3bcd36ac0c%40cantab.net
> <https://groups.google.com/d/msgid/django-developers/9d359e4d-59d6-ff81-8ca6-db3bcd36ac0c%40cantab.net?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-developers+unsubscr...@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/CAMwjO1GEzjGFXi-mP_cL-b_QFYLOZ3bVmdhy%2BaA1OzicQHx3iA%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
            • ... Josh Smeaton
              • ... Adam Johnson
              • ... Adam Johnson
      • ... Gordon Wrigley
    • R... Luke Plant
      • ... Sean Brant
      • ... Anthony King
        • ... Josh Smeaton
          • ... Curtis Maloney
          • ... Luke Plant
            • ... Marc Tamlyn
              • ... Tom Forbes
              • ... Gordon Wrigley
            • ... Patryk Zawadzki
  • Re: Au... Cristiano Coelho
    • R... Alexander Hill
      • ... Josh Smeaton
        • ... Adam Johnson
          • ... 'Tom Evans' via Django developers (Contributions to Django itself)
            • ... Shai Berger
              • ... Josh Smeaton

Reply via email to