Hello fellow developers,

As some of you may know PostgreSQL 9.1 added support for GROUP'ing BY
selected table primary keys[0] only. Five years ago it was reported[1] that
Django could rely on this feature to speed up aggregation on models backed
up by tables with either many fields or a few large ones.

Being affected by this slow down myself I decided to dive into the ORM 
internals
and managed to get a patch that made it in 1.9[2] thanks to Anssi's and 
Josh's
review[3].

One subtle thing I didn't know back in the time is that PostgreSQL query 
planner
isn't able to introspect database views columns' functional dependency like 
it
does with tables and thus prevents the primary key GROUP'ing optimization 
from
being used.

While Django doesn't support database views officially it documents that
unmanaged models can be used to query them[4] and thereby perform 
aggregation on
them and generating an invalid query.

This was initially reported as a crashing bug 9 months ago[5] and the 
consensus
at this time was that it was an esoteric edge case since there was few 
reports
of breakages and it went off my radar. Fast-forward to a month ago, this is
reported again[6] and it takes the reporter quite a lot of effort to 
determine
the origin of the issue, pushing me to come up with a solution as I 
introduced
this behavior.

Before Claude makes me realize this is a duplicate of the former report 
(which I
completely forgot about in the mean time) I implement a patch and commit it 
once
it's reviewed [7].

When I closed the initial ticket as "fixed" the reporter brought to my 
attention
that this was now introducing a performance regression for unmanaged models
relying on aggregation and that we should document how to disable this
optimization by creating a backend subclass as a workaround instead.

In my opinion the current situation is as follow. The optimization 
introduced a
break in backward compatibility in 1.9 as we've always documented that 
database
views could be queried against using unmanaged models. If this issue had 
been
discovered during the 1.9 release cycle it would have been eligible for a
backport because it was a bug in a newly introduced feature. Turning this
optimization off for unmanaged models by assuming they could be views is 
only
going to degrade performance of queries using unmanaged models to perform
aggregation on tables with either a large number of columns or large columns
using PostgreSQL.

Therefore I'd favor we keep the current adjustment in the master branch as 
it
restores backward compatibility but I don't have strong feelings about 
reverting
it either if it's deemed inappropriate.

Another solution I came up while writing this post would be to replace the
feature flag by a callable that takes a model as a single parameter and 
returns
whether or not the optimization can be performed against it. The default
implementation would return `mode._meta.managed` but it would make it 
easier for
users affected by this to override in order to opt-in or out based on their
application logic.

Thank you for your time,
Simon

[0] https://www.postgresql.org/docs/9.1/static/sql-select.html#SQL-GROUPBY
[1] https://code.djangoproject.com/ticket/19259
[2] 
https://github.com/django/django/commit/dc27f3ee0c3eb9bb17d6cb764788eeaf73a371d7
[3] https://github.com/django/django/pull/4397
[4] https://docs.djangoproject.com/en/1.11/ref/models/options/#managed
[5] https://code.djangoproject.com/ticket/27241
[6] https://code.djangoproject.com/ticket/28107
[7] 
https://github.com/django/django/commit/daf2bd3efe53cbfc1c9fd00222b8315708023792

-- 
You received this message because you are subscribed to the Google Groups 
"Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at https://groups.google.com/group/django-developers.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-developers/cf767186-8082-4553-ba85-2547169d5b53%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
  • Post... charettes
    • ... Josh Smeaton
    • ... jroes
    • ... Dylan Young
    • ... Matthew Pava
    • ... vojtech.bocek via Django developers (Contributions to Django itself)

Reply via email to