#24218: Use sub-query in ORM when distinct and order_by columns do not match
-------------------------------------+-------------------------------------
Reporter: miki725 | Owner: nobody
Type: New feature | Status: new
Component: Database layer | Version: master
(models, ORM) |
Severity: Normal | Resolution:
Keywords: subquery distinct | Triage Stage:
order_by | Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Changes (by miki725):
* needs_better_patch: => 0
* needs_tests: => 0
* needs_docs: => 0
Old description:
> This ticket is to propose a slight change in ORM - use subqueries when
> querying a model where {{{.distinct()}}} and {{{.order_by()}}} (or
> {{{.extra(order_by=())}}}) leftmost columns do not match. For example:
>
> {{{#!python
> Model.objects.all().distinct('foo').order_by('bar')
> }}}
>
> The above generates the following SQL:
>
> {{{#!sql
> SELECT DISTINCT ON ("app_model"."foo") <lots of columns here>
> FROM "app_model"
> ORDER BY "app_model"."bar" ASC;
> }}}
>
> I am not sure about all backends however the above syntax is not allowed
> in PostgreSQL which produces the following error:
>
> {{{#!sql
> ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY
> expressions
> }}}
>
> Here are [http://www.postgresql.org/docs/9.0/static/sql-select.html#SQL-
> DISTINCT PostgreSQL docs] explaining why that is not allowed:
>
> DISTINCT ON ( expression [, ...] ) keeps only the first row of each set
> of rows where the given expressions evaluate to equal. [...] Note that
> the "first row" of each set is unpredictable unless ORDER BY is used to
> ensure that the desired row appears first. [...] The DISTINCT ON
> expression(s) must match the leftmost ORDER BY expression(s).
>
> This ticket proposes to use subqueries in such situations which would use
> SQL:
>
> {{{#!sql
> SELECT *
> FROM (
> SELECT DISTINCT ON ("app_model"."foo") <lots of columns here>
> FROM "app_model"
> ) result
> ORDER BY "app_model"."bar" ASC;
> }}}
>
> The above is perfectly valid SQL and produces expected results (please
> note that {{{ORDER_BY}}} is in the outer query to guarantee that distinct
> results are correctly sorted).
>
> I created a simple [https://gist.github.com/miki725/ce26d8b6ee2f3075884a
> patch] by overwriting few things in {{{SQLCompiler.as_sql()}}} which
> seems to work pretty well. The patch only creates subquery when the above
> dilema is encountered which should not have any negative side-effects on
> existing queries (since such queries were not allowed by SQL). The patch
> also works on the {{{.count()}}} queries since Django then strips any
> ordering hence the subquery is never created.
New description:
This ticket is to propose a slight change in ORM - use subqueries when
querying a model where {{{.distinct()}}} and {{{.order_by()}}} (or
{{{.extra(order_by=())}}}) leftmost columns do not match. For example:
{{{#!python
Model.objects.all().distinct('foo').order_by('bar')
}}}
The above generates the following SQL:
{{{#!sql
SELECT DISTINCT ON ("app_model"."foo") <lots of columns here>
FROM "app_model"
ORDER BY "app_model"."bar" ASC;
}}}
I am not sure about all backends however the above syntax is not allowed
in PostgreSQL which produces the following error:
{{{#!sql
ERROR: SELECT DISTINCT ON expressions must match initial ORDER BY
expressions
}}}
Here are [http://www.postgresql.org/docs/9.0/static/sql-select.html#SQL-
DISTINCT PostgreSQL docs] explaining why that is not allowed:
DISTINCT ON ( expression [, ...] ) keeps only the first row of each set
of rows where the given expressions evaluate to equal. [...] Note that the
"first row" of each set is unpredictable unless ORDER BY is used to ensure
that the desired row appears first. [...] The DISTINCT ON expression(s)
must match the leftmost ORDER BY expression(s).
This ticket proposes to use subqueries in such situations which would use
SQL:
{{{#!sql
SELECT *
FROM (
SELECT DISTINCT ON ("app_model"."foo") <lots of columns here>
FROM "app_model"
) result
ORDER BY "app_model"."bar" ASC;
}}}
The above is perfectly valid SQL and produces expected results (please
note that {{{ORDER_BY}}} is in the outer query to guarantee that distinct
results are correctly sorted).
I created a simple proof-of-concept
[https://gist.github.com/miki725/ce26d8b6ee2f3075884a patch] by
overwriting few things in {{{SQLCompiler.as_sql()}}} which seems to work
pretty well. The patch only creates subquery when the above dilema is
encountered which should not have any negative side-effects on existing
queries (since such queries were not allowed by SQL). The patch also works
on the {{{.count()}}} queries since Django then strips any ordering hence
the subquery is never created.
--
--
Ticket URL: <https://code.djangoproject.com/ticket/24218#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/065.fc4f8b0f68dcbde16ecfd9ed0ae315a3%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.