#24218: Use sub-query in ORM when distinct and order_by columns do not match
-------------------------------------+-------------------------------------
     Reporter:  miki725              |                    Owner:  nobody
         Type:  New feature          |                   Status:  new
    Component:  Database layer       |                  Version:  master
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  subquery distinct    |             Triage Stage:
  order_by                           |  Unreviewed
    Has patch:  1                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by miki725):

 * needs_better_patch:   => 0
 * needs_tests:   => 0
 * needs_docs:   => 0


Old description:

> This ticket is to propose a slight change in ORM - use subqueries when
> querying a model where {{{.distinct()}}} and {{{.order_by()}}} (or
> {{{.extra(order_by=())}}}) leftmost columns do not match. For example:
>
> {{{#!python
> Model.objects.all().distinct('foo').order_by('bar')
> }}}
>
> The above generates the following SQL:
>
> {{{#!sql
> SELECT DISTINCT ON ("app_model"."foo") <lots of columns here>
> FROM "app_model"
> ORDER BY "app_model"."bar" ASC;
> }}}
>
> I am not sure about all backends however the above syntax is not allowed
> in PostgreSQL which produces the following error:
>
> {{{#!sql
> ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY
> expressions
> }}}
>
> Here are [http://www.postgresql.org/docs/9.0/static/sql-select.html#SQL-
> DISTINCT PostgreSQL docs] explaining why that is not allowed:
>
>   DISTINCT ON ( expression [, ...] ) keeps only the first row of each set
> of rows where the given expressions evaluate to equal. [...] Note that
> the "first row" of each set is unpredictable unless ORDER BY is used to
> ensure that the desired row appears first. [...] The DISTINCT ON
> expression(s) must match the leftmost ORDER BY expression(s).
>
> This ticket proposes to use subqueries in such situations which would use
> SQL:
>
> {{{#!sql
> SELECT *
> FROM (
>   SELECT DISTINCT ON ("app_model"."foo") <lots of columns here>
>   FROM "app_model"
> ) result
> ORDER BY "app_model"."bar" ASC;
> }}}
>
> The above is perfectly valid SQL and produces expected results (please
> note that {{{ORDER_BY}}} is in the outer query to guarantee that distinct
> results are correctly sorted).
>
> I created a simple [https://gist.github.com/miki725/ce26d8b6ee2f3075884a
> patch] by overwriting few things in {{{SQLCompiler.as_sql()}}} which
> seems to work pretty well. The patch only creates subquery when the above
> dilema is encountered which should not have any negative side-effects on
> existing queries (since such queries were not allowed by SQL). The patch
> also works on the {{{.count()}}} queries since Django then strips any
> ordering hence the subquery is never created.

New description:

 This ticket is to propose a slight change in ORM - use subqueries when
 querying a model where {{{.distinct()}}} and {{{.order_by()}}} (or
 {{{.extra(order_by=())}}}) leftmost columns do not match. For example:

 {{{#!python
 Model.objects.all().distinct('foo').order_by('bar')
 }}}

 The above generates the following SQL:

 {{{#!sql
 SELECT DISTINCT ON ("app_model"."foo") <lots of columns here>
 FROM "app_model"
 ORDER BY "app_model"."bar" ASC;
 }}}

 I am not sure about all backends however the above syntax is not allowed
 in PostgreSQL which produces the following error:

 {{{#!sql
 ERROR:  SELECT DISTINCT ON expressions must match initial ORDER BY
 expressions
 }}}

 Here are [http://www.postgresql.org/docs/9.0/static/sql-select.html#SQL-
 DISTINCT PostgreSQL docs] explaining why that is not allowed:

   DISTINCT ON ( expression [, ...] ) keeps only the first row of each set
 of rows where the given expressions evaluate to equal. [...] Note that the
 "first row" of each set is unpredictable unless ORDER BY is used to ensure
 that the desired row appears first. [...] The DISTINCT ON expression(s)
 must match the leftmost ORDER BY expression(s).

 This ticket proposes to use subqueries in such situations which would use
 SQL:

 {{{#!sql
 SELECT *
 FROM (
   SELECT DISTINCT ON ("app_model"."foo") <lots of columns here>
   FROM "app_model"
 ) result
 ORDER BY "app_model"."bar" ASC;
 }}}

 The above is perfectly valid SQL and produces expected results (please
 note that {{{ORDER_BY}}} is in the outer query to guarantee that distinct
 results are correctly sorted).

 I created a simple proof-of-concept
 [https://gist.github.com/miki725/ce26d8b6ee2f3075884a patch] by
 overwriting few things in {{{SQLCompiler.as_sql()}}} which seems to work
 pretty well. The patch only creates subquery when the above dilema is
 encountered which should not have any negative side-effects on existing
 queries (since such queries were not allowed by SQL). The patch also works
 on the {{{.count()}}} queries since Django then strips any ordering hence
 the subquery is never created.

--

--
Ticket URL: <https://code.djangoproject.com/ticket/24218#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/065.fc4f8b0f68dcbde16ecfd9ed0ae315a3%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to