#34262: Queryset grouped by annotation with aggregates on another annotated
expression crashes on MySQL with sql_mode=only_full_group_by.
-------------------------------------+-------------------------------------
     Reporter:  Mariusz Felisiak     |                    Owner:  nobody
         Type:  Bug                  |                   Status:  new
    Component:  Database layer       |                  Version:  4.1
  (models, ORM)                      |
     Severity:  Normal               |               Resolution:
     Keywords:  mysql                |             Triage Stage:  Accepted
  only_full_group_by                 |
    Has patch:  0                    |      Needs documentation:  0
  Needs tests:  0                    |  Patch needs improvement:  0
Easy pickings:  0                    |                    UI/UX:  0
-------------------------------------+-------------------------------------
Changes (by David Wobrock):

 * cc: David Wobrock (added)


Comment:

 Hey there,

 Took a look at what is happening and why MySQL is failing with
 `ONLY_FULL_GROUP_BY`.

 In short and simplified, this statement works:
 {{{
 mysql> SELECT GREATEST(pages, 600), MIN(pages) FROM aggregation_book GROUP
 BY GREATEST(pages, 600) ORDER BY NULL;
 +----------------------+------------+
 | GREATEST(pages, 600) | MIN(pages) |
 +----------------------+------------+
 |                  600 |        300 |
 |                 1132 |       1132 |
 |                  946 |        946 |
 +----------------------+------------+
 3 rows in set (0,01 sec)
 }}}

 And when you try to add a third expression, that uses the two first:
 {{{
 mysql> SELECT GREATEST(pages, 600), MIN(pages), LEAST(MIN(pages),
 GREATEST(pages, 600)) AS least FROM aggregation_book GROUP BY
 GREATEST(pages, 600) ORDER BY NULL;
 ERROR 1055 (42000): Expression #3 of SELECT list is not in GROUP BY clause
 and contains nonaggregated column
 'test_django_tests.aggregation_book2.pages' which is not functionally
 dependent on columns in GROUP BY clause; this is incompatible with
 sql_mode=only_full_group_by
 }}}
 MySQL is not happy, even though it seems rather straightforward that this
 query should work.

 The resources I could find on the topic are here:
 - [https://bugs.mysql.com/bug.php?id=90792 A ticket from the MySQL bug
 tracker]
 - [https://stackoverflow.com/q/49447622 A related StackOverflow thread]
 - [https://dev.mysql.com/blog-archive/when-only_full_group_by-wont-see-
 the-query-is-deterministic/ A MySQL blog post]

 And the blog post explains in more depth why it's not working.

 -------

 That leaves us with a choice to make for Django's behavior I reckon :)
 Some options:

 == 1. Add an `ANY_VALUE` around the problematic expression

 That solves the issue here for instance:
 {{{
 mysql> SELECT GREATEST(pages, 600), MIN(pages),
 ANY_VALUE(LEAST(MIN(pages), GREATEST(pages, 600))) AS least FROM
 aggregation_book2 GROUP BY GREATEST(pages, 600) ORDER BY NULL;
 +----------------------+------------+-------+
 | GREATEST(pages, 600) | MIN(pages) | least |
 +----------------------+------------+-------+
 |                  600 |        300 |   300 |
 |                 1132 |       1132 |  1132 |
 |                  946 |        946 |   946 |
 +----------------------+------------+-------+
 3 rows in set (0,00 sec)
 }}}
 However, I fear that detecting when to wrap the expression with an
 `ANY_VALUE` is a rabbit hole we don't want to go down, as we might end up
 implementing what the MySQL team didn't want to implement.


 == 2. Raise awareness

 We could, firstly, document the potential issue, and secondly raise a
 warning when such an error occurs when executing a query a Django.
 That way, users are at least aware that's not entirely their or Django's
 fault.


 == 3. Generally change query generation

 Another type of workaround suggested by the [https://dev.mysql.com/blog-
 archive/when-only_full_group_by-wont-see-the-query-is-deterministic/ MySQL
 blog post] is to use a subquery/derived table:
 {{{
 mysql> SELECT greatest_pages,
            MIN(pages),
            LEAST(MIN(pages), greatest_pages) AS least
     FROM (SELECT GREATEST(pages, 600) greatest_pages,
                  pages
           FROM aggregation_book2) AS t
     GROUP BY greatest_pages
     ORDER BY NULL;
 +----------------+------------+-------+
 | greatest_pages | MIN(pages) | least |
 +----------------+------------+-------+
 |            600 |        300 |   300 |
 |           1132 |       1132 |  1132 |
 |            946 |        946 |   946 |
 +----------------+------------+-------+
 3 rows in set (0,00 sec)
 }}}
 So that we always try to group on a column, and not an expression.
 Even though, it might be worse in terms of performances, depending the DB
 implementation I guess.

 This change would then affect all databases I reckon, which is a much
 larger change, and therefore riskier.


 == 4. Any other option! :D



 I hope all of this makes sense, happy to read any thoughts on this :)
 See ya!

-- 
Ticket URL: <https://code.djangoproject.com/ticket/34262#comment:3>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/01070185e5c7f547-fb2a25f0-2073-465a-9c97-0ed1c3664e46-000000%40eu-central-1.amazonses.com.

Reply via email to