#23557: annotate gives different results on postgresql and mysql
-------------------------------------+-------------------------------------
     Reporter:  brian                |                    Owner:  nobody
         Type:  Bug                  |                   Status:  new
    Component:  Database layer       |                  Version:  1.7
  (models, ORM)                      |               Resolution:
     Severity:  Normal               |             Triage Stage:  Accepted
     Keywords:                       |      Needs documentation:  0
    Has patch:  0                    |  Patch needs improvement:  0
  Needs tests:  0                    |                    UI/UX:  0
Easy pickings:  0                    |
-------------------------------------+-------------------------------------

Comment (by akaariai):

 I am with brian here - I think Django's way to calculate the GROUP BY is
 complex enough without the GROUP BY clause being altered by .order_by()
 calls.

 In my opinion Django's group by logic should be as follows:
   - Group by the primary key of the table in the query by default. The
 group by will need to be appended by functionally dependent columns as
 many databases need this. This means that we can add any column from the
 same table to the group by, or any column from any table pointed by direct
 foreign key or reverse one to one field.
   - Group by the user's defined .values() call. Possibly allow extending
 it by functionally dependent columns. That is, if some table's primary key
 is in the values() list, then allow extension from that primary key along
 foreign keys and o2ofields.
   - If any other column is needed in the group by, then error out. This
 means that:
 `CPUJob.objects.values('project').annotate(usage=Sum('cpu_usage'),
 jobs=Count('id')).order_by('date')` is an error.

 The functionally dependent columns appending means that we need to do a
 group by author.id, author.name on some databases even though technically
 group by author.id gives the same result. Similarly, we need to group by
 author.id, author.name, book.id, book.name when
 .select_related('favorite_book') is applied, where favorite_book is a
 foreign key from author to book.

 The main point here is that we really shouldn't alter the results because
 .order_by() was added to the query. Just error out in that case. Resist
 the temptation to guess, explicit is better than implicit and so on...

 I think we can deprecate the current behavior of silently altering the
 group by. Checking if a column is functionally dependent on some subset of
 columns in the group by isn't exactly easy, but should be doable at least
 for primary key case. Multi-column unique indexes are going to be a bit
 harder to support...

--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:7>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

-- 
You received this message because you are subscribed to the Google Groups 
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/django-updates/063.f4db9e06a4283793285a5e869bb09e75%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Reply via email to