Re: [Django] #23557: annotate gives different results on postgresql and mysql

Django Sun, 28 Sep 2014 23:25:54 -0700

#23557: annotate gives different results on postgresql and mysql
-------------------------------------+-------------------------------------
     Reporter:  brian                |                    Owner:  nobody
         Type:  Bug                  |                   Status:  new
    Component:  Database layer       |                  Version:  1.7
  (models, ORM)                      |               Resolution:
     Severity:  Normal               |             Triage Stage:  Accepted
     Keywords:                       |      Needs documentation:  0
    Has patch:  0                    |  Patch needs improvement:  0
  Needs tests:  0                    |                    UI/UX:  0
Easy pickings:  0                    |
-------------------------------------+-------------------------------------

Comment (by akaariai):

I am with brian here - I think Django's way to calculate the GROUP BY is
complex enough without the GROUP BY clause being altered by .order_by()
calls.

In my opinion Django's group by logic should be as follows:
- Group by the primary key of the table in the query by default. The
group by will need to be appended by functionally dependent columns as
many databases need this. This means that we can add any column from the
same table to the group by, or any column from any table pointed by direct
foreign key or reverse one to one field.
- Group by the user's defined .values() call. Possibly allow extending
it by functionally dependent columns. That is, if some table's primary key
is in the values() list, then allow extension from that primary key along
foreign keys and o2ofields.
- If any other column is needed in the group by, then error out. This
means that:
`CPUJob.objects.values('project').annotate(usage=Sum('cpu_usage'),
jobs=Count('id')).order_by('date')` is an error.

The functionally dependent columns appending means that we need to do a
group by author.id, author.name on some databases even though technically
group by author.id gives the same result. Similarly, we need to group by
author.id, author.name, book.id, book.name when
.select_related('favorite_book') is applied, where favorite_book is a
foreign key from author to book.

The main point here is that we really shouldn't alter the results because
.order_by() was added to the query. Just error out in that case. Resist
the temptation to guess, explicit is better than implicit and so on...

I think we can deprecate the current behavior of silently altering the
group by. Checking if a column is functionally dependent on some subset of
columns in the group by isn't exactly easy, but should be doable at least
for primary key case. Multi-column unique indexes are going to be a bit
harder to support...

--
Ticket URL: <https://code.djangoproject.com/ticket/23557#comment:7>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.

--
You received this message because you are subscribed to the Google Groups
"Django updates" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to django-updates+unsubscr...@googlegroups.com.
To post to this group, send email to django-updates@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/django-updates/063.f4db9e06a4283793285a5e869bb09e75%40djangoproject.com.
For more options, visit https://groups.google.com/d/optout.

Re: [Django] #23557: annotate gives different results on postgresql and mysql

Reply via email to