Re: Aggregate Support to the ORM

Nicolas Lara Fri, 02 May 2008 10:13:08 -0700

I am not sure I agree with you on this. Since the succesive calls to
aggregate/annotate would actually be constructing the elements for the
final query in non ordered datastructures I think the order in which
different elements are inserted to the aggregation should not matter.
The values modifier on the other hand might not be commutative with
the annotation.
qs.values().annotate() != qs.annotate().values()
where the first one would group by the elements in values while the
second would use the default grouping and just have values behave as
it normally does. In other words, the annotate modifier, when applied
to a ValuesQuerySet manages the grouping; but a values modifier, when
applied to a queryset that contains annotation, behaves normally.
Having annotate not be commutative seems confusing.
I can see your point on th difference between annotate(A1, A2) and
annotate(A1).annotate(A2) but  what is the diference between
qs.annotate(A1).annotate(A2) and qs.annotate(A2).annotate(A1)??


I think the 2 options now to tackle this problem are:
   (1) annotate(A1, A2) would behave as SQL (when A1 is on the local
model and A2 is on a join)
         and annotate(A1).annotate(A2) would result in doing the 2
queries and returning something as the second example of my previous
post
   (2) Simply restrict this case. If somebody needs such a case they
can just do 2 querysets or fall back to sql.

I am more inclined towards (2) since (1) seems hard to grasp for the
users, error prone and would add innecesary complexity to the orm's
code that would need to be maintained in the future. Also the use case
is unusual enough so it can fall in the 20% roll your own sql side.

On Thu, May 1, 2008 at 4:53 PM, Yuri Baburov <[EMAIL PROTECTED]> wrote:
>
>  can this be considered as difference of complex apply against 2
>  consequent applications of annotate?
>  like 2 filter applied one-by-one can be different from one complex filter?
>  like: .annotate('purchases__quantity__sum').annotate('age__max')
>  and .annotate('purchases__quantity__sum', 'age__max')?
>  compare to:
>  .filter(purchases__quantity__gt=30).filter(age__gt=30) and
>  .filter(purchases__quantity__gt=30, age__gt=30).
>  it seems that annotate calls do neither commute or associate cause of
>  sql grouping nature, exactly like filter ones.
>  annotate(A1, A2) != annotate(A1).annotate(A2) != annotate(A2).annotate(A1)
>  filter(A1, A2) != filter(A1).filter(A2) != filter(A2).filter(A1)
>  here "!=" is "not equivalent to".
>
>  btw, that means you showed only 2 of 3 possible sqls... am i right?
>
>
>
>  On Thu, May 1, 2008 at 9:07 PM, Nicolas E. Lara G.
>  <[EMAIL PROTECTED]> wrote:
>  >
>  >  Hello,
>  >
>  >  I've been looking into the different use cases for aggregation and
>  >  came across one that I wasn't very sure what should be the expected
>  >  behaviour (regardless of the syntax used to express it)
>  >
>  >  If we do something like:
>  >   Buyer.objects.all().annotate('purchases__quantity__sum', 'age__max')
>  >  or
>  >   Buyer.objects.all().annotate( Sum('purchases__quantity'),
>  >  Max('age) )
>  >
>  >  There are a few possibly expected results:
>  >  (1) [
>  >         {'purchases__quantity__sum': 777L,
>  >          'age': 35,
>  >          'age__max': 35,
>  >
>  >          'id': 2,
>  >           'name': u'John'},
>  >         {'purchases__quantity__sum': 787L,
>  >           'age': 24,
>  >            'age__max': 24,
>  >
>  >            'id': 1,
>  >            'name': u'Peter'}
>  >       ]
>  >
>  >       In this case we  are returning the result of a query like this:
>  >
>  >      SELECT "t_buyer"."id", "t_buyer"."name", "t_buyer"."age",
>  >  SUM("t_purchase"."quantity"), MAX("t_buyer"."age") FROM "t_buyer"
>  >  INNER JOIN "t_buyer_purchases" ON ("t_buyer"."id" =
>  >  "t_buyer_purchases"."buyer_id") INNER JOIN "t_purchase" ON
>  >  ("t_buyer_purchases"."purchase_id" = "t_purchase"."id") GROUP BY
>  >  "t_buyer"."id", "t_buyer"."name", "t_buyer"."age"
>  >
>  >      And the aggregation on the whole model does not happen because of
>  >  the grouping ("select max(i) from x group by i" == "select i from x")
>  >
>  >  (2)  [
>  >         {'purchases__quantity__sum': 777L,
>  >          'age': 35,
>  >          'age__max': 35,
>  >
>  >          'id': 2,
>  >           'name': u'John'},
>  >         {'purchases__quantity__sum': 787L,
>  >           'age': 24,
>  >           'age__max': 35,
>  >
>  >           'id': 1,
>  >           'name': u'Peter'}
>  >       ]
>  >
>  >       In this case we are seeing the result of two queries combined:
>  >
>  >       The previous one for the annotation and this:
>  >
>  >        SELECT MAX("t_buyer"."age") FROM "t_buyer"
>  >
>  >         for aggregating on the whole model.
>  >
>  >  With (1) we can get unexpected results. Imagine we were not using max
>  >  but sum instead, the buyer's age would be sumed as many times as he
>  >  has made a purchase.
>  >  With (2) we would have to make 2 queries while the user expects only
>  >  one two happen. Also for users that are used to sql, for very wicked
>  >  reasons, some user might be interested in executing a query that
>  >  actually sums for every relation.
>  >
>  >  The strange query requirement is a very weak reason for this behavior
>  >  since for this kind of things you can always fall to sql. The number
>  >  of queries, on the other hand, is something that can be problematic.
>  >  Should we state that the number of queries of a aggregate should be
>  >  only one or is it ok, for user comodity, to have querysets that
>  >  perform more than one sql query?
>  >
>  >  Other possible solution is to simply restrict that the aggregation can
>  >  only be done on either one table only or one pair of tables at a time.
>  >  We could also just stay with (1) and it would be the user's
>  >  responsability to make the adequate queries.
>  >
>  >  What do you think?
>  >
>  >
>  >  --
>  >
>  >
>  > Nicolas Lara
>  >  >
>  >
>
>
>
>  --
>  Best regards, Yuri V. Baburov, ICQ# 99934676, Skype: yuri.baburov,
>  MSN: [EMAIL PROTECTED]
>
>
>
>  >
>



-- 
Nicolas Lara
Linux user #380134
Public key id: 0x152e7713 at http://keyserver.noreply.org/
# Anti-Spam e-mail addresses:
python -c "print '@'.join(['nicolaslara', '.'.join([x for x in
reversed( 'com|gmail'.split('|') )])])"
python -c "print '@'.join(['nicolas', '.'.join([x for x in reversed(
've|usb|labf|ac'.split('|') )])])"

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django developers" group.
To post to this group, send email to django-developers@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-developers?hl=en
-~----------~----~----~----~------~----~------~--~---

Re: Aggregate Support to the ORM

Reply via email to