On 4/13/07, Honza Král <[EMAIL PROTECTED]> wrote:
> the queryset refactoring must (well, should) happen before aggregation
> support. After the refactoring, it should be relatively easy to put in
> the aggregations.
>
> see:
> http://code.djangoproject.com/ticket/3566
Ok; I've had a chance to look at this now. Although there are some syntax
differences, I think your idea and mine [1] are actually pretty close. A
bit of both, and we might just have something. Comments:

GROUP BY = values()
~~~~~~~~~~~~~~~~~~~

Isn't the group_by clause duplicating what is already provided by the
values() filter? i.e., saying 'return me a list of dictionaries of object
data that isn't a full object'?

When the aggregates clause is in use, a GROUP BY over the selected fields
is implied. If you don't provide a values clause, then you want to GROUP BY
all the fields in the model. So, your example would become:

>>> queryset = Model.objects.all()
>>> queryset.values('name', 'city').aggregates(
...     sum=('pay', 'some_other_field'), avg=('pay', 'age'), count=True)

Output format
~~~~~~~~~~~~~

As my proposal established, I'm a fan of the 'augmenting the returned data'
approach:

[ { 'name' : 'John Doe',
    'city' : 'Prague',
    'sum' : { 'pay' : 50000, 'some_other_field' : 10000 },
    'average' : { 'pay' : 10000, 'age' : 30 },
    'count' : 5,
  },
  ...
]

To me, the clause is returning a list of data groups (i.e., the results of
the values() clause); putting the grouping data in a group_by dictionary
seems like overkill.

This approach does introduce some new magic words that are not allowed as
field names (or are at least dangerous if you use them). Personally, I
don't see that as a major problem.

Weaknesses in the function=tuple argument approach
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While the function=tuple-of-field-names approach does work, I feel it is
lacking:

- You can't use aliases: SUM(field) as "total_pay"
- Filtering on aggregated values is difficult (impossible?)
- It doesn't handle joins in the same way as filter()

The approach I proposed was to mirror the filter syntax:

pay__sum='total_pay'
other_field__average='some_alias'

This approach mirrors the filter mechanism for attribute naming and joins,
and it allows for aliases, which in turn provides a mechanism for filtering
on aggregated values.

The only downside I can see is that the simple case requires the user to
provide an alias - i.e., pay__sum='pay_sum'. I'll admit that this isn't
particularly elegant. However, it should be possible to combine both
approaches:

>>> Model.objects.aggregate(sum=('pay',), other_field__average='mean_value')
[ { 'name' : 'John Doe',
    'city' : 'Prague',
    'sum' : { 'pay' : 50000 },
    'mean_value' : 30,
  },
  ...
]

This means the sum of pay isn't aliased, so it can't be filtered on, but
you can filter on 'mean_value'. Argument clashes aren't a problem either -
'sum' by itself is unambiguous as an argument; any filter-like argument
will need to have at least a '__' in it.

Ambiguities in COUNT
~~~~~~~~~~~~~~~~~~~~

aggregate(count=True) is already covered by count(); I'm not sure I'm wild
about the duplication. There is also the problem that it precludes counting
other things: e.g., counting related objects - returning the number of
books that each author has written:

Author.objects.aggregate(books__count='number_of_books_written')

or

Author.objects.aggregate(count=('books',))

On top of this, you can get the gross count from:

len(Author.objects.aggregate(...))
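As a rough illustration of how that double-underscore spelling could be
interpreted - this is a standalone sketch in plain Python, not real Django
code, and the helper name, the function table and the generated SQL
fragments are all made up for the example - the keyword arguments would be
split on '__' just like filter() lookups:

AGGREGATE_FUNCTIONS = {'sum': 'SUM', 'average': 'AVG', 'min': 'MIN',
                       'max': 'MAX', 'count': 'COUNT'}

def parse_aggregate_kwargs(**kwargs):
    """Split each 'path__function' keyword into a column path, an SQL
    aggregate and an alias, and return the SELECT fragments implied."""
    fragments = []
    for lookup, alias in kwargs.items():
        if '__' not in lookup:
            # Plain 'sum=...', 'count=...' etc. (the function=tuple
            # spelling) would be dispatched separately.
            continue
        parts = lookup.split('__')
        path, function = parts[:-1], parts[-1]
        if function not in AGGREGATE_FUNCTIONS:
            raise ValueError("unknown aggregate function: %s" % function)
        # A real implementation would resolve the path to joins exactly as
        # filter() does; joining with '.' is enough for illustration.
        column = '.'.join(path)
        fragments.append('%s(%s) AS %s'
                         % (AGGREGATE_FUNCTIONS[function], column, alias))
    return fragments

parse_aggregate_kwargs(pay__sum='total_pay', other_field__average='some_alias')
# -> ['SUM(pay) AS total_pay', 'AVG(other_field) AS some_alias']
parse_aggregate_kwargs(books__count='number_of_books_written')
# -> ['COUNT(books) AS number_of_books_written']

Because every aggregate ends up with an alias, a later filter on
'total_pay' has an obvious name to turn into a HAVING clause - which is
exactly what the bare function=tuple spelling lacks.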
Aggregate of results vs aggregates of related objects
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you run an aggregate over a field from the table you are querying, the
aggregate is a table-level property, not a row-level property. Compare the
expected results from:

- Get me the date of the earliest and latest published article: return a
  single value for the min, and a single value for the max.

- Annotate each author with the date of their earliest and latest published
  article: return a min and max value for each author.

The second case is relatively easy:

>>> Author.objects.aggregate(article__pub_date__min='earliest',
...     article__pub_date__max='latest')
[ { 'name' : 'John Doe',
    'earliest' : '2007-01-04',
    'latest' : '2007-02-14',
  },
  ...
]

However, the first case:

>>> Article.objects.aggregate(pub_date__min='earliest', pub_date__max='latest')
[ { 'title' : 'first article',
    'earliest' : '2007-01-04',
    'latest' : '2007-02-14',
  },
  { 'title' : 'second article',
    'earliest' : '2007-01-04',
    'latest' : '2007-02-14',
  },
]

while legal, isn't the most elegant way of getting at a unique min/max for
an entire table. This was the reason for my suggestion of two functions,
aggregate() and annotate(). Aggregate returns a dictionary of just the
aggregated values:

>>> Article.objects.aggregate(min=('pub_date',), max=('pub_date',))
{ 'min' : '2007-01-04', 'max' : '2007-02-14' }

Whereas annotate() returns a list of annotated dictionary data.

================================

I'd better stop now - this is getting longer than I originally anticipated,
and if it gets much longer nobody will read it :-)

As always, feedback welcome.

Yours,
Russ Magee %-)

[1] http://groups.google.com/group/django-developers/msg/2f91fa217bc465bb