On 3/2/06, Rock <[EMAIL PROTECTED]> wrote: > > My other changeset developed at this week's Django Sprint is much more > extensive and is best explained by the new section that I am adding to > the DB API docs. > > Feedback is welcome.
Truth be told, a desire to improve aggregates in Django is the reason that I got involved a few months back (I have a use case that requires pretty extensive use of aggregates). I was going to wait until post magic-removal merge to worry about any aggregate proposals - worry about getting MR out the door first, then worry about other improvements. However, if you're going to start playing, I don't to miss out on any fun :-) Here are some comments on your proposals (including the count patch from the other thread): 1) I'm -1 on the get_aggregate[s] notation you are suggesting. To me, as a name, 'get_aggregates' only makes sense as a same if you know SQL, and the syntax requires end users to learn two notations for aggregates - avg() and 'AVG', with the latter being very SQL, and not very python/ORM. If you are willing to use raw SQL (like get_aggregate psuedo-requires), you can already do what you are proposing with relative ease using a select kwarg. select kwargs also allows the definition of aggregate and non-aggregate 'price+tax+tip' fields. Also, your proposal doesn't cross-multiply very well - your proposal is fine if I want 1-n summary stats of 1 field, but not if I want 1-n summary stats of 1-n fields. (i.e., sum of field 1, avg of field 2). 2) I'm also -1 on the count modification you proposed. Article.objects.all().count() reads quite obviously as the count of all articles. However, Article.objects.all().count('title') reads (to me) like an inelegant way of saying 'a count of all titles'. This isn't what your proposal would return. What you are proposing can already be acheived using Article.objects.filter(fieldname__isnull=False).count(), which, IMHO, reads better, and is more flexible. 3) I'm +0 on the min, max, avg, stddev and median (where available) suggestions as you describe. They are all reasonable analogs of the count() function, and I don't think it would necessarily hurt if they were to be included. However, I have three things preventing me from saying +1: Firstly (and trivially), I would say average() not avg(). The other abbreviations don't bother me, but for some reason avg() rubs me the wrong way. Secondly, in the absence of get_aggregates, if you want to get the minimum AND the maximum, you need to make two separate SQL queries. Obviously, this is why you suggested get_aggregates - I just think we need a better syntax. A couple of quick suggestions for an alternate syntax for multiple aggregates in a single query (these ideas are in foetal form - like I said earlier, I wasn't expecting to have this discussion for a while): >>> Article.objects.summary(count=True, min='field', average=['field', 'field'], ...) {'count' = 5, 'min'={'field':1}, 'average'={'field': 3, 'field': 4} ...} i.e., a kwarg for each aggregate function; COUNT taking true/false, and SUM/MIN/MAX/AVG/STDEV/MEDIAN taking a single field name, or a tuple/list of field names. The return value is a dictionary populated with the summary statistics. Alternatively, the return value could also be an object: >>> s = Article.objects.summary(...) >>> s.count 5 >>> s.min['field'] # (or maybe s.field.min) 1 You could also rearrange the summary method to make the field names the kwargs of the method: >>> Article.objects.summary(field1=sum, field2=(sum,average),...) if you make sum, average, etc functions that can be referenced as objects. However, this makes count harder to implement (since it doesn't need to operate on a field). These options would all allow 1-n fields with 1-n aggregates, and avoids 'SQL string' notation in ORM queries. Thirdly, (and most significantly), while avg(), min(), max() etc are reasonable representations, IMHO they only deal with a small part of the aggregate story. Your suggestions all deal with aggregates in the 'summary' sense - give me an average of a field for these objects. However, in this context, the field must belong to the object being averaged, and a query produces either a summary value or a list of objects - you don't produce both. While this is important (especially for things like count), IMHO this misses one very important use case of aggregates: those queries where you need both a list of objects AND a summary value (or, to put it another way, queries where the summary value isn't over the table whose objects we are retreiving). For example: class Machine(Model): id = CharField() class Part(Model): name = CharField() weight = FloatField() machine = ForeignKey(Machine) What is the total weight of each machine? The machine consists of several parts, and the total_weight is the sum of all weights of related parts. What you need to be able to do here is to annotate the results returned by your model with some summary stats. You can currently do this using select, tables and where kwargs on a filter; however, these are somewhat complex to set up. IMHO select/where/tables is a necessary evil; while it is required to allow spanning coverage of SQL, we should be striving to minimize the use of select/where/tables, especially when we can find an object-like subset of use cases. So, I would be aiming to be able to do something like: >>> m = Machine.objects.annotate('total_weight', part__weight=sum).all() >>> m[0].total_weight 123 This would do the join required to get weight from the part related objects, add an extra column to the results returned by SQL, and add an attribute to objects with the provided alias. You can add multiple annotations if so required, and filter (or otherwise process) an annotated query set. Unresolved issues that I would like to visit: - queries over annotated fields - summary stats that are used for queries, but not returned to the end user - interaction of annotations and queries with group_by/having at the SQL level - unifying notation for 'summary' and 'annotate' Like I said earlier, I wasn't expecting to have this conversation for a while, so my ideas are still a little underdeveloped, but I hope you get an idea of where I am heading. Russ Magee %-) --~--~---------~--~----~------------~-------~--~----~ You received this message because you are subscribed to the Google Groups "Django developers" group. To post to this group, send email to django-developers@googlegroups.com To unsubscribe from this group, send email to [EMAIL PROTECTED] For more options, visit this group at http://groups.google.com/group/django-developers -~----------~----~----~----~------~----~------~--~---